---
slug: proxmox-cluster-upgrade-8-to-9-with-ceph
title: Upgrading my 3-node Proxmox VE HA Cluster from 8 to 9 with Ceph
description: Step-by-step upgrade of my highly available 3-node Proxmox VE cluster, backed by Ceph distributed storage, from version 8 to 9 without any downtime.
date: 2025-11-04
draft: true
tags:
Before jumping into the upgrade, let's review the prerequisites:
1. All nodes upgraded to the latest Proxmox VE `8.4`.
2. Ceph cluster upgraded to Squid (`19.2`).
3. Proxmox Backup Server upgraded to version 4.
4. Reliable access to the node.
5. Healthy cluster.
6. Backup of all VMs and CTs.
7. At least 5 GB free on `/`.
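These points are easy to sanity-check from a shell on each node, for example:
```bash
pveversion   # current Proxmox VE version of this node
ceph -s      # overall Ceph cluster health
df -h /      # free space on the root mount point
```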
Notes about my environment:
- PVE nodes are on `8.3.2`, so a minor upgrade to 8.4 is required first.
- Ceph is Reef (`18.2.4`) and will be upgraded to Squid after PVE 8.4.
- I don't use PBS in my homelab, so I can skip that step.
- I have more than 10 GB available on `/` on all my nodes, so this is fine.
- I only have SSH access to the nodes (no out-of-band console); if a node becomes unresponsive, I may need physical access.
- One VM uses the host's APU via PCI passthrough. Since passthrough prevents live migration, I remove that mapping before starting, to avoid having to restart the VM each time.
- Set Ceph OSDs to `noout` during the upgrade to avoid automatic rebalancing:
```bash
ceph osd set noout
```
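The flag will have to be cleared once the whole upgrade is finished, so that Ceph can rebalance normally again:
```bash
ceph osd unset noout
```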
### Update Proxmox VE to 8.4.14
The plan is simple; for each node, one at a time:
1. Enable the maintenance mode
```bash
ha-manager crm-command node-maintenance enable $(hostname)
```
2. Update the node
```bash
apt-get update
apt-get dist-upgrade -y
```
At the end of the update, I'm prompted to reconfigure GRUB for the removable bootloader, which I do:
```plaintext
Removable bootloader found at '/boot/efi/EFI/BOOT/BOOTX64.efi', but GRUB packages not set up to update it!
Run the following command:
echo 'grub-efi-amd64 grub2/force_efi_extra_removable boolean true' | debconf-set-selections
Then reinstall GRUB with 'apt install --reinstall grub-efi-amd64'
```
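Concretely, that means running the two commands quoted in the message:
```bash
# pre-seed the debconf answer so GRUB keeps the removable bootloader up to date
echo 'grub-efi-amd64 grub2/force_efi_extra_removable boolean true' | debconf-set-selections
# reinstall GRUB so the new setting is applied
apt install --reinstall grub-efi-amd64
```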
3. Restart the machine
```bash
reboot
```
4. Disable the maintenance mode
```bash
ha-manager crm-command node-maintenance disable $(hostname)
```
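To confirm that a node really entered (or left) maintenance mode before proceeding, I can check the HA status from any node:
```bash
ha-manager status   # lists the HA state of each node and resource
```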
Between each node, I wait for the Ceph status to be clean, without warnings.
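This can be done from the WebGUI, or with a small loop from any shell, for example:
```bash
# poll until Ceph reports HEALTH_OK again
until ceph health | grep -q HEALTH_OK; do sleep 10; done
ceph -s
```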
I can now move on to the Ceph upgrade; the Proxmox documentation for this procedure is [here](https://pve.proxmox.com/wiki/Ceph_Reef_to_Squid).
Update Ceph package sources on every node:
```bash
sed -i 's/reef/squid/' /etc/apt/sources.list.d/ceph.list
```
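A quick check that the substitution did what I expect, before upgrading anything:
```bash
cat /etc/apt/sources.list.d/ceph.list   # should now reference 'squid' instead of 'reef'
```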
Upgrade the Ceph packages:
```bash
apt update
apt full-upgrade -y
```
After the upgrade on the first node, the Ceph version now shows `19.2.3`. I can see my OSDs appear as outdated, and the monitors need either an upgrade or a restart:
![Ceph storage status in Proxmox after first node Ceph package update](img/proxmox-ceph-version-upgrade.png)
I carry on and upgrade the packages on the other two nodes.
I have a monitor on each node, so I have to restart each monitor, one node at a time:
```bash
systemctl restart ceph-mon.target
```
Once all monitors are restarted, they report the latest version, with `ceph mon dump`:
- Before: `min_mon_release 18 (reef)`
- After: `min_mon_release 19 (squid)`
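The relevant line can be pulled out directly, for example:
```bash
ceph mon dump | grep min_mon_release   # should report: min_mon_release 19 (squid)
```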
Now I can restart the OSDs, still one node at a time. In my setup, I have one OSD per node:
```bash
systemctl restart ceph-osd.target
```
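After each restart, I can check which release every daemon is actually running; once the three nodes are done, everything should report Squid:
```bash
ceph versions   # per-daemon breakdown of running Ceph versions
```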
I monitor the Ceph status with the Proxmox WebGUI. After the restart, it shows some fancy colors; I just wait for the PGs to go back to green, which takes less than a minute:
![Ceph storage status in Proxmox during the first OSD restart](img/proxmox-ceph-status-osd-restart.png)
A warning shows up: `HEALTH_WARN: all OSDs are running squid or later but require_osd_release < squid`
Now that all my OSDs are running Squid, I can raise the minimum required OSD release accordingly:
```bash
ceph osd require-osd-release squid
```
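And verify that the flag is set:
```bash
ceph osd dump | grep require_osd_release   # should now show: require_osd_release squid
```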
WARNINGS: 2
FAILURES: 2
```
Let's review the problems it found:
```plaintext
FAIL: 1 custom role(s) use the to-be-dropped 'VM.Monitor' privilege and need to be adapted after the upgrade