Auto-update blog content from Obsidian: 2025-11-04 20:52:43

---
slug: proxmox-cluster-upgrade-8-to-9-with-ceph
title: Upgrading my 3-node Proxmox VE HA Cluster from 8 to 9 with Ceph
description: Step-by-step upgrade of my 3-node Proxmox VE highly available cluster from 8 to 9, based on Ceph distributed storage, without any downtime.
date: 2025-11-04
draft: true
tags:

Before jumping into the upgrade, let's review the prerequisites:

1. All nodes upgraded to the latest Proxmox VE `8.4`.
2. Ceph cluster upgraded to Squid (`19.2`).
3. Proxmox Backup Server upgraded to version 4.
4. Reliable access to the node.
5. A healthy cluster.
6. Backup of all VMs and CTs.
7. At least 5 GB free on the root mount point (`/`).
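
Most of these points can be checked quickly from a node's shell; a rough sketch of such a check:

```bash
pveversion        # Proxmox VE release, should be on 8.4.x before the major upgrade
ceph --version    # installed Ceph release
ceph -s           # cluster health, should report HEALTH_OK
df -h /           # free space on the root mount point, at least 5 GB
```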

Notes about my environment:

- My PVE nodes are on `8.3.2`, so a minor upgrade to 8.4 is required first.
- Ceph is on Reef (`18.2.4`) and will be upgraded to Squid after PVE 8.4.
- I don't use PBS in my homelab, so I can skip that step.
- All my nodes have more than 10 GB available on `/`, so disk space is fine.
- I only have SSH access to the nodes; if a node becomes unresponsive, I may need physical access to it.
- One VM uses PCI passthrough of the host's APU. Passthrough prevents live migration, so I remove that mapping at the start of the procedure to avoid restarting the VM for every move.
- Until the end of the upgrade to Proxmox VE 9, I set the Ceph OSDs to `noout`, so CRUSH does not try to rebalance the cluster while nodes go down:

```bash
ceph osd set noout
```
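
For reference, the flag is meant to be cleared again once the whole upgrade is finished on every node:

```bash
ceph osd unset noout
```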

### Update Proxmox VE to 8.4.14

The plan is simple. On each node, one at a time, I will:

1. Enable the maintenance mode

```bash
ha-manager crm-command node-maintenance enable $(hostname)
```
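
To confirm that the node really entered maintenance mode and that its HA resources were moved away, a quick look at the HA status should do:

```bash
# The node should be reported in maintenance mode, with its HA-managed
# guests migrated to the other members of the cluster.
ha-manager status
```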

2. Update the node

```bash
apt-get update
apt-get dist-upgrade -y
```

At the end of the update, I'm asked to fix the handling of the removable bootloader, which I do:

```plaintext
Removable bootloader found at '/boot/efi/EFI/BOOT/BOOTX64.efi', but GRUB packages not set up to update it!
Run the following command:

echo 'grub-efi-amd64 grub2/force_efi_extra_removable boolean true' | debconf-set-selections

Then reinstall GRUB with 'apt install --reinstall grub-efi-amd64'
```
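
Concretely, that means running the two commands suggested by the message:

```bash
# Let GRUB also maintain the removable EFI boot path
echo 'grub-efi-amd64 grub2/force_efi_extra_removable boolean true' | debconf-set-selections

# Reinstall GRUB so the new debconf setting is applied
apt install --reinstall grub-efi-amd64
```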

3. Restart the machine

```bash
reboot
```
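
Once the node is back online, a quick check should confirm the new release:

```bash
# Expected to report pve-manager/8.4.x
pveversion
```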

4. Disable the maintenance mode

```bash
ha-manager crm-command node-maintenance disable $(hostname)
```

Between each node, I wait for the Ceph status to be clean, without warnings.
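
This can be done from the WebGUI, or from any node's shell with something like:

```bash
# Refresh the Ceph status every few seconds and wait for HEALTH_OK
watch -n 5 ceph -s
```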

I can now move on to the Ceph upgrade; the Proxmox documentation for that procedure is [here](https://pve.proxmox.com/wiki/Ceph_Reef_to_Squid).

On every node, I update the Ceph package sources for Proxmox:

```bash
sed -i 's/reef/squid/' /etc/apt/sources.list.d/ceph.list
```
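
A quick look at the file confirms the substitution:

```bash
# The repository entry should now reference squid instead of reef
cat /etc/apt/sources.list.d/ceph.list
```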

Then I upgrade the Ceph packages:

```bash
apt update
apt full-upgrade -y
```

After the upgrade on the first node, the Ceph version now shows `19.2.3`. I can see that my OSDs now appear as outdated and that the monitors need either an upgrade or a restart:

![](pasted-image-20251103205532.png)

I carry on and upgrade the packages on the two other nodes.

I have a monitor on each node, so I have to restart each monitor, one node at a time:

```bash
systemctl restart ceph-mon.target
```

Once all monitors are restarted, they report the latest version, and `ceph mon dump` shows the new minimum monitor release:

- Before: `min_mon_release 18 (reef)`
- After: `min_mon_release 19 (squid)`
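
This value can be checked from any node, for example with:

```bash
# Should report min_mon_release 19 (squid) once every monitor runs Squid
ceph mon dump | grep min_mon_release
```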

Now I can restart the OSDs, still one node at a time. In my setup, I have one OSD per node:

```bash
systemctl restart ceph-osd.target
```

I monitor the Ceph status with the Proxmox WebGUI. After the restart, it shows some fancy colors; I'm just waiting for the PGs to be back to green, which takes less than a minute:

![](pasted-image-20251103211554.png)

A warning shows up: `HEALTH_WARN: all OSDs are running squid or later but require_osd_release < squid`

Now that all my OSDs are running Squid, I can set the minimum required release accordingly:

```bash
ceph osd require-osd-release squid
```
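
An optional check confirms that the flag is now set:

```bash
# The OSD map should now contain "require_osd_release squid"
ceph osd dump | grep require_osd_release
```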

```plaintext
WARNINGS: 2
FAILURES: 2
```

Let's review the problems it found:

```plaintext
FAIL: 1 custom role(s) use the to-be-dropped 'VM.Monitor' privilege and need to be adapted after the upgrade
```