Auto-update blog content from Obsidian: 2026-04-29 19:59:52
Some checks failed
Blog Deployment / Check-Rebuild (push) Successful in 8s
Blog Deployment / Build (push) Successful in 31s
Blog Deployment / Deploy-Staging (push) Successful in 9s
Blog Deployment / Test-Staging (push) Failing after 3s
Blog Deployment / Merge (push) Has been skipped
Blog Deployment / Deploy-Production (push) Has been skipped
Blog Deployment / Test-Production (push) Has been skipped
Blog Deployment / Clean (push) Has been skipped
Blog Deployment / Notify (push) Successful in 3s
content/post/14-proxmox-cluster-upgrade-8-to-9-ceph/index.fr.md
@@ -0,0 +1,424 @@
---
slug: proxmox-cluster-upgrade-8-to-9-ceph
title: Upgrading my 3-node Proxmox VE HA cluster from 8 to 9 based on Ceph
description: Step-by-step upgrade of my 3-node highly available Proxmox VE cluster from 8 to 9, based on Ceph, without any downtime.
date: 2025-11-04
draft: false
tags:
- proxmox
- high-availability
- ceph
categories:
- homelab
---

## Intro

My **Proxmox VE cluster** is almost one year old now, and I haven't kept the nodes fully up to date. Time to take care of it and move it to Proxmox VE **9**.

I'm mainly after the new HA affinity rules, but here are the useful changes this version brings:
- Debian 13 "Trixie".
- Snapshots for thick-provisioned shared LVM storage.
- SDN fabrics feature.
- Improved mobile UI.
- Affinity rules in the HA cluster.

The cluster is made of 3 highly available nodes in a hyper-converged setup, using Ceph for distributed storage.

In this article, I describe the steps to upgrade my Proxmox VE cluster from version 8 to 9 while keeping the resources running. [Official documentation](https://pve.proxmox.com/wiki/Upgrade_from_8_to_9).

---
## Prerequisites

Before jumping into the upgrade, let's review the prerequisites:

1. All nodes updated to the latest Proxmox VE `8.4`.
2. Ceph cluster upgraded to Squid (`19.2`).
3. Proxmox Backup Server upgraded to version 4.
4. Reliable access to the node.
5. Healthy cluster.
6. Backup of all VMs and CTs.
7. At least 5 GB free on `/`.

Notes about my environment:

- PVE nodes are on `8.3.2`, so a minor update to 8.4 is required first.
- Ceph runs Reef (`18.2.4`) and will be upgraded to Squid after PVE 8.4.
- I don't use PBS in my homelab, so I can skip that step.
- I have more than 10 GB available on `/` on my nodes, which is enough.
- I only have SSH console access; if a node stops responding, I may need physical access.
- One VM has the APU passed through (PCI passthrough). Passthrough prevents live migration, so I remove that mapping before the upgrade.
- Set the Ceph `noout` flag during the upgrade, so that OSDs going down for a restart are not marked out, which would trigger automatic rebalancing:
```bash
ceph osd set noout
```

### Update Proxmox VE to 8.4.14

The plan is simple: on every node, one at a time:

1. Enable maintenance mode
```bash
ha-manager crm-command node-maintenance enable $(hostname)
```

2. Update the node
```bash
apt-get update
apt-get dist-upgrade -y
```

At the end of the update, I'm told that GRUB is not set up to update the removable bootloader, so I run the suggested commands:
```plaintext
Removable bootloader found at '/boot/efi/EFI/BOOT/BOOTX64.efi', but GRUB packages not set up to update it!
Run the following command:

echo 'grub-efi-amd64 grub2/force_efi_extra_removable boolean true' | debconf-set-selections -v -u

Then reinstall GRUB with 'apt install --reinstall grub-efi-amd64'
```

3. Reboot the machine
```bash
reboot
```

4. Disable maintenance mode
```bash
ha-manager crm-command node-maintenance disable $(hostname)
```

Between nodes, I wait for the Ceph status to be clean, without warnings.
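
Rather than watching the dashboard, this wait can be scripted. The sketch below is my own helper, not a Proxmox tool: it polls `ceph health` until the cluster reports `HEALTH_OK`. Since the `noout` flag set earlier keeps the cluster in `HEALTH_WARN`, it also accepts a warning whose only message is `noout flag(s) set` (an assumption about the exact summary text). The function name, timeout and interval are arbitrary choices.

```bash
#!/usr/bin/env bash
# Sketch: wait until Ceph is healthy enough to move on to the next node.
# Accepts HEALTH_OK, or HEALTH_WARN when the only warning is the noout flag
# deliberately set for the whole upgrade (assumed summary text).
wait_for_ceph_health() {
  local timeout="${1:-600}"   # seconds before giving up
  local interval="${2:-10}"   # seconds between two checks
  local waited=0 status=""

  while (( waited < timeout )); do
    # `ceph health` prints a one-line summary, e.g. "HEALTH_OK"
    status="$(ceph health 2>/dev/null)"
    case "$status" in
      HEALTH_OK*|"HEALTH_WARN noout flag(s) set")
        echo "Ceph ready after ${waited}s: ${status}"
        return 0
        ;;
    esac
    echo "Waiting for Ceph (${status:-no answer})..."
    sleep "$interval"
    (( waited += interval ))
  done

  echo "Timed out after ${timeout}s, last status: ${status:-no answer}" >&2
  return 1
}

# Usage, between two nodes:
#   wait_for_ceph_health 900 15 && echo "OK to continue with the next node"
```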

✅ At the end, the Proxmox VE cluster is updated to `8.4.14`.

### Upgrade Ceph from Reef to Squid

I can now move on to the Ceph upgrade; the Proxmox documentation for this procedure is [here](https://pve.proxmox.com/wiki/Ceph_Reef_to_Squid).

Update the Ceph package sources on every node:
```bash
sed -i 's/reef/squid/' /etc/apt/sources.list.d/ceph.list
```

Upgrade the Ceph packages:
```bash
apt update
apt full-upgrade -y
```

After the upgrade on the first node, Ceph now reports version `19.2.3`. My OSDs show up as outdated, and the monitors need either an upgrade or a restart.

I carry on and upgrade the packages on the 2 other nodes.

I have a monitor on each node, so I restart each monitor, one node at a time:
```bash
systemctl restart ceph-mon.target
```

I check the Ceph status between restarts:
```bash
ceph status
```

Once all monitors have been restarted, `ceph mon dump` shows that they report the latest version:
- Before: `min_mon_release 18 (reef)`
- After: `min_mon_release 19 (squid)`
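
The same check can be done from the shell. These hypothetical helpers extract the number from the `min_mon_release` line of `ceph mon dump` and fail if it is not the expected release (19 for Squid by default).

```bash
# Sketch: read the numeric min_mon_release from `ceph mon dump`,
# whose output contains a line such as "min_mon_release 19 (squid)".
min_mon_release() {
  ceph mon dump 2>/dev/null | awk '$1 == "min_mon_release" { print $2; exit }'
}

# Fail if the monitors do not all run the expected release yet (19 = Squid).
require_mon_release() {
  local expected="${1:-19}" current
  current="$(min_mon_release)"
  if [[ "$current" != "$expected" ]]; then
    echo "min_mon_release is '${current:-unknown}', expected ${expected}" >&2
    return 1
  fi
  echo "All monitors report release ${expected}"
}
```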

I can now restart the OSDs, still one node at a time. In my setup, I have one OSD per node:
```bash
systemctl restart ceph-osd.target
```

I monitor the Ceph status in the Proxmox web GUI. After the restart, it shows some fancy colors; I just wait for the PGs to turn green again, which takes less than a minute.

A warning appears: `HEALTH_WARN: all OSDs are running squid or later but require_osd_release < squid`

Now that all my OSDs run Squid, I can set the minimum required release to it:
```bash
ceph osd require-osd-release squid
```
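
To double-check that the setting was applied, the `require_osd_release` line of `ceph osd dump` can be read back. A small sketch, with helper names of my own:

```bash
# Sketch: read back the "require_osd_release <name>" line of `ceph osd dump`.
require_osd_release() {
  ceph osd dump 2>/dev/null | awk '$1 == "require_osd_release" { print $2; exit }'
}

# Fail unless the OSD map requires the expected release (squid by default).
check_osd_release() {
  local expected="${1:-squid}" current
  current="$(require_osd_release)"
  if [[ "$current" != "$expected" ]]; then
    echo "require_osd_release is '${current:-unknown}', expected ${expected}" >&2
    return 1
  fi
  echo "require_osd_release is ${current}"
}
```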

ℹ️ I don't currently use CephFS, so I don't need to worry about the MDS (MetaData Server) daemon.

✅ The Ceph cluster has been successfully upgraded to Squid (`19.2.3`).

---
## Checks

The prerequisites for upgrading the cluster to Proxmox VE 9 are now met. Am I ready to upgrade? Not yet.

A small checklist program named **`pve8to9`** is included in the latest Proxmox VE 8.4 packages. It provides hints and warnings about potential issues before, during and after the upgrade. Pretty handy, isn't it?

Running the tool for the first time tells me what I need to do. The script checks a number of parameters, grouped by theme. For example, here is the Virtual Guest section:
```plaintext
= VIRTUAL GUEST CHECKS =

INFO: Checking for running guests..
WARN: 1 running guest(s) detected - consider migrating or stopping them.
INFO: Checking if LXCFS is running with FUSE3 library, if already upgraded..
SKIP: not yet upgraded, no need to check the FUSE library version LXCFS uses
INFO: Checking for VirtIO devices that would change their MTU...
PASS: All guest config descriptions fit in the new limit of 8 KiB
INFO: Checking container configs for deprecated lxc.cgroup entries
PASS: No legacy 'lxc.cgroup' keys found.
INFO: Checking VM configurations for outdated machine versions
PASS: All VM machine versions are recent enough
```

At the end, you get a summary. The goal is to fix as many `FAILURES` and `WARNINGS` as possible:
```plaintext
= SUMMARY =

TOTAL: 57
PASSED: 43
SKIPPED: 7
WARNINGS: 2
FAILURES: 2
```
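
When scripting the upgrade, the summary can be turned into an exit code. This is hypothetical glue of my own, not part of `pve8to9`: it reads the checker's output on stdin, assumes plain-text output (no color codes) with `WARNINGS:` and `FAILURES:` lines as shown above, and only fails on failures.

```bash
# Sketch: turn the pve8to9 summary into an exit code.
# Reads the checker's plain-text output on stdin and extracts the numbers
# from the "WARNINGS: <n>" and "FAILURES: <n>" lines of the summary.
pve8to9_gate() {
  local report failures warnings
  report="$(cat)"
  failures="$(awk '$1 == "FAILURES:" { print $2 }' <<< "$report")"
  warnings="$(awk '$1 == "WARNINGS:" { print $2 }' <<< "$report")"
  echo "failures=${failures:-?} warnings=${warnings:-?}"
  # Warnings are reported but tolerated; any failure (or a missing
  # summary) makes the function return non-zero.
  [[ "${failures:-1}" == "0" ]]
}

# Usage: pve8to9 --full | pve8to9_gate && echo "ready for the upgrade"
```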

Let's go through the problems it found:

```plaintext
FAIL: 1 custom role(s) use the to-be-dropped 'VM.Monitor' privilege and need to be adapted after the upgrade
```

Some time ago, to use Terraform with my Proxmox cluster, I created a dedicated role. It is detailed in this [post]({{< ref "post/3-terraform-create-vm-proxmox" >}}).

This role uses the `VM.Monitor` privilege, which is removed in Proxmox VE 9. New privileges under `VM.GuestAgent.*` exist instead. So I remove this one, and I'll add the new ones once the cluster has been upgraded.

```plaintext
FAIL: systemd-boot meta-package installed. This will cause problems on upgrades of other boot-related packages. Remove 'systemd-boot' See https://pve.proxmox.com/wiki/Upgrade_from_8_to_9#sd-boot-warning for more information.
```

Proxmox VE uses `systemd-boot` for booting only in some configurations, which are managed by `proxmox-boot-tool`; the `systemd-boot` meta-package itself should be removed. This package was automatically installed on systems set up with PVE 8.1 to 8.4, because in Bookworm it contained `bootctl`.

If the `pve8to9` checklist script suggests it, you can safely remove the `systemd-boot` meta-package, unless you installed it manually and use `systemd-boot` as your bootloader:
```bash
apt remove systemd-boot -y
```

```plaintext
WARN: 1 running guest(s) detected - consider migrating or stopping them.
```

In an HA setup, I put a node into maintenance mode before upgrading it. This automatically moves its resources elsewhere. When the mode is disabled, the resources move back to their previous location.

```plaintext
WARN: The matching CPU microcode package 'amd64-microcode' could not be found! Consider installing it to receive the latest security and bug fixes for your CPU.
Ensure you enable the 'non-free-firmware' component in the apt sources and run:
apt install amd64-microcode
```

Installing the processor microcode package is recommended: microcode updates can fix hardware bugs, improve performance, and strengthen the processor's security features.

I add the `non-free-firmware` component to the current sources:
```bash
sed -i '/^deb /{/non-free-firmware/!s/$/ non-free-firmware/}' /etc/apt/sources.list
```
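
That `sed` one-liner is dense, so here it is wrapped in a function that takes the file as an argument, with each part of the expression commented. Trying it on a copy first is a cheap way to see what it does; the function name is my own. Running it twice is harmless, since lines that already list the component are skipped.

```bash
# Sketch: the same sed expression as above, wrapped so it can be tried on a copy.
#   /^deb /                  only lines starting with "deb " (deb-src and
#                            commented-out lines are left alone)
#   /non-free-firmware/!     ...that do not already list the component
#   s/$/ non-free-firmware/  append the component at the end of the line
add_non_free_firmware() {
  local file="${1:-/etc/apt/sources.list}"
  sed -i '/^deb /{/non-free-firmware/!s/$/ non-free-firmware/}' "$file"
}

# Dry run on a copy before touching the real file:
#   cp /etc/apt/sources.list /tmp/sources.list.test
#   add_non_free_firmware /tmp/sources.list.test
#   diff /etc/apt/sources.list /tmp/sources.list.test
```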

Then I install the `amd64-microcode` package:
```bash
apt update
apt install amd64-microcode -y
```

After these small adjustments, am I ready? Let's check by running the `pve8to9` script again.

⚠️ Don't forget to run `pve8to9` on all nodes to make sure everything is OK.

---
## Upgrade

🚀 Now everything is ready for the big leap! As with the minor update, I'll proceed node by node, keeping my VMs and CTs running.

### Set Maintenance Mode

First, I put the node into maintenance mode. This moves the existing workload to the other nodes:
```bash
ha-manager crm-command node-maintenance enable $(hostname)
```

After running the command, I wait about a minute to give the resources time to migrate.

### Switch the Source Repositories to Trixie

Since Debian Trixie, the `deb822` format is the recommended format for APT sources (APT itself has supported it for a long time). It is structured around key/value fields, which makes it more readable, and each repository can pin its signing key with `Signed-By`.

#### Debian Sources
```bash
cat > /etc/apt/sources.list.d/debian.sources << EOF
Types: deb deb-src
URIs: http://deb.debian.org/debian/
Suites: trixie trixie-updates
Components: main contrib non-free-firmware
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg

Types: deb deb-src
URIs: http://security.debian.org/debian-security/
Suites: trixie-security
Components: main contrib non-free-firmware
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg
EOF
```

#### Proxmox Sources (without subscription)
```bash
cat > /etc/apt/sources.list.d/proxmox.sources << EOF
Types: deb
URIs: http://download.proxmox.com/debian/pve
Suites: trixie
Components: pve-no-subscription
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
EOF
```

#### Ceph Squid Sources (without subscription)
```bash
cat > /etc/apt/sources.list.d/ceph.sources << EOF
Types: deb
URIs: http://download.proxmox.com/debian/ceph-squid
Suites: trixie
Components: no-subscription
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
EOF
```

#### Remove the Old Bookworm Lists

The old-format lists for Debian Bookworm must be removed:
```bash
rm -f /etc/apt/sources.list{,.d/*.list}
```
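
Deleting the files is final. A more cautious variant, sketched below with a backup directory name I made up, moves them aside instead, so the Bookworm configuration can be restored if needed. It takes the APT directory as an argument (defaulting to `/etc/apt`), which also makes it easy to try on a scratch directory.

```bash
# Sketch: move the legacy one-line-style lists aside instead of deleting them.
# The backup directory is an arbitrary choice; APT only reads sources.list and
# sources.list.d/, so files moved there are ignored.
backup_legacy_lists() {
  local apt_dir="${1:-/etc/apt}"
  local backup_dir="${apt_dir}/sources-bookworm-backup"
  local f

  mkdir -p "$backup_dir"
  for f in "$apt_dir/sources.list" "$apt_dir"/sources.list.d/*.list; do
    [[ -e "$f" ]] || continue   # skip a missing file or an unmatched glob
    mv -- "$f" "$backup_dir/"
    echo "moved $f"
  done
}
```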

### Update the Configured `apt` Repositories

Refresh the repositories:
```bash
apt update
```
```plaintext
Get:1 http://security.debian.org/debian-security trixie-security InRelease [43.4 kB]
Get:2 http://deb.debian.org/debian trixie InRelease [140 kB]
Get:3 http://download.proxmox.com/debian/ceph-squid trixie InRelease [2,736 B]
Get:4 http://download.proxmox.com/debian/pve trixie InRelease [2,771 B]
Get:5 http://deb.debian.org/debian trixie-updates InRelease [47.3 kB]
Get:6 http://security.debian.org/debian-security trixie-security/main Sources [91.1 kB]
Get:7 http://security.debian.org/debian-security trixie-security/non-free-firmware Sources [696 B]
Get:8 http://security.debian.org/debian-security trixie-security/main amd64 Packages [69.0 kB]
Get:9 http://security.debian.org/debian-security trixie-security/main Translation-en [45.1 kB]
Get:10 http://security.debian.org/debian-security trixie-security/non-free-firmware amd64 Packages [544 B]
Get:11 http://security.debian.org/debian-security trixie-security/non-free-firmware Translation-en [352 B]
Get:12 http://download.proxmox.com/debian/ceph-squid trixie/no-subscription amd64 Packages [33.2 kB]
Get:13 http://deb.debian.org/debian trixie/main Sources [10.5 MB]
Get:14 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 Packages [241 kB]
Get:15 http://deb.debian.org/debian trixie/non-free-firmware Sources [6,536 B]
Get:16 http://deb.debian.org/debian trixie/contrib Sources [52.3 kB]
Get:17 http://deb.debian.org/debian trixie/main amd64 Packages [9,669 kB]
Get:18 http://deb.debian.org/debian trixie/main Translation-en [6,484 kB]
Get:19 http://deb.debian.org/debian trixie/contrib amd64 Packages [53.8 kB]
Get:20 http://deb.debian.org/debian trixie/contrib Translation-en [49.6 kB]
Get:21 http://deb.debian.org/debian trixie/non-free-firmware amd64 Packages [6,868 B]
Get:22 http://deb.debian.org/debian trixie/non-free-firmware Translation-en [4,704 B]
Get:23 http://deb.debian.org/debian trixie-updates/main Sources [2,788 B]
Get:24 http://deb.debian.org/debian trixie-updates/main amd64 Packages [5,412 B]
Get:25 http://deb.debian.org/debian trixie-updates/main Translation-en [4,096 B]
Fetched 27.6 MB in 3s (8,912 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
666 packages can be upgraded. Run 'apt list --upgradable' to see them.
```

😈 666 packages, I'm doomed!

### Upgrade to Debian Trixie and Proxmox VE 9

Start the upgrade:
```bash
apt-get dist-upgrade -y
```

During the process, you will be asked to approve configuration file changes and some service restarts. You may also be shown a summary of some changes, which you can simply exit by pressing `q`. For the configuration file prompts:
- `/etc/issue`: Proxmox VE regenerates this file automatically at boot -> `No`
- `/etc/lvm/lvm.conf`: contains changes relevant to Proxmox VE -> `Yes`
- `/etc/ssh/sshd_config`: depends on your configuration -> `Inspect`
- `/etc/default/grub`: only relevant if you modified it manually -> `Inspect`
- `/etc/chrony/chrony.conf`: if you made no additional changes -> `Yes`

The upgrade took about 5 minutes; this will vary with the hardware.

At the end of the upgrade, reboot the machine:
```bash
reboot
```
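
Before taking the node out of maintenance mode, it is worth confirming that it actually came back on Proxmox VE 9. A minimal sketch, assuming the usual `pve-manager/<version>/<commit> ...` line printed by `pveversion`; the helper names are mine.

```bash
# Sketch: confirm the node runs the expected Proxmox VE major version.
# `pveversion` prints a line like "pve-manager/<version>/<commit> (running kernel: ...)".
pve_major_version() {
  pveversion 2>/dev/null | sed -n 's|^pve-manager/\([0-9]*\)\..*|\1|p'
}

assert_pve_major() {
  local expected="${1:-9}" major
  major="$(pve_major_version)"
  if [[ "$major" != "$expected" ]]; then
    echo "${HOSTNAME}: Proxmox VE major '${major:-unknown}', expected ${expected}" >&2
    return 1
  fi
  echo "${HOSTNAME}: Proxmox VE ${expected} confirmed"
}

# Usage, right before leaving maintenance mode:
#   assert_pve_major 9 && ha-manager crm-command node-maintenance disable "$(hostname)"
```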

### Exit Maintenance Mode

Finally, once the node is (hopefully) back up, you can disable maintenance mode. The workload that was running on this machine will move back:
```bash
ha-manager crm-command node-maintenance disable $(hostname)
```

### Post-Upgrade Validation

- Check cluster communication:
```bash
pvecm status
```

- Check the storage mount points

- Check the Ceph cluster health:
```bash
ceph status
```

- Confirm VM operations, backups and HA groups

HA groups have been removed in favor of HA affinity rules; existing HA groups are automatically migrated to HA rules.

- Disable the PVE Enterprise repository

If you don't use the `pve-enterprise` repository, you can disable it:
```bash
sed -i 's/^/#/' /etc/apt/sources.list.d/pve-enterprise.sources
```
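
Commenting out every line works. Alternatively, `deb822` files accept an `Enabled` field per stanza, so the file can stay readable with `Enabled: no`. This sketch (my own helper, relying on GNU `sed`) sets it in every stanza, assuming the file either has no `Enabled:` line at all or has one in each stanza.

```bash
# Sketch: disable a deb822 source with "Enabled: no" instead of commenting it out.
# If an Enabled field already exists, it is rewritten; otherwise one is added
# right after each "Types:" line, i.e. once per stanza.
disable_deb822_source() {
  local file="${1:-/etc/apt/sources.list.d/pve-enterprise.sources}"
  if grep -q '^Enabled:' "$file"; then
    sed -i 's/^Enabled:.*/Enabled: no/' "$file"
  else
    sed -i '/^Types:/a Enabled: no' "$file"
  fi
}
```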

🔁 This node is now upgraded to Proxmox VE 9. Move on to the other nodes.

## Post-Upgrade Actions

Once the whole cluster has been upgraded, carry out the post-upgrade actions:

- Remove the `noout` flag from the Ceph cluster:
```bash
ceph osd unset noout
```

- Recreate the PCI passthrough mappings

For the VM whose host mapping I removed at the start of the procedure, I can now recreate the mapping.

- Add the privileges to the Terraform role

During the checks phase, I was advised to remove the `VM.Monitor` privilege from my custom Terraform role. Now that new privileges have been added with Proxmox VE 9, I can grant them to this role:
- `VM.GuestAgent.Audit`
- `VM.GuestAgent.FileRead`
- `VM.GuestAgent.FileWrite`
- `VM.GuestAgent.FileSystemMgmt`
- `VM.GuestAgent.Unrestricted`
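
From the shell, these privileges can be appended to the existing role with `pveum role modify` and its `--append` option. The role name below is a placeholder (use the one created for Terraform), and the helpers are only a sketch; I pass the privileges as a comma-separated list.

```bash
# Sketch: append the new guest-agent privileges to an existing custom role.
# ROLE is a placeholder: set it to the name of the role used by Terraform.
ROLE="${ROLE:-TerraformProv}"

GUEST_AGENT_PRIVS=(
  VM.GuestAgent.Audit
  VM.GuestAgent.FileRead
  VM.GuestAgent.FileWrite
  VM.GuestAgent.FileSystemMgmt
  VM.GuestAgent.Unrestricted
)

# Join the arguments with commas (uses the first character of IFS).
join_privs() {
  local IFS=,
  echo "$*"
}

# --append 1 keeps the privileges the role already has instead of replacing them.
add_guest_agent_privs() {
  pveum role modify "$ROLE" --privs "$(join_privs "${GUEST_AGENT_PRIVS[@]}")" --append 1
}
```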

## Conclusion

🎉 My Proxmox VE cluster is now running version 9!

The upgrade went quite smoothly, without any downtime for my resources.

I now have access to HA affinity rules, which I needed for my OPNsense cluster.

As you may have noticed, I don't keep my nodes up to date very often. Next time, I might automate this to keep them current without effort.

content/post/14-proxmox-cluster-upgrade-8-to-9-ceph/index.md
@@ -0,0 +1,425 @@
---
slug: proxmox-cluster-upgrade-8-to-9-ceph
title: Upgrading my 3-node Proxmox VE HA Cluster from 8 to 9 based on Ceph
description: Step-by-step upgrade of my 3-node Proxmox VE highly available cluster from 8 to 9, based on Ceph distributed storage, without any downtime.
date: 2025-11-04
draft: false
tags:
- proxmox
- high-availability
- ceph
categories:
- homelab
---

## Intro

My **Proxmox VE** cluster is almost one year old now, and I haven't kept the nodes fully up to date. Time to address this and bump it to Proxmox VE **9**.

I'm mainly after the new HA affinity rules, but here are the useful changes this version brings:
- Debian 13 "Trixie".
- Snapshots for thick-provisioned shared LVM storage.
- SDN fabrics feature.
- Improved mobile UI.
- Affinity rules in the HA cluster.

The cluster is a three-node, highly available, hyper-converged setup using Ceph for distributed storage.

In this article, I'll walk through the upgrade steps for my Proxmox VE cluster, from 8 to 9, while keeping the resources up and running. [Official docs](https://pve.proxmox.com/wiki/Upgrade_from_8_to_9).

---
## Prerequisites

Before jumping into the upgrade, let's review the prerequisites:

1. All nodes upgraded to the latest Proxmox VE `8.4`.
2. Ceph cluster upgraded to Squid (`19.2`).
3. Proxmox Backup Server upgraded to version 4.
4. Reliable access to the node.
5. Healthy cluster.
6. Backup of all VMs and CTs.
7. At least 5 GB free on `/`.

Notes about my environment:

- PVE nodes are on `8.3.2`, so a minor update to 8.4 is required first.
- Ceph runs Reef (`18.2.4`) and will be upgraded to Squid after PVE 8.4.
- I don't use PBS in my homelab, so I can skip that step.
- I have more than 10 GB available on `/` on my nodes, which is enough.
- I only have SSH console access; if a node becomes unresponsive, I may need physical access.
- One VM has the APU passed through (PCI passthrough). Passthrough prevents live migration, so I remove that mapping prior to the upgrade.
- Set the Ceph `noout` flag during the upgrade, so that OSDs going down for a restart are not marked out, which would trigger automatic rebalancing:
```bash
ceph osd set noout
```
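
To confirm the flag is in place (and, at the end, that it has been removed), the `flags` line of `ceph osd dump` can be checked. A small sketch; the function name is mine:

```bash
# Sketch: check whether an OSD map flag (noout by default) is set, using the
# "flags a,b,c" line printed by `ceph osd dump`. Returns 0 if the flag is set.
osd_flag_is_set() {
  local flag="${1:-noout}"
  ceph osd dump 2>/dev/null | awk -v f="$flag" '
    $1 == "flags" {
      n = split($2, list, ",")
      for (i = 1; i <= n; i++) if (list[i] == f) found = 1
    }
    END { exit !found }'
}

# Usage:
#   osd_flag_is_set noout && echo "noout is set"
```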

### Update Proxmox VE to 8.4.14

The plan is simple: for all nodes, one at a time:

1. Enable maintenance mode
```bash
ha-manager crm-command node-maintenance enable $(hostname)
```

2. Update the node
```bash
apt-get update
apt-get dist-upgrade -y
```

At the end of the update, I'm told that GRUB is not set up to update the removable bootloader, so I run the suggested commands:
```plaintext
Removable bootloader found at '/boot/efi/EFI/BOOT/BOOTX64.efi', but GRUB packages not set up to update it!
Run the following command:

echo 'grub-efi-amd64 grub2/force_efi_extra_removable boolean true' | debconf-set-selections -v -u

Then reinstall GRUB with 'apt install --reinstall grub-efi-amd64'
```

3. Restart the machine
```bash
reboot
```

4. Disable maintenance mode
```bash
ha-manager crm-command node-maintenance disable $(hostname)
```

Between nodes, I wait for the Ceph status to be clean, without warnings.

✅ At the end, the Proxmox VE cluster is updated to `8.4.14`.

### Upgrade Ceph from Reef to Squid

I can now move on to the Ceph upgrade; the Proxmox documentation for that procedure is [here](https://pve.proxmox.com/wiki/Ceph_Reef_to_Squid).

Update the Ceph package sources on every node:
```bash
sed -i 's/reef/squid/' /etc/apt/sources.list.d/ceph.list
```

Upgrade the Ceph packages:
```bash
apt update
apt full-upgrade -y
```

After the upgrade on the first node, Ceph now reports version `19.2.3`. My OSDs show up as outdated, and the monitors need either an upgrade or a restart.

I carry on and upgrade the packages on the 2 other nodes.

I have a monitor on each node, so I have to restart each monitor, one node at a time:
```bash
systemctl restart ceph-mon.target
```

I check the Ceph status between restarts:
```bash
ceph status
```

Once all monitors have been restarted, `ceph mon dump` shows that they report the latest version:
- Before: `min_mon_release 18 (reef)`
- After: `min_mon_release 19 (squid)`

Now I can restart the OSDs, still one node at a time. In my setup, I have one OSD per node:
```bash
systemctl restart ceph-osd.target
```

I monitor the Ceph status in the Proxmox web GUI. After the restart, it shows some fancy colors; I just wait for the PGs to turn green again, which takes less than a minute.

A warning shows up: `HEALTH_WARN: all OSDs are running squid or later but require_osd_release < squid`

Now that all my OSDs are running Squid, I can set the minimum required release to it:
```bash
ceph osd require-osd-release squid
```

ℹ️ I'm not currently using CephFS, so I don't need to worry about the MDS (MetaData Server) daemon.

✅ The Ceph cluster has been successfully upgraded to Squid (`19.2.3`).

---
## Checks

The prerequisites to upgrade the cluster to Proxmox VE 9 are now complete. Am I ready to upgrade? Not yet.

A small checklist program named **`pve8to9`** is included in the latest Proxmox VE 8.4 packages. It provides hints and warnings about potential issues before, during and after the upgrade process. Pretty handy, isn't it?

Running the tool for the first time gives me some insight into what I need to do. The script checks a number of parameters, grouped by theme. For example, this is the Virtual Guest section:
```plaintext
= VIRTUAL GUEST CHECKS =

INFO: Checking for running guests..
WARN: 1 running guest(s) detected - consider migrating or stopping them.
INFO: Checking if LXCFS is running with FUSE3 library, if already upgraded..
SKIP: not yet upgraded, no need to check the FUSE library version LXCFS uses
INFO: Checking for VirtIO devices that would change their MTU...
PASS: All guest config descriptions fit in the new limit of 8 KiB
INFO: Checking container configs for deprecated lxc.cgroup entries
PASS: No legacy 'lxc.cgroup' keys found.
INFO: Checking VM configurations for outdated machine versions
PASS: All VM machine versions are recent enough
```

At the end, you get a summary. The goal is to address as many `FAILURES` and `WARNINGS` as possible:
```plaintext
= SUMMARY =

TOTAL: 57
PASSED: 43
SKIPPED: 7
WARNINGS: 2
FAILURES: 2
```

Let's review the problems it found:

```plaintext
FAIL: 1 custom role(s) use the to-be-dropped 'VM.Monitor' privilege and need to be adapted after the upgrade
```

Some time ago, to use Terraform with my Proxmox cluster, I created a dedicated role. This is detailed in that [post]({{< ref "post/3-terraform-create-vm-proxmox" >}}).

This role uses the `VM.Monitor` privilege, which is removed in Proxmox VE 9. New privileges under `VM.GuestAgent.*` exist instead. So I remove this one, and I'll add the new ones once the cluster has been upgraded.

```plaintext
FAIL: systemd-boot meta-package installed. This will cause problems on upgrades of other boot-related packages. Remove 'systemd-boot' See https://pve.proxmox.com/wiki/Upgrade_from_8_to_9#sd-boot-warning for more information.
```

Proxmox VE uses `systemd-boot` for booting only in some configurations, which are managed by `proxmox-boot-tool`; the `systemd-boot` meta-package itself should be removed. The package was automatically installed on systems set up with PVE 8.1 to PVE 8.4, because in Bookworm it contained `bootctl`.

If the `pve8to9` checklist script suggests it, the `systemd-boot` meta-package is safe to remove, unless you installed it manually and are using `systemd-boot` as your bootloader:
```bash
apt remove systemd-boot -y
```

```plaintext
WARN: 1 running guest(s) detected - consider migrating or stopping them.
```

In an HA setup, I put a node into maintenance mode before updating it. This automatically moves its workload elsewhere. When the mode is disabled, the workload moves back to its previous location.

```plaintext
WARN: The matching CPU microcode package 'amd64-microcode' could not be found! Consider installing it to receive the latest security and bug fixes for your CPU.
Ensure you enable the 'non-free-firmware' component in the apt sources and run:
apt install amd64-microcode
```

Installing the processor microcode package is recommended: microcode updates can fix hardware bugs, improve performance, and strengthen the processor's security features.

I add the `non-free-firmware` component to the current sources:
```bash
sed -i '/^deb /{/non-free-firmware/!s/$/ non-free-firmware/}' /etc/apt/sources.list
```

Then I install the `amd64-microcode` package:
```bash
apt update
apt install amd64-microcode -y
```

After these small adjustments, am I ready yet? Let's find out by running the `pve8to9` script again.

⚠️ Don't forget to run `pve8to9` on all nodes to make sure everything is good.

---
## Upgrade

🚀 Now everything is ready for the big move! As with the minor update, I'll proceed one node at a time, keeping my VMs and CTs up and running.

### Set Maintenance Mode

First, I put the node into maintenance mode. This moves the existing workload to the other nodes:
```bash
ha-manager crm-command node-maintenance enable $(hostname)
```

After issuing the command, I wait about one minute to give the resources time to migrate.
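
Instead of a fixed one-minute wait, the node can be polled until no guest is left running on it. This sketch counts `running` entries in the STATUS column of `qm list` (3rd column) and `pct list` (2nd column); the column positions are based on their default table output, and the helper names, timeout and interval are my own choices. Keep in mind that, as far as I know, maintenance mode only moves HA-managed guests, so a guest outside HA will keep this loop waiting until it times out.

```bash
# Sketch: count the VMs and containers still running on this node.
running_guests() {
  local vms cts
  vms="$(qm list 2>/dev/null | awk 'NR > 1 && $3 == "running"' | wc -l)"
  cts="$(pct list 2>/dev/null | awk 'NR > 1 && $2 == "running"' | wc -l)"
  echo $(( vms + cts ))
}

# Poll until the node is drained, or give up after the timeout (seconds).
wait_until_drained() {
  local timeout="${1:-300}" interval="${2:-5}" waited=0 count
  while (( waited < timeout )); do
    count="$(running_guests)"
    if (( count == 0 )); then
      echo "No running guests left on this node"
      return 0
    fi
    echo "${count} guest(s) still running, waiting..."
    sleep "$interval"
    (( waited += interval ))
  done
  echo "Guests still running after ${timeout}s" >&2
  return 1
}
```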
|
||||
|
||||
### Change Source Repositories to Trixie
|
||||
|
||||
Since Debian Trixie, the `deb822` format is now available and recommended for sources. It is structured around key/value format. This offers better readability and security.

#### Debian Sources
```bash
cat > /etc/apt/sources.list.d/debian.sources << EOF
Types: deb deb-src
URIs: http://deb.debian.org/debian/
Suites: trixie trixie-updates
Components: main contrib non-free-firmware
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg

Types: deb deb-src
URIs: http://security.debian.org/debian-security/
Suites: trixie-security
Components: main contrib non-free-firmware
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg
EOF
```

#### Proxmox Sources (without subscription)
```bash
cat > /etc/apt/sources.list.d/proxmox.sources << EOF
Types: deb
URIs: http://download.proxmox.com/debian/pve
Suites: trixie
Components: pve-no-subscription
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
EOF
```

#### Ceph Squid Sources (without subscription)
```bash
cat > /etc/apt/sources.list.d/ceph.sources << EOF
Types: deb
URIs: http://download.proxmox.com/debian/ceph-squid
Suites: trixie
Components: no-subscription
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
EOF
```
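
A typo in one of these heredocs would only show up at the next `apt update`. Here is a minimal sketch that checks each stanza of a `.sources` file for the four fields used above (`Types`, `URIs`, `Suites`, `Components`). `check_deb822_file` is a hypothetical helper. It is a sanity check, not a full `deb822` parser: comment lines and continuation lines are simply skipped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report every stanza of a deb822 .sources file that
# lacks one of the fields Types, URIs, Suites or Components.
# Returns non-zero if anything is missing.
check_deb822_file() {
  awk -v f="$1" '
    # Report the mandatory fields missing from the stanza that just ended.
    function check(   i) {
      if (!in_stanza) return
      for (i = 1; i <= 4; i++)
        if (!((n, req[i]) in seen)) {
          printf "%s: stanza %d lacks %s\n", f, n, req[i]
          bad = 1
        }
      in_stanza = 0
    }
    BEGIN { split("Types URIs Suites Components", req, " ") }
    /^[ \t]*#/ { next }              # comment line
    /^[ \t]*$/ { check(); next }     # blank line closes a stanza
    /^[ \t]/   { next }              # continuation of a multi-line value
    {
      if (!in_stanza) { in_stanza = 1; n++ }
      split($0, kv, ":")
      seen[n, kv[1]] = 1             # remember "field seen in stanza n"
    }
    END { check(); exit bad }
  ' "$1"
}
```

To check the three files at once: `for f in /etc/apt/sources.list.d/*.sources; do check_deb822_file "$f"; done`.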

#### Remove Old Bookworm Source Lists

The old one-line source lists pointing to Debian Bookworm must be removed:
```bash
# Brace expansion: removes /etc/apt/sources.list and every /etc/apt/sources.list.d/*.list file
rm -f /etc/apt/sources.list{,.d/*.list}
```
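
After the cleanup, it is worth making sure no source still points to Bookworm. Here is a minimal sketch that searches the APT source files for the old codename, assuming GNU `grep` (for `--include`). `find_bookworm_leftovers` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every APT source line under a directory
# (default /etc/apt) that still mentions "bookworm".
# Returns 1 if any is found, 0 otherwise.
find_bookworm_leftovers() {
  dir="${1:-/etc/apt}"
  # -r: recurse, -n: show line numbers; only look at APT source files.
  if grep -rn --include='*.list' --include='*.sources' --include='sources.list' \
       bookworm "$dir"; then
    echo "bookworm references remain in $dir" >&2
    return 1
  fi
  return 0
}
```

If it prints nothing and returns 0, every remaining source file references `trixie` only.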

### Update the Configured `apt` Repositories

Refresh the repositories:
```bash
apt update
```
```plaintext
Get:1 http://security.debian.org/debian-security trixie-security InRelease [43.4 kB]
Get:2 http://deb.debian.org/debian trixie InRelease [140 kB]
Get:3 http://download.proxmox.com/debian/ceph-squid trixie InRelease [2,736 B]
Get:4 http://download.proxmox.com/debian/pve trixie InRelease [2,771 B]
Get:5 http://deb.debian.org/debian trixie-updates InRelease [47.3 kB]
Get:6 http://security.debian.org/debian-security trixie-security/main Sources [91.1 kB]
Get:7 http://security.debian.org/debian-security trixie-security/non-free-firmware Sources [696 B]
Get:8 http://security.debian.org/debian-security trixie-security/main amd64 Packages [69.0 kB]
Get:9 http://security.debian.org/debian-security trixie-security/main Translation-en [45.1 kB]
Get:10 http://security.debian.org/debian-security trixie-security/non-free-firmware amd64 Packages [544 B]
Get:11 http://security.debian.org/debian-security trixie-security/non-free-firmware Translation-en [352 B]
Get:12 http://download.proxmox.com/debian/ceph-squid trixie/no-subscription amd64 Packages [33.2 kB]
Get:13 http://deb.debian.org/debian trixie/main Sources [10.5 MB]
Get:14 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 Packages [241 kB]
Get:15 http://deb.debian.org/debian trixie/non-free-firmware Sources [6,536 B]
Get:16 http://deb.debian.org/debian trixie/contrib Sources [52.3 kB]
Get:17 http://deb.debian.org/debian trixie/main amd64 Packages [9,669 kB]
Get:18 http://deb.debian.org/debian trixie/main Translation-en [6,484 kB]
Get:19 http://deb.debian.org/debian trixie/contrib amd64 Packages [53.8 kB]
Get:20 http://deb.debian.org/debian trixie/contrib Translation-en [49.6 kB]
Get:21 http://deb.debian.org/debian trixie/non-free-firmware amd64 Packages [6,868 B]
Get:22 http://deb.debian.org/debian trixie/non-free-firmware Translation-en [4,704 B]
Get:23 http://deb.debian.org/debian trixie-updates/main Sources [2,788 B]
Get:24 http://deb.debian.org/debian trixie-updates/main amd64 Packages [5,412 B]
Get:25 http://deb.debian.org/debian trixie-updates/main Translation-en [4,096 B]
Fetched 27.6 MB in 3s (8,912 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
666 packages can be upgraded. Run 'apt list --upgradable' to see them.
```

😈 666 packages, I'm doomed!

### Upgrade to Debian Trixie and Proxmox VE 9

Launch the upgrade:
```bash
# -y only answers apt's own confirmation; dpkg still prompts about modified configuration files
apt-get dist-upgrade -y
```

During the process, you will be prompted to approve changes to configuration files and some service restarts. You may also be shown a list of changes (from `apt-listchanges`); you can simply exit it by pressing `q`. For the configuration files, here is what to answer:
- `/etc/issue`: Proxmox VE auto-generates this file on boot -> `No`
- `/etc/lvm/lvm.conf`: the package maintainer's version contains changes relevant to Proxmox VE -> `Yes`
- `/etc/ssh/sshd_config`: depending on your setup -> `Inspect`
- `/etc/default/grub`: only if you changed it manually -> `Inspect`
- `/etc/chrony/chrony.conf`: if you did not make extra changes yourself -> `Yes`
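
When a configuration file prompt is answered, `dpkg` keeps the other version next to the file: `*.dpkg-dist` holds the package maintainer's version when you keep yours, and `*.dpkg-old` holds your previous version when you take the maintainer's. Files handled by `ucf` use `*.ucf-dist` and `*.ucf-old` instead. Here is a minimal sketch to list these copies once the upgrade finishes, so each choice can be reviewed; `list_config_leftovers` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the copies dpkg/ucf leave next to configuration
# files after the prompts, under a root directory (default /etc):
#   *.dpkg-dist / *.ucf-dist : maintainer's version, kept aside when you answered "No"
#   *.dpkg-old  / *.ucf-old  : your previous version, kept when you answered "Yes"
list_config_leftovers() {
  root="${1:-/etc}"
  find "$root" \( -name '*.dpkg-dist' -o -name '*.dpkg-old' \
               -o -name '*.ucf-dist' -o -name '*.ucf-old' \) -type f -print | sort
}
```

For each file listed, a `diff -u` against the active file (for example `/etc/issue` and `/etc/issue.dpkg-dist`, if present) shows what was kept or replaced.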

The upgrade took about 5 minutes, depending on the hardware.

At the end of the upgrade, restart the machine:
```bash
reboot
```

### Remove Maintenance Mode

Finally, when the node (hopefully) comes back up, disable maintenance mode. The workload that was previously running on this node moves back to it:
```bash
# Let the HA resources return to this node
ha-manager crm-command node-maintenance disable $(hostname)
```

### Post-Upgrade Validation

- Check cluster communication:
```bash
pvecm status
```

- Verify storage mount points

- Check Ceph cluster health:
```bash
ceph status
```

- Confirm VM operations, backups, and HA groups

HA groups have been replaced by HA affinity rules, and existing HA groups are automatically migrated to HA rules.

- Disable the PVE Enterprise repository

If you don't use the `pve-enterprise` repository, you can disable it:
```bash
# Comment out every line so that APT ignores the enterprise repository
sed -i 's/^/#/' /etc/apt/sources.list.d/pve-enterprise.sources
```
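
As a last check on each node, I can confirm it really runs Proxmox VE 9 before moving on. Here is a minimal sketch, assuming `pveversion` prints a single line starting with `pve-manager/<major>.<minor>.<patch>/`. `pve_major_version`, `assert_pve9` and the `PVEVERSION` override are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the major Proxmox VE version from the
# one-line "pveversion" output read on stdin.
# Assumption: the line starts with "pve-manager/<major>.<minor>.<patch>/".
pve_major_version() {
  sed -n 's|^pve-manager/\([0-9][0-9]*\)\..*|\1|p'
}

# Succeed only if the node reports Proxmox VE 9.
assert_pve9() {
  major="$(${PVEVERSION:-pveversion} | pve_major_version)"
  if [ "$major" != "9" ]; then
    echo "expected Proxmox VE 9, got '${major:-unknown}'" >&2
    return 1
  fi
}
```

Running `assert_pve9` right after the reboot catches a node that booted back into the old version before its workload is moved back.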

🔁 This node is now upgraded to Proxmox VE 9. Proceed with the other nodes, one at a time.

## Post Actions

Once the whole cluster has been upgraded, proceed with the post-upgrade actions:

- Remove the Ceph cluster `noout` flag:
```bash
ceph osd unset noout
```

- Recreate the PCI passthrough mapping

For the VM whose host mapping I removed at the beginning of the procedure, I can now recreate the mapping.

- Add privileges to the Terraform role

During the check phase, I was advised to remove the `VM.Monitor` privilege from my custom Terraform role. Now that Proxmox VE 9 has added new privileges, I can assign them to that role:
- `VM.GuestAgent.Audit`
- `VM.GuestAgent.FileRead`
- `VM.GuestAgent.FileWrite`
- `VM.GuestAgent.FileSystemMgmt`
- `VM.GuestAgent.Unrestricted`
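
These privileges can also be added from the command line. Here is a minimal sketch using `pveum role modify` with `--append 1`, so the role keeps its existing privileges instead of having its list replaced. The role ID `TerraformProv` and the `PVEUM` override are hypothetical; adjust them to your setup.

```bash
#!/usr/bin/env bash
# Hypothetical role ID: replace it with the ID of your own Terraform role.
TERRAFORM_ROLE="${TERRAFORM_ROLE:-TerraformProv}"

# Guest agent privileges listed above, space-separated.
GUEST_AGENT_PRIVS="VM.GuestAgent.Audit VM.GuestAgent.FileRead VM.GuestAgent.FileWrite VM.GuestAgent.FileSystemMgmt VM.GuestAgent.Unrestricted"

# Append the privileges to the role (first argument, or TERRAFORM_ROLE).
grant_guest_agent_privs() {
  ${PVEUM:-pveum} role modify "${1:-$TERRAFORM_ROLE}" --privs "$GUEST_AGENT_PRIVS" --append 1
}
```

After `grant_guest_agent_privs TerraformProv`, `pveum role list` should show the new privileges on that role.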

## Conclusion

🎉 My Proxmox VE cluster is now on version 9!

The upgrade process went pretty smoothly, without any downtime for my resources.

Now I have access to HA affinity rules, which I needed for my OPNsense cluster.

As you may have noticed, I don't keep my nodes up to date very often. Next time, I might automate this to keep them updated without any effort.