Compare commits: `feature-re` ... `main` (52 commits)

| SHA1 |
|---|
| d7ab8cdab3 |
| bc44f93c3f |
| 61309cfb4e |
| 50cc6c195a |
| c27bd9f906 |
| e495593cc1 |
| dc9c6d7164 |
| ca68e911eb |
| 721e911258 |
| 09ed5ade30 |
| ddb46f8aa3 |
| 65af7bcee5 |
| 271fe23e23 |
| 5a2a530d32 |
| a88e3158c5 |
| 4cd7c76c0a |
| 67d23c90ac |
| 36f1374128 |
| b801726508 |
| c87b9f4bc9 |
| 08a8b65a1d |
| ce1249a924 |
| 545720c4c0 |
| 2119cbf695 |
| ce2288dabe |
| a1fa2c0d53 |
| bacb22987a |
| 4a20a913e0 |
| c5c6d9b91d |
| 236e9fa668 |
| 35b8a8596f |
| d73aaad0b4 |
| c43d2af086 |
| 011dbc7293 |
| 37025b683a |
| ef84e229b2 |
| 0598cf2a5f |
| e851ee9bd9 |
| 3389de98d8 |
| aa9077a47b |
| d888220239 |
| 44ddcb6589 |
| d3ad691387 |
| 57db4726d7 |
| 58856f4668 |
| 62833b288a |
| 739763bc9c |
| 8482223f48 |
| 302c6d1a46 |
| fbafb580a0 |
| 8ed82a75ab |
| b6b8083adb |
@@ -1,137 +0,0 @@
|
||||
---
|
||||
slug:
|
||||
title: Template
|
||||
description:
|
||||
date:
|
||||
draft: true
|
||||
tags:
|
||||
- opnsense
|
||||
- high-availability
|
||||
- proxmox
|
||||
categories:
|
||||
---
|
||||
|
||||
## Intro
|
||||
|
||||
In my previous [post]({{< ref "post/12-opnsense-virtualization-highly-available" >}}), I've set up a PoC to validate the possibility to create a cluster of 2 **OPNsense** VMs in **Proxmox VE** and make the firewall highly available.
|
||||
|
||||
This time, I will cover the creation of my future OPNsense cluster from scratch, plan the cut over and finally migrate from my current physical box.
|
||||
|
||||
---
|
||||
## Build the Foundation
|
||||
|
||||
For the real thing, I'll have to connect the WAN, coming from my ISP box, to my main switch. For that I have to add a VLAN to transport this flow to my Proxmox nodes.
|
||||
|
||||
### UniFi
|
||||
|
||||
The first thing I do is to configure my layer 2 network which is managed by UniFi. There I need to create two VLANs:
|
||||
- *WAN* (20): transports the WAN between my ISP box and my Proxmox nodes.
- *pfSync* (44): communication between my OPNsense nodes.
|
||||
|
||||
In the UniFi controller, in `Settings` > `Networks`, I add a `New Virtual Network`. I name it `WAN` and give it the VLAN ID 20:
|
||||

|
||||
|
||||
I do the same thing again for the `pfSync` VLAN with the VLAN ID 44.
|
||||
|
||||
I will plug my ISP box into port 15 of my switch, which is disabled for now. I set it as active, set the native VLAN to the newly created `WAN (20)` and disable trunking:
|
||||

|
||||
|
||||
Once this setting is applied, I make sure that only the ports where my Proxmox nodes are connected propagate these VLANs on their trunk.
|
||||
|
||||
We are done with UniFi configuration.
|
||||
|
||||
### Proxmox SDN
|
||||
|
||||
Now that the VLAN can reach my nodes, I want to handle it in the Proxmox SDN.
|
||||
|
||||
In `Datacenter` > `SDN` > `VNets`, I create a new VNet, name it `vlan20` to follow my own naming convention, give it the *WAN* alias and use the tag (ID) 20:
|
||||

|
||||
|
||||
I also create the `vlan44` for the *pfSync* VLAN, then I apply this configuration and we are done with the SDN.
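For reference, the same two VNets could likely also be created and applied from the CLI through the cluster API; a minimal sketch, assuming the VNets live in an existing VLAN zone named `localzone`:
```bash
# Create the two VNets in an existing VLAN zone (zone name is an assumption)
pvesh create /cluster/sdn/vnets --vnet vlan20 --zone localzone --alias WAN --tag 20
pvesh create /cluster/sdn/vnets --vnet vlan44 --zone localzone --alias pfSync --tag 44
# Apply the pending SDN configuration
pvesh set /cluster/sdn
```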
|
||||
|
||||
---
|
||||
## Create the VMs
|
||||
|
||||
Now that the VLAN configuration is done, I can start building the virtual machines on Proxmox.
|
||||
|
||||
The first VM is named `cerbere-head1` (I didn't tell you? My current firewall is named `cerbere`, it makes even more sense now!). Here are the settings:
|
||||
- OS type: Linux
|
||||
- Machine type: `q35`
|
||||
- BIOS: `OVMF (UEFI)`
|
||||
- Disk: 20 GiB on Ceph storage
|
||||
- CPU/RAM: 2 vCPU, 4 GiB RAM
|
||||
- NICs:
|
||||
1. `vmbr0` (*Mgmt*)
|
||||
2. `vlan20` (*WAN*)
|
||||
3. `vlan13` *(User)*
|
||||
4. `vlan37` *(IoT)*
|
||||
5. `vlan44` *(pfSync)*
|
||||
6. `vlan55` *(DMZ)*
|
||||
7. `vlan66` *(Lab)*
|
||||

|
||||
|
||||
ℹ️ Now I clone that VM to create `cerbere-head2`, then I proceed with the OPNsense installation. I don't want to go into much detail about the OPNsense installation here, as I already documented it in the previous [post]({{< ref "post/12-opnsense-virtualization-highly-available" >}}).
|
||||
|
||||
After the installation of both OPNsense instances, I give each of them their IP in the *Mgmt* network:
|
||||
- `cerbere-head1`: `192.168.88.2/24`
|
||||
- `cerbere-head2`: `192.168.88.3/24`
|
||||
|
||||
While these routers are not yet managing the networks, I give them my current OPNsense router as gateway (`192.168.88.1`) to be able to reach them from my PC in another VLAN.
|
||||
|
||||
---
|
||||
## Configure OPNsense
|
||||
|
||||
|
||||
|
||||
|
||||
## TODO
|
||||
|
||||
HA in Proxmox
Make sure VMs start at Proxmox boot
Check average power consumption (Watts)
Check average temperature
|
||||
## Switch
|
||||
|
||||
Backup OPNsense box
|
||||
Disable DHCP on OPNsense box
|
||||
Change OPNsense box IPs
|
||||
|
||||
Remove GW on VM
|
||||
Configure DHCP on both instances
|
||||
Enable DHCP on VM
|
||||
Change VIP on VM
|
||||
Replicate configuration on VM
|
||||
Unplug OPNsense box WAN
|
||||
Plug WAN on port 15
|
||||
|
||||
|
||||
|
||||
## Verify
|
||||
|
||||
Ping VIP
|
||||
Check interfaces
Local tests (SSH, ping)
|
||||
|
||||
Basic (dhcp, dns, internet)
|
||||
Firewall
|
||||
All sites
|
||||
mDNS (chromecast)
|
||||
VPN
|
||||
TV
|
||||
|
||||
Check all devices
|
||||
|
||||
DNS blocklist
|
||||
|
||||
Check load (ram, cpu)
|
||||
Failover
|
||||
|
||||
Test proxmox full shutdown
|
||||
|
||||
## Clean Up
|
||||
|
||||
Shutdown OPNsense
|
||||
Check watt
|
||||
Check temp
|
||||
|
||||
## Rollback
|
||||
424
content/post/14-proxmox-cluster-upgrade-8-to-9-ceph.fr.md
Normal file
@@ -0,0 +1,424 @@
|
||||
---
|
||||
slug: proxmox-cluster-upgrade-8-to-9-ceph
|
||||
title: Mise à niveau de mon cluster Proxmox VE HA 3 nœuds de 8 vers 9 basé sur Ceph
|
||||
description: Mise à niveau pas à pas de mon cluster Proxmox VE 3 nœuds en haute disponibilité, de 8 vers 9, basé sur Ceph, sans aucune interruption.
|
||||
date: 2025-11-04
|
||||
draft: false
|
||||
tags:
|
||||
- proxmox
|
||||
- high-availability
|
||||
- ceph
|
||||
categories:
|
||||
- homelab
|
||||
---
|
||||
|
||||
## Intro
|
||||
|
||||
Mon **cluster Proxmox VE** a presque un an maintenant, et je n’ai pas tenu les nœuds complètement à jour. Il est temps de m’en occuper et de le passer en Proxmox VE **9**.
|
||||
|
||||
Je recherche principalement les nouvelles règles d’affinité HA, mais voici les changements utiles apportés par cette version :
|
||||
- Debian 13 "Trixie".
|
||||
- Snapshots pour le stockage LVM partagé thick-provisioned.
|
||||
- Fonctionnalité SDN fabrics.
|
||||
- Interface mobile améliorée.
|
||||
- Règles d’affinité dans le cluster HA.
|
||||
|
||||
Le cluster est composé de 3 nœuds, hautement disponible, avec une configuration hyper‑convergée, utilisant Ceph pour le stockage distribué.
|
||||
|
||||
Dans cet article, je décris les étapes de mise à niveau de mon cluster Proxmox VE, de la version 8 vers 9, tout en gardant les ressources actives. [Documentation officielle](https://pve.proxmox.com/wiki/Upgrade_from_8_to_9).
|
||||
|
||||
---
|
||||
## Prérequis
|
||||
|
||||
Avant de se lancer dans la mise à niveau, passons en revue les prérequis :
|
||||
|
||||
1. Tous les nœuds mis à jour vers la dernière version Proxmox VE `8.4`.
|
||||
2. Cluster Ceph mis à niveau vers Squid (`19.2`).
|
||||
3. Proxmox Backup Server mis à jour vers la version 4.
|
||||
4. Accès fiable au nœud.
|
||||
5. Cluster en bonne santé.
|
||||
6. Sauvegarde de toutes les VM et CT.
|
||||
7. Au moins 5 Go libres sur `/`.
|
||||
|
||||
Remarques sur mon environnement :
|
||||
|
||||
- Les nœuds PVE sont en `8.3.2`, donc une mise à jour mineure vers 8.4 est d’abord requise.
|
||||
- Ceph tourne sous Reef (`18.2.4`) et sera mis à niveau vers Squid après PVE 8.4.
|
||||
- Je n’utilise pas PBS dans mon homelab, donc je peux sauter cette étape.
|
||||
- J’ai plus de 10 Go disponibles sur `/` sur mes nœuds, c’est suffisant.
|
||||
- Je n’ai qu’un accès console SSH, si un nœud ne répond plus je pourrais avoir besoin d’un accès physique.
|
||||
- Une VM a un passthrough CPU (APU). Le passthrough empêche la migration à chaud, donc je supprime ce mapping avant la mise à niveau.
|
||||
- Mettre les OSD Ceph en `noout` pendant la mise à niveau pour éviter le rebalancing automatique :
|
||||
```bash
|
||||
ceph osd set noout
|
||||
```
|
||||
|
||||
### Mettre à Jour Proxmox VE vers 8.4.14
|
||||
|
||||
Le plan est simple, pour tous les nœuds, un par un :
|
||||
|
||||
1. Activer le mode maintenance
|
||||
```bash
|
||||
ha-manager crm-command node-maintenance enable $(hostname)
|
||||
```
|
||||
|
||||
2. Mettre à jour le nœud
|
||||
```bash
|
||||
apt-get update
|
||||
apt-get dist-upgrade -y
|
||||
```
|
||||
|
||||
À la fin de la mise à jour, on me propose de reconfigurer GRUB pour qu’il maintienne à jour le bootloader amovible, ce que j’exécute :
|
||||
```plaintext
|
||||
Removable bootloader found at '/boot/efi/EFI/BOOT/BOOTX64.efi', but GRUB packages not set up to update it!
|
||||
Run the following command:
|
||||
|
||||
echo 'grub-efi-amd64 grub2/force_efi_extra_removable boolean true' | debconf-set-selections -v -u
|
||||
|
||||
Then reinstall GRUB with 'apt install --reinstall grub-efi-amd64'
|
||||
```
|
||||
|
||||
3. Redémarrer la machine
|
||||
```bash
|
||||
reboot
|
||||
```
|
||||
|
||||
4. Désactiver le mode maintenance
|
||||
```bash
|
||||
ha-manager crm-command node-maintenance disable $(hostname)
|
||||
```
|
||||
|
||||
Entre chaque nœud, j’attends que le statut Ceph soit clean, sans alertes.
|
||||
|
||||
✅ À la fin, le cluster Proxmox VE est mis à jour vers `8.4.14`
|
||||
|
||||
### Mettre à Niveau Ceph de Reef vers Squid
|
||||
|
||||
Je peux maintenant passer à la mise à niveau de Ceph, la documentation Proxmox pour cette procédure est [ici](https://pve.proxmox.com/wiki/Ceph_Reef_to_Squid).
|
||||
|
||||
Mettre à jour les sources de paquets Ceph sur chaque nœud :
|
||||
```bash
|
||||
sed -i 's/reef/squid/' /etc/apt/sources.list.d/ceph.list
|
||||
```
|
||||
|
||||
Mettre à niveau les paquets Ceph :
|
||||
```bash
|
||||
apt update
|
||||
apt full-upgrade -y
|
||||
```
|
||||
|
||||
Après la mise à niveau sur le premier nœud, la version Ceph affiche maintenant `19.2.3`, je peux voir mes OSD apparaître comme obsolètes, les moniteurs nécessitent soit une mise à niveau soit un redémarrage :
|
||||

|
||||
|
||||
Je poursuis et mets à niveau les paquets sur les 2 autres nœuds.
|
||||
|
||||
J’ai un moniteur sur chaque nœud, donc je dois redémarrer chaque moniteur, un nœud à la fois :
|
||||
```bash
|
||||
systemctl restart ceph-mon.target
|
||||
```
|
||||
|
||||
Je vérifie le statut Ceph entre chaque redémarrage :
|
||||
```bash
|
||||
ceph status
|
||||
```
|
||||
|
||||
Une fois tous les moniteurs redémarrés, ils rapportent la dernière version, avec `ceph mon dump` :
|
||||
- Avant : `min_mon_release 18 (reef)`
|
||||
- Après : `min_mon_release 19 (squid)`
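Pour vérifier cette valeur directement, on peut filtrer la ligne correspondante :
```bash
ceph mon dump | grep min_mon_release
```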
|
||||
|
||||
Je peux maintenant redémarrer les OSD, toujours un nœud à la fois. Dans ma configuration, j’ai un OSD par nœud :
|
||||
```bash
|
||||
systemctl restart ceph-osd.target
|
||||
```
|
||||
|
||||
Je surveille le statut Ceph avec la WebGUI Proxmox. Après le redémarrage, elle affiche quelques couleurs fancy. J’attends juste que les PG redeviennent verts, cela prend moins d’une minute :
|
||||

|
||||
|
||||
Un avertissement apparaît : `HEALTH_WARN: all OSDs are running squid or later but require_osd_release < squid`
|
||||
|
||||
Maintenant tous mes OSD tournent sous Squid, je peux fixer la version minimum à celle‑ci :
|
||||
```bash
|
||||
ceph osd require-osd-release squid
|
||||
```
|
||||
|
||||
ℹ️ Je n’utilise pas actuellement CephFS donc je n’ai pas à me soucier du daemon MDS (MetaData Server).
|
||||
|
||||
✅ Le cluster Ceph a été mis à niveau avec succès vers Squid (`19.2.3`).
|
||||
|
||||
---
|
||||
## Vérifications
|
||||
|
||||
Les prérequis pour mettre à niveau le cluster vers Proxmox VE 9 sont maintenant complets. Suis‑je prêt à mettre à niveau ? Pas encore.
|
||||
|
||||
Un petit programme de checklist nommé **`pve8to9`** est inclus dans les derniers paquets Proxmox VE 8.4. Le programme fournit des indices et des alertes sur les problèmes potentiels avant, pendant et après la mise à niveau. Pratique non ?
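Sur un nœud encore en 8.4, la checklist peut être lancée avec l’option `--full` pour inclure toutes les vérifications :
```bash
pve8to9 --full
```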
|
||||
|
||||
Lancer l’outil la première fois me donne des indications sur ce que je dois faire. Le script vérifie un certain nombre de paramètres, regroupés par thème. Par exemple, voici la section sur les Virtual Guest :
|
||||
```plaintext
|
||||
= VIRTUAL GUEST CHECKS =
|
||||
|
||||
INFO: Checking for running guests..
|
||||
WARN: 1 running guest(s) detected - consider migrating or stopping them.
|
||||
INFO: Checking if LXCFS is running with FUSE3 library, if already upgraded..
|
||||
SKIP: not yet upgraded, no need to check the FUSE library version LXCFS uses
|
||||
INFO: Checking for VirtIO devices that would change their MTU...
|
||||
PASS: All guest config descriptions fit in the new limit of 8 KiB
|
||||
INFO: Checking container configs for deprecated lxc.cgroup entries
|
||||
PASS: No legacy 'lxc.cgroup' keys found.
|
||||
INFO: Checking VM configurations for outdated machine versions
|
||||
PASS: All VM machine versions are recent enough
|
||||
```
|
||||
|
||||
À la fin, vous avez le résumé. L’objectif est de corriger autant de `FAILURES` et `WARNINGS` que possible :
|
||||
```plaintext
|
||||
= SUMMARY =
|
||||
|
||||
TOTAL: 57
|
||||
PASSED: 43
|
||||
SKIPPED: 7
|
||||
WARNINGS: 2
|
||||
FAILURES: 2
|
||||
```
|
||||
|
||||
Passons en revue les problèmes qu’il a trouvés :
|
||||
|
||||
```
|
||||
FAIL: 1 custom role(s) use the to-be-dropped 'VM.Monitor' privilege and need to be adapted after the upgrade
|
||||
```
|
||||
|
||||
Il y a quelque temps, pour utiliser Terraform avec mon cluster Proxmox, j'ai créé un rôle dédié. C'était détaillé dans cet [article]({{< ref "post/3-terraform-create-vm-proxmox" >}}).
|
||||
|
||||
Ce rôle utilise le privilège `VM.Monitor`, qui a été supprimé dans Proxmox VE 9. De nouveaux privilèges, sous `VM.GuestAgent.*`, existent à la place. Je supprime donc celui-ci et j'ajouterai les nouveaux une fois le cluster mis à niveau.
|
||||
|
||||
```
|
||||
FAIL: systemd-boot meta-package installed. This will cause problems on upgrades of other boot-related packages. Remove 'systemd-boot' See https://pve.proxmox.com/wiki/Upgrade_from_8_to_9#sd-boot-warning for more information.
|
||||
```
|
||||
|
||||
Proxmox VE utilise généralement `systemd-boot` pour le démarrage uniquement dans certaines configurations gérées par proxmox-boot-tool. Le méta-paquet `systemd-boot` doit être supprimé. Ce paquet était automatiquement installé sur les systèmes de PVE 8.1 à 8.4, car il contenait `bootctl` dans Bookworm.
|
||||
|
||||
Si le script de la checklist pve8to9 le suggère, vous pouvez supprimer le méta-paquet `systemd-boot` sans risque, sauf si vous l'avez installé manuellement et que vous utilisez `systemd-boot` comme bootloader :
|
||||
```bash
|
||||
apt remove systemd-boot -y
|
||||
```
|
||||
|
||||
|
||||
```
|
||||
WARN: 1 running guest(s) detected - consider migrating or stopping them.
|
||||
```
|
||||
|
||||
Dans une configuration HA, avant de mettre à jour un nœud, je le mets en mode maintenance. Cela déplace automatiquement les ressources ailleurs. Quand ce mode est désactivé, la machine revient à son emplacement précédent.
|
||||
|
||||
```
|
||||
WARN: The matching CPU microcode package 'amd64-microcode' could not be found! Consider installing it to receive the latest security and bug fixes for your CPU.
|
||||
Ensure you enable the 'non-free-firmware' component in the apt sources and run:
|
||||
apt install amd64-microcode
|
||||
```
|
||||
|
||||
Il est recommandé d’installer le microcode processeur pour les mises à jour qui peuvent corriger des bogues matériels, améliorer les performances et renforcer la sécurité du processeur.
|
||||
|
||||
J’ajoute la source `non-free-firmware` aux sources actuelles :
|
||||
```bash
|
||||
sed -i '/^deb /{/non-free-firmware/!s/$/ non-free-firmware/}' /etc/apt/sources.list
|
||||
```
|
||||
|
||||
Puis installe le paquet `amd64-microcode` :
|
||||
```bash
|
||||
apt update
|
||||
apt install amd64-microcode -y
|
||||
```
|
||||
|
||||
Après ces petits ajustements, suis‑je prêt ? Vérifions en relançant le script `pve8to9`.
|
||||
|
||||
⚠️ N’oubliez pas de lancer `pve8to9` sur tous les nœuds pour vous assurer que tout est OK.
|
||||
|
||||
---
|
||||
## Mise à Niveau
|
||||
|
||||
🚀 Maintenant tout est prêt pour le grand saut ! Comme pour la mise à jour mineure, je procéderai nœud par nœud, en gardant mes VM et CT actives.
|
||||
|
||||
### Mettre le Mode Maintenance
|
||||
|
||||
D’abord, j’entre le nœud en mode maintenance. Cela déplacera la charge existante sur les autres nœuds :
|
||||
```bash
|
||||
ha-manager crm-command node-maintenance enable $(hostname)
|
||||
```
|
||||
|
||||
Après avoir exécuté la commande, j’attends environ une minute pour laisser le temps aux ressources de migrer.
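Pour confirmer que le nœud est bien vidé, on peut vérifier l’état HA ; le nœud local doit apparaître en mode maintenance et ses ressources comme démarrées ailleurs :
```bash
ha-manager status
```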
|
||||
|
||||
### Changer les Dépôts Sources vers Trixie
|
||||
|
||||
Depuis Debian Trixie, le format `deb822` est désormais disponible et recommandé pour les sources. Il est structuré autour d’un format clé/valeur. Cela offre une meilleure lisibilité et sécurité.
|
||||
|
||||
#### Sources Debian
|
||||
```bash
|
||||
cat > /etc/apt/sources.list.d/debian.sources << EOF
|
||||
Types: deb deb-src
|
||||
URIs: http://deb.debian.org/debian/
|
||||
Suites: trixie trixie-updates
|
||||
Components: main contrib non-free-firmware
|
||||
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg
|
||||
|
||||
Types: deb deb-src
|
||||
URIs: http://security.debian.org/debian-security/
|
||||
Suites: trixie-security
|
||||
Components: main contrib non-free-firmware
|
||||
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg
|
||||
EOF
|
||||
```
|
||||
|
||||
#### Sources Proxmox (sans subscription)
|
||||
```bash
|
||||
cat > /etc/apt/sources.list.d/proxmox.sources << EOF
|
||||
Types: deb
|
||||
URIs: http://download.proxmox.com/debian/pve
|
||||
Suites: trixie
|
||||
Components: pve-no-subscription
|
||||
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
|
||||
EOF
|
||||
```
|
||||
|
||||
#### Sources Ceph Squid (sans subscription)
|
||||
```bash
|
||||
cat > /etc/apt/sources.list.d/ceph.sources << EOF
|
||||
Types: deb
|
||||
URIs: http://download.proxmox.com/debian/ceph-squid
|
||||
Suites: trixie
|
||||
Components: no-subscription
|
||||
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
|
||||
EOF
|
||||
```
|
||||
|
||||
#### Supprimer les Anciennes Listes Bookworm
|
||||
|
||||
Les listes pour Debian Bookworm au format ancien doivent être supprimées :
|
||||
```bash
|
||||
rm -f /etc/apt/sources.list{,.d/*.list}
|
||||
```
|
||||
|
||||
### Mettre à Jour les Dépôts `apt` Configurés
|
||||
|
||||
Rafraîchir les dépôts :
|
||||
```bash
|
||||
apt update
|
||||
```
|
||||
```plaintext
|
||||
Get:1 http://security.debian.org/debian-security trixie-security InRelease [43.4 kB]
|
||||
Get:2 http://deb.debian.org/debian trixie InRelease [140 kB]
|
||||
Get:3 http://download.proxmox.com/debian/ceph-squid trixie InRelease [2,736 B]
|
||||
Get:4 http://download.proxmox.com/debian/pve trixie InRelease [2,771 B]
|
||||
Get:5 http://deb.debian.org/debian trixie-updates InRelease [47.3 kB]
|
||||
Get:6 http://security.debian.org/debian-security trixie-security/main Sources [91.1 kB]
|
||||
Get:7 http://security.debian.org/debian-security trixie-security/non-free-firmware Sources [696 B]
|
||||
Get:8 http://security.debian.org/debian-security trixie-security/main amd64 Packages [69.0 kB]
|
||||
Get:9 http://security.debian.org/debian-security trixie-security/main Translation-en [45.1 kB]
|
||||
Get:10 http://security.debian.org/debian-security trixie-security/non-free-firmware amd64 Packages [544 B]
|
||||
Get:11 http://security.debian.org/debian-security trixie-security/non-free-firmware Translation-en [352 B]
|
||||
Get:12 http://download.proxmox.com/debian/ceph-squid trixie/no-subscription amd64 Packages [33.2 kB]
|
||||
Get:13 http://deb.debian.org/debian trixie/main Sources [10.5 MB]
|
||||
Get:14 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 Packages [241 kB]
|
||||
Get:15 http://deb.debian.org/debian trixie/non-free-firmware Sources [6,536 B]
|
||||
Get:16 http://deb.debian.org/debian trixie/contrib Sources [52.3 kB]
|
||||
Get:17 http://deb.debian.org/debian trixie/main amd64 Packages [9,669 kB]
|
||||
Get:18 http://deb.debian.org/debian trixie/main Translation-en [6,484 kB]
|
||||
Get:19 http://deb.debian.org/debian trixie/contrib amd64 Packages [53.8 kB]
|
||||
Get:20 http://deb.debian.org/debian trixie/contrib Translation-en [49.6 kB]
|
||||
Get:21 http://deb.debian.org/debian trixie/non-free-firmware amd64 Packages [6,868 B]
|
||||
Get:22 http://deb.debian.org/debian trixie/non-free-firmware Translation-en [4,704 B]
|
||||
Get:23 http://deb.debian.org/debian trixie-updates/main Sources [2,788 B]
|
||||
Get:24 http://deb.debian.org/debian trixie-updates/main amd64 Packages [5,412 B]
|
||||
Get:25 http://deb.debian.org/debian trixie-updates/main Translation-en [4,096 B]
|
||||
Fetched 27.6 MB in 3s (8,912 kB/s)
|
||||
Reading package lists... Done
|
||||
Building dependency tree... Done
|
||||
Reading state information... Done
|
||||
666 packages can be upgraded. Run 'apt list --upgradable' to see them.
|
||||
```
|
||||
|
||||
😈 666 paquets, je suis condamné !
|
||||
|
||||
### Mise à Niveau vers Debian Trixie et Proxmox VE 9
|
||||
|
||||
Lancer la mise à niveau :
|
||||
```bash
|
||||
apt-get dist-upgrade -y
|
||||
```
|
||||
|
||||
Pendant le processus, vous serez invité à approuver des changements de fichiers de configuration et certains redémarrages de services. Il se peut aussi que vous voyiez la sortie de certains changements, vous pouvez simplement en sortir en appuyant sur `q` :
|
||||
- `/etc/issue` : Proxmox VE régénérera automatiquement ce fichier au démarrage -> `No`
|
||||
- `/etc/lvm/lvm.conf` : Changements pertinents pour Proxmox VE seront mis à jour -> `Yes`
|
||||
- `/etc/ssh/sshd_config` : Selon votre configuration -> `Inspect`
|
||||
- `/etc/default/grub` : Seulement si vous l’avez modifié manuellement -> `Inspect`
|
||||
- `/etc/chrony/chrony.conf` : Si vous n’avez pas fait de modifications supplémentaires -> `Yes`
|
||||
|
||||
La mise à niveau a pris environ 5 minutes, selon le matériel.
|
||||
|
||||
À la fin de la mise à niveau, redémarrez la machine :
|
||||
```bash
|
||||
reboot
|
||||
```
|
||||
### Sortir du Mode Maintenance
|
||||
|
||||
Enfin, quand le nœud (espérons‑le) est revenu, vous pouvez désactiver le mode maintenance. La charge qui était localisée sur cette machine reviendra :
|
||||
```bash
|
||||
ha-manager crm-command node-maintenance disable $(hostname)
|
||||
```
|
||||
|
||||
### Validation Après Mise à Niveau
|
||||
|
||||
- Vérifier la communication du cluster :
|
||||
```bash
|
||||
pvecm status
|
||||
```
|
||||
|
||||
- Vérifier les points de montage des stockages
|
||||
|
||||
- Vérifier la santé du cluster Ceph :
|
||||
```bash
|
||||
ceph status
|
||||
```
|
||||
|
||||
- Confirmer les opérations VM, les sauvegardes et les groupes HA
|
||||
|
||||
Les groupes HA ont été retirés au profit des règles d’affinité HA. Les groupes HA sont automatiquement migrés en règles HA.
|
||||
|
||||
- Désactiver le dépôt PVE Enterprise
|
||||
|
||||
Si vous n’utilisez pas le dépôt `pve-enterprise`, vous pouvez le désactiver :
|
||||
```bash
|
||||
sed -i 's/^/#/' /etc/apt/sources.list.d/pve-enterprise.sources
|
||||
```
|
||||
|
||||
🔁 Ce nœud est maintenant mis à niveau vers Proxmox VE 9. Procédez aux autres nœuds.
|
||||
|
||||
## Actions Postérieures
|
||||
|
||||
Une fois que tout le cluster a été mis à niveau, procédez aux actions postérieures :
|
||||
|
||||
- Supprimer le flag `noout` du cluster Ceph :
|
||||
```bash
|
||||
ceph osd unset noout
|
||||
```
|
||||
|
||||
- Recréer les mappings PCI passthrough
|
||||
|
||||
Pour la VM pour laquelle j’ai retiré le mapping hôte au début de la procédure, je peux maintenant recréer le mapping.
|
||||
|
||||
- Ajouter les privilèges pour le rôle Terraform
|
||||
|
||||
Pendant la phase de vérification, il m’a été conseillé de supprimer le privilège `VM.Monitor` de mon rôle personnalisé pour Terraform. Maintenant que de nouveaux privilèges ont été ajoutés avec Proxmox VE 9, je peux les attribuer à ce rôle :
|
||||
- VM.GuestAgent.Audit
|
||||
- VM.GuestAgent.FileRead
|
||||
- VM.GuestAgent.FileWrite
|
||||
- VM.GuestAgent.FileSystemMgmt
|
||||
- VM.GuestAgent.Unrestricted
|
||||
|
||||
## Conclusion
|
||||
|
||||
🎉 Mon cluster Proxmox VE est maintenant en version 9 !
|
||||
|
||||
Le processus de mise à niveau s’est déroulé assez tranquillement, sans aucune interruption pour mes ressources.
|
||||
|
||||
J’ai maintenant accès aux règles d’affinité HA, dont j’avais besoin pour mon cluster OPNsense.
|
||||
|
||||
Comme vous avez pu le constater, je ne maintiens pas mes nœuds à jour très souvent. Je pourrais automatiser cela la prochaine fois, pour les garder à jour sans effort.
|
||||
|
||||
|
||||
425
content/post/14-proxmox-cluster-upgrade-8-to-9-ceph.md
Normal file
@@ -0,0 +1,425 @@
|
||||
---
|
||||
slug: proxmox-cluster-upgrade-8-to-9-ceph
|
||||
title: Upgrading my 3-node Proxmox VE HA Cluster from 8 to 9 based on Ceph
|
||||
description: Step-by-step upgrade of my 3-node Proxmox VE highly available cluster from 8 to 9, based on Ceph distributed storage, without any downtime.
|
||||
date: 2025-11-04
|
||||
draft: false
|
||||
tags:
|
||||
- proxmox
|
||||
- high-availability
|
||||
- ceph
|
||||
categories:
|
||||
- homelab
|
||||
---
|
||||
|
||||
## Intro
|
||||
|
||||
My **Proxmox VE** cluster is almost one year old now, and I haven’t kept the nodes fully up to date. Time to address this and bump it to Proxmox VE **9**.
|
||||
|
||||
I'm mainly after the new HA affinity rules, but here are the useful changes that this version brings:
|
||||
- Debian 13 "Trixie".
|
||||
- Snapshots for thick-provisioned LVM shared storage.
|
||||
- SDN fabrics feature.
|
||||
- Improved mobile UI.
|
||||
- Affinity rules in HA cluster.
|
||||
|
||||
The cluster is a three‑node, highly available, hyper‑converged setup using Ceph for distributed storage.
|
||||
|
||||
In this article, I'll walk through the upgrade steps for my Proxmox VE cluster, from 8 to 9, while keeping the resources up and running. [Official docs](https://pve.proxmox.com/wiki/Upgrade_from_8_to_9).
|
||||
|
||||
---
|
||||
## Prerequisites
|
||||
|
||||
Before jumping into the upgrade, let's review the prerequisites:
|
||||
|
||||
1. All nodes upgraded to the latest Proxmox VE `8.4`.
|
||||
2. Ceph cluster upgraded to Squid (`19.2`).
|
||||
3. Proxmox Backup Server upgraded to version 4.
|
||||
4. Reliable access to the node.
|
||||
5. Healthy cluster.
|
||||
6. Backup of all VMs and CTs.
|
||||
7. At least 5 GB free on `/`.
|
||||
|
||||
Notes about my environment:
|
||||
|
||||
- PVE nodes are on `8.3.2`, so a minor upgrade to 8.4 is required first.
|
||||
- Ceph is Reef (`18.2.4`) and will be upgraded to Squid after PVE 8.4.
|
||||
- I don’t use PBS in my homelab, so I can skip that step.
|
||||
- I have more than 10 GB available on `/` on my nodes, which is fine.
- I only have SSH console access; if a node becomes unresponsive, I may need physical access.
|
||||
- One VM has a CPU passthrough (APU). Passthrough prevents live‑migration, so I remove that mapping prior to the upgrade.
|
||||
- Set Ceph OSDs to `noout` during the upgrade to avoid automatic rebalancing:
|
||||
```bash
|
||||
ceph osd set noout
|
||||
```
|
||||
|
||||
### Update Proxmox VE to 8.4.14
|
||||
|
||||
The plan is simple; for each node, one at a time:
|
||||
|
||||
1. Enable the maintenance mode
|
||||
```bash
|
||||
ha-manager crm-command node-maintenance enable $(hostname)
|
||||
```
|
||||
|
||||
2. Update the node
|
||||
```bash
|
||||
apt-get update
|
||||
apt-get dist-upgrade -y
|
||||
```
|
||||
|
||||
At the end of the update, I'm prompted to reconfigure GRUB so that it keeps the removable bootloader updated, which I do:
|
||||
```plaintext
|
||||
Removable bootloader found at '/boot/efi/EFI/BOOT/BOOTX64.efi', but GRUB packages not set up to update it!
|
||||
Run the following command:
|
||||
|
||||
echo 'grub-efi-amd64 grub2/force_efi_extra_removable boolean true' | debconf-set-selections -v -u
|
||||
|
||||
Then reinstall GRUB with 'apt install --reinstall grub-efi-amd64'
|
||||
```
|
||||
|
||||
3. Restart the machine
|
||||
```bash
|
||||
reboot
|
||||
```
|
||||
|
||||
4. Disable the maintenance mode
|
||||
```bash
|
||||
ha-manager crm-command node-maintenance disable $(hostname)
|
||||
```
|
||||
|
||||
Between each node, I wait for the Ceph status to be clean, without warnings.
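A minimal sketch of what I run on the freshly updated node before moving on (just the standard status commands):
```bash
# Confirm the node is on the expected PVE release
pveversion
# Confirm Ceph is healthy again before touching the next node
ceph -s
```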
|
||||
|
||||
✅ At the end, the Proxmox VE cluster is updated to `8.4.14`
|
||||
|
||||
### Upgrade Ceph from Reef to Squid
|
||||
|
||||
I can now move on to the Ceph upgrade; the Proxmox documentation for that procedure is [here](https://pve.proxmox.com/wiki/Ceph_Reef_to_Squid).
|
||||
|
||||
Update Ceph package sources on every node:
|
||||
```bash
|
||||
sed -i 's/reef/squid/' /etc/apt/sources.list.d/ceph.list
|
||||
```
|
||||
|
||||
Upgrade the Ceph packages:
|
||||
```bash
|
||||
apt update
|
||||
apt full-upgrade -y
|
||||
```
|
||||
|
||||
After the upgrade on the first node, the Ceph version now shows `19.2.3`. I can see my OSDs appear as outdated, and the monitors need either an upgrade or a restart:
|
||||

|
||||
|
||||
I carry on and upgrade the packages on the 2 other nodes.
|
||||
|
||||
I have a monitor on each node, so I have to restart each monitor, one node at a time:
|
||||
```bash
|
||||
systemctl restart ceph-mon.target
|
||||
```
|
||||
|
||||
I verify the Ceph status between each restart:
|
||||
```bash
|
||||
ceph status
|
||||
```
|
||||
|
||||
Once all monitors are restarted, they report the latest version, with `ceph mon dump`:
|
||||
- Before: `min_mon_release 18 (reef)`
|
||||
- After: `min_mon_release 19 (squid)`
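To check that value directly, the relevant line can be filtered out of the dump:
```bash
ceph mon dump | grep min_mon_release
```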
|
||||
|
||||
Now I can restart the OSDs, still one node at a time. In my setup, I have one OSD per node:
|
||||
```bash
|
||||
systemctl restart ceph-osd.target
|
||||
```
|
||||
|
||||
I monitor the Ceph status with the Proxmox WebGUI. After the restart, it is showing some fancy colors. I'm just waiting for the PGs to be back to green, it takes less than a minute:
|
||||

|
||||
|
||||
A warning shows up: `HEALTH_WARN: all OSDs are running squid or later but require_osd_release < squid`
|
||||
|
||||
Now all my OSDs are running Squid, I can set the minimum version to it:
|
||||
```bash
|
||||
ceph osd require-osd-release squid
|
||||
```
|
||||
|
||||
ℹ️ I'm not currently using CephFS so I don't have to care about the MDS (MetaData Server) daemon.
|
||||
|
||||
✅ The Ceph cluster has been successfully upgraded to Squid (`19.2.3`).
|
||||
|
||||
---
|
||||
## Checks
|
||||
|
||||
The prerequisites to upgrade the cluster to Proxmox VE 9 are now complete. Am I ready to upgrade? Not yet.
|
||||
|
||||
A small checklist program named **`pve8to9`** is included in the latest Proxmox VE 8.4 packages. The program will provide hints and warnings about potential issues before, during and after the upgrade process. Pretty handy isn't it?
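On a node still running 8.4, the checklist can be run with the `--full` flag to include every check:
```bash
pve8to9 --full
```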
|
||||
|
||||
Running the tool the first time gives me some insights into what I need to do. The script checks a number of parameters, grouped by theme. For example, this is the Virtual Guest section:
|
||||
```plaintext
|
||||
= VIRTUAL GUEST CHECKS =
|
||||
|
||||
INFO: Checking for running guests..
|
||||
WARN: 1 running guest(s) detected - consider migrating or stopping them.
|
||||
INFO: Checking if LXCFS is running with FUSE3 library, if already upgraded..
|
||||
SKIP: not yet upgraded, no need to check the FUSE library version LXCFS uses
|
||||
INFO: Checking for VirtIO devices that would change their MTU...
|
||||
PASS: All guest config descriptions fit in the new limit of 8 KiB
|
||||
INFO: Checking container configs for deprecated lxc.cgroup entries
|
||||
PASS: No legacy 'lxc.cgroup' keys found.
|
||||
INFO: Checking VM configurations for outdated machine versions
|
||||
PASS: All VM machine versions are recent enough
|
||||
```
|
||||
|
||||
At the end, you have the summary. The goal is to address as many `FAILURES` and `WARNINGS` as possible:
|
||||
```plaintext
|
||||
= SUMMARY =
|
||||
|
||||
TOTAL: 57
|
||||
PASSED: 43
|
||||
SKIPPED: 7
|
||||
WARNINGS: 2
|
||||
FAILURES: 2
|
||||
```
|
||||
|
||||
Let's review the problems it found:
|
||||
|
||||
```
|
||||
FAIL: 1 custom role(s) use the to-be-dropped 'VM.Monitor' privilege and need to be adapted after the upgrade
|
||||
```
|
||||
|
||||
Some time ago, in order to use Terraform with my Proxmox cluster, I created a dedicated role. This was detailed in that [post]({{< ref "post/3-terraform-create-vm-proxmox" >}}).
|
||||
|
||||
This role is using the `VM.Monitor` privilege, which is removed in Proxmox VE 9. Instead, new privileges under `VM.GuestAgent.*` exist. So I remove this one and I'll add the new ones once the cluster has been upgraded.
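A quick way to spot which custom roles still carry that privilege, assuming the role listing prints each role's privileges:
```bash
pveum role list | grep -i 'vm.monitor'
```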
|
||||
|
||||
```
|
||||
FAIL: systemd-boot meta-package installed. This will cause problems on upgrades of other boot-related packages. Remove 'systemd-boot' See https://pve.proxmox.com/wiki/Upgrade_from_8_to_9#sd-boot-warning for more information.
|
||||
```
|
||||
|
||||
Proxmox VE only uses `systemd-boot` for booting in some configurations managed by `proxmox-boot-tool`, so the `systemd-boot` meta-package itself should be removed. The package was automatically installed on systems set up with PVE 8.1 to PVE 8.4, as it contained `bootctl` in Bookworm.
|
||||
|
||||
If the `pve8to9` checklist script suggests it, the `systemd-boot` meta-package is safe to remove unless you manually installed it and are using `systemd-boot` as a bootloader:
|
||||
```bash
|
||||
apt remove systemd-boot -y
|
||||
```
|
||||
|
||||
|
||||
```
|
||||
WARN: 1 running guest(s) detected - consider migrating or stopping them.
|
||||
```
|
||||
|
||||
In an HA setup, before updating a node, I put it in maintenance mode. This automatically moves the workload elsewhere. When this mode is disabled, the workload moves back to its previous location.
|
||||
|
||||
```
|
||||
WARN: The matching CPU microcode package 'amd64-microcode' could not be found! Consider installing it to receive the latest security and bug fixes for your CPU.
|
||||
Ensure you enable the 'non-free-firmware' component in the apt sources and run:
|
||||
apt install amd64-microcode
|
||||
```
|
||||
|
||||
It is recommended to install processor microcode for updates which can fix hardware bugs, improve performance, and enhance security features of the processor.
|
||||
|
||||
I add the `non-free-firmware` source to the current ones:
|
||||
```bash
|
||||
sed -i '/^deb /{/non-free-firmware/!s/$/ non-free-firmware/}' /etc/apt/sources.list
|
||||
```
|
||||
|
||||
Then install the `amd64-microcode` package:
|
||||
```bash
|
||||
apt update
|
||||
apt install amd64-microcode -y
|
||||
```
|
||||
|
||||
After these small adjustments, am I ready yet? Let's find out by relaunching the `pve8to9` script.
|
||||
|
||||
⚠️ Don't forget to run `pve8to9` on all nodes to make sure everything is good.
|
||||
|
||||
---
|
||||
## Upgrade
|
||||
|
||||
🚀 Now everything is ready for the big move! Like I did for the minor update, I'll proceed one node at a time, keeping my VMs and CTs up and running.
|
||||
|
||||
### Set Maintenance Mode
|
||||
|
||||
First, I put the node into maintenance mode. This will move the existing workload to other nodes:
|
||||
```bash
|
||||
ha-manager crm-command node-maintenance enable $(hostname)
|
||||
```
|
||||
|
||||
After issuing the command, I wait about one minute to give the resources the time to migrate.
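To confirm the node has actually drained, the HA status can be checked; the local node should show up as in maintenance mode and its resources as started elsewhere:
```bash
ha-manager status
```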
|
||||
|
||||
### Change Source Repositories to Trixie
|
||||
|
||||
Since Debian Trixie, the `deb822` format is available and recommended for sources. It is structured around a key/value format, which offers better readability and security.
|
||||
|
||||
#### Debian Sources
|
||||
```bash
|
||||
cat > /etc/apt/sources.list.d/debian.sources << EOF
|
||||
Types: deb deb-src
|
||||
URIs: http://deb.debian.org/debian/
|
||||
Suites: trixie trixie-updates
|
||||
Components: main contrib non-free-firmware
|
||||
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg
|
||||
|
||||
Types: deb deb-src
|
||||
URIs: http://security.debian.org/debian-security/
|
||||
Suites: trixie-security
|
||||
Components: main contrib non-free-firmware
|
||||
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg
|
||||
EOF
|
||||
```
|
||||
|
||||
#### Proxmox Sources (without subscription)
|
||||
```bash
|
||||
cat > /etc/apt/sources.list.d/proxmox.sources << EOF
|
||||
Types: deb
|
||||
URIs: http://download.proxmox.com/debian/pve
|
||||
Suites: trixie
|
||||
Components: pve-no-subscription
|
||||
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
|
||||
EOF
|
||||
```
|
||||
|
||||
#### Ceph Squid Sources (without subscription)
|
||||
```bash
|
||||
cat > /etc/apt/sources.list.d/ceph.sources << EOF
|
||||
Types: deb
|
||||
URIs: http://download.proxmox.com/debian/ceph-squid
|
||||
Suites: trixie
|
||||
Components: no-subscription
|
||||
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
|
||||
EOF
|
||||
```
|
||||
|
||||
#### Remove Old Bookworm Source Lists
|
||||
|
||||
The lists for Debian Bookworm in the old format must be removed:
|
||||
```bash
|
||||
rm -f /etc/apt/sources.list{,.d/*.list}
|
||||
```
|
||||
|
||||
### Update the Configured `apt` Repositories
|
||||
|
||||
Refresh the repositories:
|
||||
```bash
|
||||
apt update
|
||||
```
|
||||
```plaintext
|
||||
Get:1 http://security.debian.org/debian-security trixie-security InRelease [43.4 kB]
|
||||
Get:2 http://deb.debian.org/debian trixie InRelease [140 kB]
|
||||
Get:3 http://download.proxmox.com/debian/ceph-squid trixie InRelease [2,736 B]
|
||||
Get:4 http://download.proxmox.com/debian/pve trixie InRelease [2,771 B]
|
||||
Get:5 http://deb.debian.org/debian trixie-updates InRelease [47.3 kB]
|
||||
Get:6 http://security.debian.org/debian-security trixie-security/main Sources [91.1 kB]
|
||||
Get:7 http://security.debian.org/debian-security trixie-security/non-free-firmware Sources [696 B]
|
||||
Get:8 http://security.debian.org/debian-security trixie-security/main amd64 Packages [69.0 kB]
|
||||
Get:9 http://security.debian.org/debian-security trixie-security/main Translation-en [45.1 kB]
|
||||
Get:10 http://security.debian.org/debian-security trixie-security/non-free-firmware amd64 Packages [544 B]
|
||||
Get:11 http://security.debian.org/debian-security trixie-security/non-free-firmware Translation-en [352 B]
|
||||
Get:12 http://download.proxmox.com/debian/ceph-squid trixie/no-subscription amd64 Packages [33.2 kB]
|
||||
Get:13 http://deb.debian.org/debian trixie/main Sources [10.5 MB]
|
||||
Get:14 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 Packages [241 kB]
|
||||
Get:15 http://deb.debian.org/debian trixie/non-free-firmware Sources [6,536 B]
|
||||
Get:16 http://deb.debian.org/debian trixie/contrib Sources [52.3 kB]
|
||||
Get:17 http://deb.debian.org/debian trixie/main amd64 Packages [9,669 kB]
|
||||
Get:18 http://deb.debian.org/debian trixie/main Translation-en [6,484 kB]
|
||||
Get:19 http://deb.debian.org/debian trixie/contrib amd64 Packages [53.8 kB]
|
||||
Get:20 http://deb.debian.org/debian trixie/contrib Translation-en [49.6 kB]
|
||||
Get:21 http://deb.debian.org/debian trixie/non-free-firmware amd64 Packages [6,868 B]
|
||||
Get:22 http://deb.debian.org/debian trixie/non-free-firmware Translation-en [4,704 B]
|
||||
Get:23 http://deb.debian.org/debian trixie-updates/main Sources [2,788 B]
|
||||
Get:24 http://deb.debian.org/debian trixie-updates/main amd64 Packages [5,412 B]
|
||||
Get:25 http://deb.debian.org/debian trixie-updates/main Translation-en [4,096 B]
|
||||
Fetched 27.6 MB in 3s (8,912 kB/s)
|
||||
Reading package lists... Done
|
||||
Building dependency tree... Done
|
||||
Reading state information... Done
|
||||
666 packages can be upgraded. Run 'apt list --upgradable' to see them.
|
||||
```
|
||||
|
||||
😈 666 packages, I'm doomed!
|
||||
|
||||
### Upgrade to Debian Trixie and Proxmox VE 9
|
||||
|
||||
Launch the upgrade:
|
||||
```bash
|
||||
apt-get dist-upgrade -y
|
||||
```
|
||||
|
||||
During the process, you will be prompted to approve changes to configuration files and some service restarts. You may also be shown the output of some changes; you can simply exit by pressing `q`:
|
||||
- `/etc/issue`: Proxmox VE will auto-generate this file on boot -> `No`
|
||||
- `/etc/lvm/lvm.conf`: Changes relevant for Proxmox VE will be updated -> `Yes`
|
||||
- `/etc/ssh/sshd_config`: Depending on your setup -> `Inspect`
|
||||
- `/etc/default/grub`: Only if you changed it manually -> `Inspect`
|
||||
- `/etc/chrony/chrony.conf`: If you did not make extra changes yourself -> `Yes`
|
||||
|
||||
The upgrade took about 5 minutes; the exact time depends on the hardware.
|
||||
|
||||
At the end of the upgrade, restart the machine:
|
||||
```bash
|
||||
reboot
|
||||
```
|
||||
### Remove Maintenance Mode
|
||||
|
||||
Finally, when the node (hopefully) comes back, you can disable the maintenance mode. The workload that was previously located on that machine will move back:
|
||||
```bash
|
||||
ha-manager crm-command node-maintenance disable $(hostname)
|
||||
```
|
||||
|
||||
### Post-Upgrade Validation
|
||||
|
||||
- Check cluster communication:
|
||||
```bash
|
||||
pvecm status
|
||||
```
|
||||
|
||||
- Verify storage mount points
|
||||
|
||||
- Check Ceph cluster health:
|
||||
```bash
|
||||
ceph status
|
||||
```
|
||||
|
||||
- Confirm VM operations, backups, and HA groups
|
||||
|
||||
HA groups have been removed in favor of HA affinity rules; existing HA groups are automatically migrated to HA rules.
|
||||
|
||||
- Disable PVE Enterprise repository
|
||||
|
||||
If you don't use the `pve-enterprise` repo, you can disable it:
|
||||
```bash
|
||||
sed -i 's/^/#/' /etc/apt/sources.list.d/pve-enterprise.sources
|
||||
```
|
||||
|
||||
🔁 This node is now upgraded to Proxmox VE 9. Proceed to other nodes.
|
||||
|
||||
## Post Actions
|
||||
|
||||
Once the whole cluster has been upgraded, proceed to post actions:
|
||||
|
||||
- Remove the Ceph cluster `noout` flag:
|
||||
```bash
|
||||
ceph osd unset noout
|
||||
```
|
||||
|
||||
- Recreate PCI passthrough mapping
|
||||
|
||||
For the VM whose host mapping I removed at the beginning of the procedure, I can now recreate the mapping.
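A minimal sketch of re-attaching a mapped device to the VM; the VM ID, mapping name and PCIe flag are assumptions to adapt to the actual setup:
```bash
# Re-attach the mapped host device to the VM (ID and mapping name are examples)
qm set 122 --hostpci0 mapping=apu-igpu,pcie=1
```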
|
||||
|
||||
- Add privileges for the Terraform role
|
||||
|
||||
During the check phase, I was advised to remove the privilege `VM.Monitor` from my custom role for Terraform. Now that new privileges have been added with Proxmox VE 9, I can assign them to that role:
|
||||
- VM.GuestAgent.Audit
|
||||
- VM.GuestAgent.FileRead
|
||||
- VM.GuestAgent.FileWrite
|
||||
- VM.GuestAgent.FileSystemMgmt
|
||||
- VM.GuestAgent.Unrestricted
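A minimal sketch of assigning them from the CLI, assuming the custom role is named `TerraformProv`; note that `-privs` may replace the existing privilege set, so the role's other privileges would need to be included too:
```bash
# Redefine the custom role with the new guest-agent privileges (role name is an example)
pveum role modify TerraformProv -privs "VM.GuestAgent.Audit,VM.GuestAgent.FileRead,VM.GuestAgent.FileWrite,VM.GuestAgent.FileSystemMgmt,VM.GuestAgent.Unrestricted"
```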
|
||||
|
||||
## Conclusion
|
||||
|
||||
🎉 My Proxmox VE cluster is now on version 9!
|
||||
|
||||
The upgrade process was pretty smooth, without any downtime for my resources.
|
||||
|
||||
Now I have access to the HA affinity rules, which I need for my OPNsense cluster.
|
||||
|
||||
As you may have noticed, I don't keep my nodes up to date very often. I might automate this next time, to keep them updated without any effort.
|
||||
|
||||
|
||||
|
||||
@@ -0,0 +1,423 @@
|
||||
---
|
||||
slug: migration-opnsense-proxmox-highly-available
|
||||
title: Migration vers mon cluster OPNsense HA dans Proxmox VE
|
||||
description: La démarche détaillée de la migration de ma box OPNsense physique vers un cluster de VM hautement disponible dans Proxmox VE.
|
||||
date: 2025-11-20
|
||||
draft: false
|
||||
tags:
|
||||
- opnsense
|
||||
- high-availability
|
||||
- proxmox
|
||||
categories:
|
||||
- homelab
|
||||
---
|
||||
## Intro
|
||||
|
||||
C'est la dernière étape de mon aventure de virtualisation d'**OPNsense**.
|
||||
|
||||
Il y a quelques mois, ma [box OPNsense physique a crashé]({{< ref "post/10-opnsense-crash-disk-panic" >}}) à cause d'une défaillance matérielle. Cela a plongé ma maison dans le noir, littéralement. Pas de réseau, pas de lumières.
|
||||
|
||||
💡 Pour éviter de me retrouver à nouveau dans cette situation, j'ai imaginé un plan pour virtualiser mon pare-feu OPNsense dans mon cluster **Proxmox VE**. La dernière fois, j'avais mis en place un [proof of concept]({{< ref "post/12-opnsense-virtualization-highly-available" >}}) pour valider cette solution : créer un cluster de deux VM **OPNsense** dans Proxmox et rendre le firewall hautement disponible.
|
||||
|
||||
Cette fois, je vais couvrir la création de mon futur cluster OPNsense depuis zéro, planifier la bascule et finalement migrer depuis ma box physique actuelle. C'est parti !
|
||||
|
||||
---
|
||||
## La Configuration VLAN
|
||||
|
||||
Pour mes plans, je dois connecter le WAN, provenant de ma box FAI, à mon switch principal. Pour cela je crée un VLAN dédié pour transporter ce flux jusqu'à mes nœuds Proxmox.
|
||||
|
||||
### UniFi
|
||||
|
||||
D'abord, je configure mon réseau de couche 2 qui est géré par UniFi. Là, je dois créer deux VLANs :
|
||||
|
||||
- _WAN_ (20) : transporte le WAN entre ma box FAI et mes nœuds Proxmox.
|
||||
- _pfSync_ (44), communication entre mes nœuds OPNsense.
|
||||
|
||||
Dans le contrôleur UniFi, dans `Paramètres` > `Réseaux`, j'ajoute un `New Virtual Network`. Je le nomme `WAN` et lui donne l'ID VLAN 20 :
|
||||

|
||||
|
||||
Je fais la même chose pour le VLAN `pfSync` avec l'ID VLAN 44.
|
||||
|
||||
Je prévois de brancher ma box FAI sur le port 15 de mon switch, qui est désactivé pour l'instant. Je l'active, définis le VLAN natif sur le nouveau `WAN (20)` et désactive le trunking :
|
||||

|
||||
|
||||
Une fois ce réglage appliqué, je m'assure que seuls les ports où sont connectés mes nœuds Proxmox propagent ces VLANs sur leur trunk.
|
||||
|
||||
J'ai fini la configuration UniFi.
|
||||
|
||||
### Proxmox SDN
|
||||
|
||||
Maintenant que le VLAN peut atteindre mes nœuds, je veux le gérer dans le SDN de Proxmox. J'ai configuré le SDN dans [cet article]({{< ref "post/11-proxmox-cluster-networking-sdn" >}}).
|
||||
|
||||
Dans `Datacenter` > `SDN` > `VNets`, je crée un nouveau VNet, je l'appelle `vlan20` pour suivre ma propre convention de nommage, je lui donne l'alias _WAN_ et j'utilise le tag (ID VLAN) 20 :
|
||||

|
||||
|
||||
Je crée aussi le `vlan44` pour le VLAN _pfSync_, puis j'applique cette configuration et nous avons terminé avec le SDN.
|
||||
|
||||
---
|
||||
## Création des VMs
|
||||
|
||||
Maintenant que la configuration VLAN est faite, je peux commencer à construire les machines virtuelles sur Proxmox.
|
||||
|
||||
La première VM s'appelle `cerbere-head1` (je ne vous l'ai pas dit ? Mon firewall actuel s'appelle `cerbere`, ça a encore plus de sens maintenant !). Voici les réglages :
|
||||
- **Type d'OS** : Linux (même si OPNsense est basé sur FreeBSD)
|
||||
- **Type de machine** : `q35`
|
||||
- **BIOS** : `OVMF (UEFI)`
|
||||
- **Disque** : 20 Go sur stockage Ceph distribué
|
||||
- **RAM** : 4 Go, ballooning désactivé
|
||||
- **CPU** : 2 vCPU
|
||||
- **NICs**, pare-feu désactivé :
|
||||
1. `vmbr0` (_Mgmt_)
|
||||
2. `vlan20` (_WAN_)
|
||||
3. `vlan13` _(User)_
|
||||
4. `vlan37` _(IoT)_
|
||||
5. `vlan44` _(pfSync)_
|
||||
6. `vlan55` _(DMZ)_
|
||||
7. `vlan66` _(Lab)_
|
||||
|
||||

|
||||
|
||||
ℹ️ Maintenant je clone cette VM pour créer `cerbere-head2`, puis je procède à l'installation d'OPNsense. Je ne veux pas entrer trop dans les détails de l'installation d'OPNsense, je l'ai déjà documentée dans le [proof of concept]({{< ref "post/12-opnsense-virtualization-highly-available" >}}).
|
||||
|
||||
Après l'installation des deux instances OPNsense, j'attribue à chacune leur IP sur le réseau _Mgmt_ :
|
||||
- `cerbere-head1` : `192.168.88.2/24`
|
||||
- `cerbere-head2` : `192.168.88.3/24`
|
||||
|
||||
Tant que ces routeurs ne gèrent pas encore les réseaux, je leur donne comme passerelle mon routeur OPNsense actuel (`192.168.88.1`) pour me permettre de les atteindre depuis mon portable dans un autre VLAN.
|
||||
|
||||
---
|
||||
## Configuration d'OPNsense
|
||||
|
||||
Initialement, j'envisageais de restaurer ma configuration OPNsense existante et de l'adapter à l'installation.
|
||||
|
||||
Puis j'ai décidé de repartir de zéro pour documenter et partager la procédure. Cette partie devenant trop longue, j'ai préféré créer un article dédié.
|
||||
|
||||
📖 Vous pouvez trouver les détails de la configuration complète d'OPNsense dans cet [article]({{< ref "post/13-opnsense-full-configuration" >}}), couvrant HA, DNS, DHCP, VPN et reverse proxy.
|
||||
|
||||
---
|
||||
## VM Proxmox Hautement Disponible
|
||||
|
||||
Les ressources (VM ou LXC) dans Proxmox VE peuvent être marquées comme hautement disponibles, voyons comment les configurer.
|
||||
|
||||
### Prérequis pour la HA Proxmox
|
||||
|
||||
D'abord, votre cluster Proxmox doit le permettre. Il y a quelques exigences :
|
||||
|
||||
- Au moins 3 nœuds pour avoir le quorum
|
||||
- Stockage partagé pour vos ressources
|
||||
- Horloge synchronisée
|
||||
- Réseau fiable
|
||||
|
||||
Un mécanisme de fencing doit être activé. Le fencing est le processus d'isoler un nœud de cluster défaillant pour s'assurer qu'il n'accède plus aux ressources partagées. Cela évite les situations de split-brain et permet à Proxmox HA de redémarrer en toute sécurité les VM affectées sur des nœuds sains. Par défaut, il utilise le watchdog logiciel Linux, _softdog_, suffisant pour moi.
|
||||
|
||||
Dans Proxmox VE 8, il était possible de créer des groupes HA, en fonction de leurs ressources, emplacements, etc. Cela a été remplacé, dans Proxmox VE 9, par des règles d'affinité HA. C'est la raison principale derrière la mise à niveau de mon cluster Proxmox VE, que j'ai détaillée dans ce [post]({{< ref "post/14-proxmox-cluster-upgrade-8-to-9-ceph" >}}).
|
||||
|
||||
### Configurer la HA pour les VM
|
||||
|
||||
Le cluster Proxmox est capable de fournir de la HA pour les ressources, mais vous devez définir les règles.
|
||||
|
||||
Dans `Datacenter` > `HA`, vous pouvez voir le statut et gérer les ressources. Dans le panneau `Resources` je clique sur `Add`. Je dois choisir la ressource à configurer en HA dans la liste, ici `cerbere-head1` avec l'ID 122. Puis dans l'infobulle je peux définir le maximum de redémarrages et de relocations, je laisse `Failback` activé et l'état demandé à `started` :
|
||||

|
||||
|
||||
Le cluster Proxmox s'assurera maintenant que cette VM est démarrée. Je fais de même pour l'autre VM OPNsense, `cerbere-head2`.
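À titre indicatif, l'équivalent en ligne de commande ressemblerait à ceci (l'ID de VM correspond à l'exemple, les options sont à adapter) :
```bash
# Déclarer la VM comme ressource HA (ID et options à adapter)
ha-manager add vm:122 --state started --max_restart 1 --max_relocate 1
```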
|
||||
|
||||
### Règles d'Affinité HA
|
||||
|
||||
Super, mais je ne veux pas qu'elles tournent sur le même nœud. C'est là qu'intervient la nouvelle fonctionnalité des règles d'affinité HA de Proxmox VE 9. Proxmox permet de créer des règles d'affinité de nœud et de ressource. Peu m'importe sur quel nœud elles tournent, mais je ne veux pas qu'elles soient ensemble. J'ai besoin d'une règle d'affinité de ressource.
|
||||
|
||||
Dans `Datacenter` > `HA` > `Affinity Rules`, j'ajoute une nouvelle règle d'affinité de ressource HA. Je sélectionne les deux VMs et choisis l'option `Keep Separate` :
|
||||

|
||||
|
||||
✅ Mes VMs OPNsense sont maintenant entièrement prêtes !
|
||||
|
||||
---
|
||||
## Migration
|
||||
|
||||
🚀 Il est temps de rendre cela réel !
|
||||
|
||||
Je ne vais pas mentir, je suis assez excité. Je travaille pour ce moment depuis des jours.
|
||||
|
||||
### Le Plan de Migration
|
||||
|
||||
Ma box OPNsense physique est directement connectée à ma box FAI. Je veux la remplacer par le cluster de VM. (Pour éviter d'écrire le mot OPNsense à chaque ligne, j'appellerai simplement l'ancienne instance "la box" et la nouvelle "la VM" )
|
||||
|
||||
Voici le plan :
|
||||
1. Sauvegarde de la configuration de la box.
|
||||
2. Désactiver le serveur DHCP sur la box.
|
||||
3. Changer les adresses IP de la box.
|
||||
4. Changer les VIP sur la VM.
|
||||
5. Désactiver la passerelle sur la VM.
|
||||
6. Configurer le DHCP sur les deux VMs.
|
||||
7. Activer le répéteur mDNS sur la VM.
|
||||
8. Répliquer les services sur la VM.
|
||||
9. Déplacement du câble Ethernet.
|
||||
|
||||
### Stratégie de Retour Arrière
|
||||
|
||||
Aucune. 😎
|
||||
|
||||
Je plaisante, le retour arrière consiste à restaurer la configuration de la box, arrêter les VMs OPNsense et rebrancher le câble Ethernet dans la box.
|
||||
|
||||
### Plan de vérification
|
||||
|
||||
Pour valider la migration, je dresse une checklist :
|
||||
1. Bail DHCP WAN dans la VM.
|
||||
2. Ping depuis mon PC vers le VIP du VLAN User.
|
||||
3. Ping entre les VLANs.
|
||||
4. SSH vers mes machines.
|
||||
5. Renouveler le bail DHCP.
|
||||
6. Vérifier `ipconfig`
|
||||
7. Tester l'accès à des sites internet.
|
||||
8. Vérifier les logs du pare-feu.
|
||||
9. Vérifier mes services web.
|
||||
10. Vérifier que mes services internes ne sont pas accessibles depuis l'extérieur.
|
||||
11. Tester le VPN.
|
||||
12. Vérifier tous les appareils IoT.
|
||||
13. Vérifier les fonctionnalités Home Assistant.
|
||||
14. Vérifier que la TV fonctionne.
|
||||
15. Tester le Chromecast.
|
||||
16. Imprimer quelque chose.
|
||||
17. Vérifier la blocklist DNS.
|
||||
18. Speedtest.
|
||||
19. Bascule.
|
||||
20. Failover.
|
||||
21. Reprise après sinistre.
|
||||
22. Champagne !
|
||||
|
||||
Est-ce que ça va marcher ? On verra bien !
|
||||
|
||||
### Étapes de Migration
|
||||
|
||||
1. **Sauvegarde de la configuration de la box.**
|
||||
|
||||
Sur mon instance OPNsense physique, dans `System` > `Configuration` > `Backups`, je clique sur le bouton `Download configuration` qui me donne le précieux fichier XML. Celui qui m'a sauvé la mise la [dernière fois]({{< ref "post/10-opnsense-crash-disk-panic" >}}).
|
||||
|
||||
2. **Désactiver le serveur DHCP sur la box.**
|
||||
|
||||
Dans `Services` > `ISC DHCPv4`, et pour toutes mes interfaces, je désactive le serveur DHCP. Je ne fournis que du DHCPv4 dans mon réseau.
|
||||
|
||||
3. **Changer les adresses IP de la box.**
|
||||
|
||||
Dans `Interfaces`, et pour toutes mes interfaces, je modifie l'IP du firewall, de `.1` à `.253`. Je veux réutiliser la même adresse IP comme VIP, et garder cette instance encore joignable si besoin.
|
||||
|
||||
Dès que je clique sur `Apply`, je perds la communication, ce qui est attendu.
|
||||
|
||||
4. **Changer les VIP sur la VM.**
|
||||
|
||||
Sur ma VM maître, dans `Interfaces` > `Virtual IPs` > `Settings`, je change l'adresse VIP pour chaque interface et la mets en `.1`.
|
||||
|
||||
5. **Désactiver la passerelle sur la VM.**
|
||||
|
||||
Dans `System` > `Gateways` > `Configuration`, je désactive `LAN_GW` qui n'est plus nécessaire.
|
||||
|
||||
6. **Configurer le DHCP sur les deux VMs.**
|
||||
|
||||
Sur les deux VMs, dans `Services` > `Dnsmasq DNS & DHCP`, j'active le service sur mes 5 interfaces.
|
||||
|
||||
7. **Activer le répéteur mDNS sur la VM.**
|
||||
|
||||
Dans `Services` > `mDNS Repeater`, j'active le service et j'active aussi le `CARP Failover`.
|
||||
|
||||
Le service ne démarre pas. Je verrai ce problème plus tard.
|
||||
|
||||
8. **Répliquer les services sur la VM.**
|
||||
|
||||
Dans `System` > `High Availability` > `Status`, je clique sur le bouton `Synchronize and reconfigure all`.
|
||||
|
||||
9. **Déplacement du câble Ethernet.**
|
||||
|
||||
Physiquement dans mon rack, je débranche le câble Ethernet du port WAN (`igc0`) de ma box OPNsense physique et je le branche sur le port 15 de mon switch UniFi.
|
||||
|
||||
---
|
||||
## Vérification
|
||||
|
||||
😮💨 Je prends une grande inspiration et commence la phase de vérification.
|
||||
|
||||
### Checklist
|
||||
|
||||
- ✅ Bail DHCP WAN dans la VM.
|
||||
- ✅ Ping depuis mon PC vers le VIP du VLAN User.
|
||||
- ⚠️ Ping entre VLANs.
|
||||
Les pings fonctionnent, mais j'observe quelques pertes, environ 10 %.
|
||||
- ✅ SSH vers mes machines.
|
||||
- ✅ Renouvellement du bail DHCP.
|
||||
- ✅ Vérifier `ipconfig`
|
||||
- ❌ Tester un site internet. → ✅
|
||||
Quelques sites fonctionnent, tout est incroyablement lent... Ça doit être le DNS. J'essaie de résoudre un domaine au hasard, ça marche. Mais je ne peux pas résoudre `google.com`. Je redémarre le service Unbound DNS, tout fonctionne maintenant. C'est toujours le DNS...
|
||||
- ⚠️ Vérifier les logs du pare-feu.
|
||||
Quelques flux sont bloqués, pas critique.
|
||||
- ✅ Vérifier mes services web.
|
||||
- ✅ Vérifier que mes services internes ne sont pas accessibles depuis l'extérieur.
|
||||
- ✅ Tester le VPN.
|
||||
- ✅ Vérifier tous les appareils IoT.
|
||||
- ✅ Vérifier les fonctionnalités Home Assistant.
|
||||
- ✅ Vérifier que la TV fonctionne.
|
||||
- ❌ Tester le Chromecast.
|
||||
C'est lié au service mDNS qui ne parvient pas à démarrer. Je peux le démarrer si je décoche l'option `CARP Failover`. Le Chromecast est visible maintenant. → ⚠️
|
||||
- ✅ Imprimer quelque chose.
|
||||
- ✅ Vérifier la blocklist DNS.
|
||||
- ✅ Speedtest.
|
||||
J'observe environ 15 % de diminution de bande passante (de 940 Mbps à 825 Mbps).
|
||||
- ❌ Bascule.
|
||||
La bascule fonctionne difficilement, beaucoup de paquets perdus pendant la bascule. Le service rendu n'est pas génial : plus d'accès internet et mes services web sont inaccessibles.
|
||||
- ⌛ Failover.
|
||||
- ⌛ Reprise après sinistre.
|
||||
À tester plus tard.
|
||||
|
||||
📝 Bon, les résultats sont plutôt bons, pas parfaits, mais satisfaisants !
|
||||
### Résolution des Problèmes
|
||||
|
||||
Je me concentre sur la résolution des problèmes restants rencontrés lors des tests.
|
||||
|
||||
1. **DNS**
|
||||
|
||||
Lors de la bascule, la connexion internet ne fonctionne pas. Pas de DNS, c'est toujours le DNS.
|
||||
|
||||
C'est parce que le nœud de secours n'a pas de passerelle lorsqu'il est en mode passif. L'absence de passerelle empêche le DNS de résoudre. Après la bascule, il conserve des domaines non résolus dans son cache. Ce problème conduit aussi à un autre souci : quand il est passif, je ne peux pas mettre à jour le système.
|
||||
|
||||
**Solution** : Définir une passerelle pointant vers l'autre nœud, avec un numéro de priorité plus élevé que la passerelle WAN (un numéro plus élevé signifie une priorité plus basse). Ainsi, cette passerelle n'est pas active tant que le nœud est maître.
|
||||
|
||||
2. **Reverse Proxy**
|
||||
|
||||
Lors de la bascule, tous les services web que j'héberge (reverse proxy/proxy couche 4) renvoient cette erreur : `SSL_ERROR_INTERNAL_ERROR_ALERT`. Après vérification des services synchronisés via XMLRPC Sync, Caddy et mDNS repeater n'étaient pas sélectionnés. C'est parce que ces services ont été installés après la configuration initiale du HA.
|
||||
|
||||
**Solution** : Ajouter Caddy à XMLRPC Sync.
|
||||
|
||||
3. **Pertes de paquets**
|
||||
|
||||
J'observe environ 10 % de pertes de paquets pour les pings depuis n'importe quel VLAN vers le VLAN _Mgmt_. Je n'ai pas ce problème pour les autres VLANs.
|
||||
|
||||
Le VLAN _Mgmt_ est le VLAN natif dans mon réseau, cela pourrait être la raison de ce problème. C'est le seul réseau non défini dans le SDN Proxmox. Je ne veux pas avoir à tagger ce VLAN.
|
||||
|
||||
**Solution** : Désactiver le pare-feu Proxmox sur cette interface pour la VM. En réalité, je les ai tous désactivés et j'ai mis à jour la documentation ci-dessus. Je ne sais pas exactement pourquoi cela causait ce type de problème, mais la désactivation a résolu mon souci (j'ai pu reproduire le comportement en réactivant le pare-feu).
|
||||
|
||||
4. **Script CARP**
|
||||
|
||||
Lors de la bascule, le script d'événement CARP est déclenché autant de fois qu'il y a d'interfaces. J'ai 5 IPs virtuelles, le script reconfigure mon interface WAN 5 fois.
|
||||
|
||||
**Solution** : Retravailler le script pour récupérer l'état de l'interface WAN et ne reconfigurer l'interface que lorsque c'est nécessaire :
|
||||
```php
|
||||
#!/usr/local/bin/php
|
||||
<?php
|
||||
/**
|
||||
* OPNsense CARP event script
|
||||
* - Enables/disables the WAN interface only when needed
|
||||
* - Avoids reapplying config when CARP triggers multiple times
|
||||
*/
|
||||
|
||||
require_once("config.inc");
|
||||
require_once("interfaces.inc");
|
||||
require_once("util.inc");
|
||||
require_once("system.inc");
|
||||
|
||||
// Read CARP event arguments
|
||||
$subsystem = !empty($argv[1]) ? $argv[1] : '';
|
||||
$type = !empty($argv[2]) ? $argv[2] : '';
|
||||
|
||||
// Accept only MASTER/BACKUP events
|
||||
if (!in_array($type, ['MASTER', 'BACKUP'])) {
|
||||
// Ignore CARP INIT, DEMOTED, etc.
|
||||
exit(0);
|
||||
}
|
||||
|
||||
// Validate subsystem name format, expected pattern: <ifname>@<vhid>
|
||||
if (!preg_match('/^[a-z0-9_]+@\S+$/i', $subsystem)) {
|
||||
log_error("Malformed subsystem argument: '{$subsystem}'.");
|
||||
exit(0);
|
||||
}
|
||||
|
||||
// Interface key to manage
|
||||
$ifkey = 'wan';
|
||||
// Determine whether WAN interface is currently enabled
|
||||
$ifkey_enabled = !empty($config['interfaces'][$ifkey]['enable']) ? true : false;
|
||||
|
||||
// MASTER event
|
||||
if ($type === "MASTER") {
|
||||
// Enable WAN only if it's currently disabled
|
||||
if (!$ifkey_enabled) {
|
||||
log_msg("CARP event: switching to '$type', enabling interface '$ifkey'.", LOG_WARNING);
|
||||
$config['interfaces'][$ifkey]['enable'] = '1';
|
||||
write_config("enable interface '$ifkey' due CARP event '$type'", false);
|
||||
interface_configure(false, $ifkey, false, false);
|
||||
} else {
|
||||
log_msg("CARP event: already '$type' for interface '$ifkey', nothing to do.");
|
||||
}
|
||||
|
||||
// BACKUP event
|
||||
} else {
|
||||
// Disable WAN only if it's currently enabled
|
||||
if ($ifkey_enabled) {
|
||||
log_msg("CARP event: switching to '$type', disabling interface '$ifkey'.", LOG_WARNING);
|
||||
unset($config['interfaces'][$ifkey]['enable']);
|
||||
write_config("disable interface '$ifkey' due CARP event '$type'", false);
|
||||
interface_configure(false, $ifkey, false, false);
|
||||
} else {
|
||||
log_msg("CARP event: already '$type' for interface '$ifkey', nothing to do.");
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
5. **mDNS Repeater**
|
||||
|
||||
Le répéteur mDNS ne veut pas démarrer quand je sélectionne l'option `CARP Failover`.
|
||||
|
||||
**Solution** : La machine nécessite un redémarrage pour démarrer ce service compatible CARP.
|
||||
|
||||
6. **Adresse IPv6**
|
||||
|
||||
Mon nœud `cerbere-head1` crie dans le fichier de logs tandis que l'autre ne le fait pas. Voici les messages affichés chaque seconde quand il est maître :
|
||||
```plaintext
|
||||
Warning rtsold <interface_up> vtnet1 is disabled. in the logs (OPNsense)
|
||||
```
|
||||
|
||||
Un autre message que j'ai plusieurs fois après un switchback :
|
||||
```plaintext
|
||||
Error dhcp6c transmit failed: Can't assign requested address
|
||||
```
|
||||
|
||||
Ceci est lié à IPv6. J'observe que mon nœud principal n'a pas d'adresse IPv6 globale, seulement une link-local. De plus, il n'a pas de passerelle IPv6. Mon nœud secondaire, en revanche, a à la fois l'adresse globale et la passerelle.
|
||||
|
||||
Je ne suis pas expert IPv6, après quelques heures de recherche, j'abandonne IPv6. Si quelqu'un peut m'aider, ce serait vraiment apprécié !
|
||||
|
||||
**Contournement** : Supprimer DHCPv6 pour mon interface WAN.
|
||||
|
||||
### Confirmation
|
||||
|
||||
Maintenant que tout est corrigé, je peux évaluer les performances du failover.
|
||||
|
||||
1. **Basculement**
|
||||
|
||||
En entrant manuellement en mode maintenance CARP depuis l'interface WebGUI, aucune perte de paquets n'est observée. Impressionnant.
|
||||
|
||||
2. **Failover**
|
||||
|
||||
Pour simuler un failover, je tue la VM OPNsense active. Ici j'observe une seule perte de paquet. Génial.
|
||||
|
||||

|
||||
|
||||
3. **Reprise après sinistre**
|
||||
|
||||
Une reprise après sinistre est ce qui se produirait après un arrêt complet d'un cluster Proxmox, suite à une coupure de courant par exemple. Je n'ai pas eu le temps (ni le courage) de m'en occuper, je préfère mieux me préparer pour éviter les dommages collatéraux. Mais il est certain que ce genre de scénario doit être évalué.
|
||||
|
||||
### Avantages Supplémentaires
|
||||
|
||||
Outre le fait que cette nouvelle configuration est plus résiliente, j'ai constaté quelques autres avantages.
|
||||
|
||||
Mon rack est minuscule et l'espace est restreint. L'ensemble chauffe beaucoup, dépassant les 40 °C au sommet du rack en été. Réduire le nombre de machines allumées a permis de faire baisser la température. J'ai gagné 1,5 °C après avoir éteint l'ancien boîtier OPNsense, c'est super !
|
||||
|
||||
La consommation électrique est également un point important, mon petit datacenter consommait en moyenne 85 W. Là encore, j'ai constaté une légère baisse, d'environ 8 W. Sachant que le système fonctionne 24/7, ce n'est pas négligeable.
|
||||
|
||||
Enfin, j'ai également retiré le boîtier lui-même et le câble d'alimentation. Les places sont très limitées, ce qui est un autre point positif.
|
||||
|
||||
---
|
||||
## Conclusion
|
||||
|
||||
🎉 J'ai réussi les gars ! Je suis très fier du résultat, et fier de moi.
|
||||
|
||||
De mon [premier crash de ma box OPNsense]({{< ref "post/10-opnsense-crash-disk-panic" >}}), à la recherche d'une solution, en passant par la [proof of concept]({{< ref "post/12-opnsense-virtualization-highly-available" >}}) de haute disponibilité, jusqu'à cette migration, ce fut un projet assez long, mais extrêmement intéressant.
|
||||
|
||||
🎯 Se fixer des objectifs, c'est bien, mais les atteindre, c'est encore mieux.
|
||||
|
||||
Je vais maintenant mettre OPNsense de côté un petit moment pour me recentrer sur mon apprentissage de Kubernetes !
|
||||
|
||||
Comme toujours, si vous avez des questions, des remarques ou une solution à mon problème d'IPv6, je serai ravi d'échanger avec vous.
|
||||
420
content/post/15-migration-opnsense-proxmox-highly-available.md
Normal file
@@ -0,0 +1,420 @@
|
||||
---
|
||||
slug: migration-opnsense-proxmox-highly-available
|
||||
title: Migration to my OPNsense HA Cluster in Proxmox VE
|
||||
description: The detailed steps of the migration from my OPNsense physical box to a highly available cluster of VM in Proxmox VE.
|
||||
date: 2025-11-20
|
||||
draft: false
|
||||
tags:
|
||||
- opnsense
|
||||
- high-availability
|
||||
- proxmox
|
||||
categories:
|
||||
- homelab
|
||||
---
|
||||
## Intro
|
||||
|
||||
This is the final stage of my **OPNsense** virtualization journey.
|
||||
|
||||
A few months ago, my physical [OPNsense box crashed]({{< ref "post/10-opnsense-crash-disk-panic" >}}) because of a hardware failure. This left my home in the dark, literally. No network, no lights.
|
||||
|
||||
💡 To avoid being in that situation again, I imagined a plan to virtualize my OPNsense firewall into my **Proxmox VE** cluster. The last time, I've set up a [proof of concept]({{< ref "post/12-opnsense-virtualization-highly-available" >}}) to validate this solution: create a cluster of two **OPNsense** VMs in Proxmox and make the firewall highly available.
|
||||
|
||||
This time, I will cover the creation of my future OPNsense cluster from scratch, plan the cutover and finally migrate from my current physical box. Let's go!
|
||||
|
||||
---
|
||||
## The VLAN Configuration
|
||||
|
||||
For my plans, I'll have to connect the WAN, coming from my ISP box, to my main switch. For that I create a dedicated VLAN to transport this flow to my Proxmox nodes.
|
||||
|
||||
### UniFi
|
||||
|
||||
First, I configure my layer 2 network which is managed by UniFi. There I need to create two VLANs:
|
||||
- *WAN* (20): transport the WAN between my ISP box and my Proxmox nodes.
|
||||
- *pfSync* (44): communication between my OPNsense nodes.
|
||||
|
||||
In the UniFi controller, in `Settings` > `Networks`, I add a `New Virtual Network`. I name it `WAN` and give it the VLAN ID 20:
|
||||

|
||||
|
||||
I do the same thing again for the `pfSync` VLAN with the VLAN ID 44.
|
||||
|
||||
I plan to plug my ISP box into port 15 of my switch, which is disabled for now. I set it as active, set the native VLAN to the newly created `WAN (20)` and disable trunking:
|
||||

|
||||
|
||||
Once this setting is applied, I make sure that only the ports connected to my Proxmox nodes propagate these VLANs on their trunks.
|
||||
|
||||
I'm done with UniFi configuration.
|
||||
|
||||
### Proxmox SDN
|
||||
|
||||
Now that the VLAN can reach my nodes, I want to handle it in the Proxmox SDN. I've configured the SDN in [that article]({{< ref "post/11-proxmox-cluster-networking-sdn" >}}).
|
||||
|
||||
In `Datacenter` > `SDN` > `VNets`, I create a new VNet, call it `vlan20` to follow my own naming convention, give it the *WAN* alias and use the tag (VLAN ID) 20:
|
||||

|
||||
|
||||
I also create the `vlan44` for the *pfSync* VLAN, then I apply this configuration and we are done with the SDN.
|
||||
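For reference, here is a rough `pvesh` equivalent of those clicks. It's only a sketch: the zone name `vlanzone` is a placeholder, the real one comes from my SDN setup.

```bash
# Hedged sketch: the same VNets created via pvesh instead of the GUI.
# "vlanzone" is a placeholder for my actual SDN zone name.
pvesh create /cluster/sdn/vnets --vnet vlan20 --zone vlanzone --tag 20 --alias WAN
pvesh create /cluster/sdn/vnets --vnet vlan44 --zone vlanzone --tag 44 --alias pfSync
# Apply the pending SDN configuration
pvesh set /cluster/sdn
```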
|
||||
---
|
||||
## Create the VMs
|
||||
|
||||
Now that the VLAN configuration is done, I can start building the virtual machines on Proxmox.
|
||||
|
||||
The first VM is named `cerbere-head1` (I didn't tell you? My current firewall is named `cerbere`, it makes even more sense now!). Here are the settings:
|
||||
- **OS type**: Linux (even if OPNsense is based on FreeBSD)
|
||||
- **Machine type**: `q35`
|
||||
- **BIOS**: `OVMF (UEFI)`
|
||||
- **Disk**: 20 GB on Ceph distributed storage
|
||||
- **RAM**: 4 GB RAM, ballooning disabled
|
||||
- **CPU**: 2 vCPU
|
||||
- **NICs**, firewall disabled:
|
||||
1. `vmbr0` (*Mgmt*)
|
||||
2. `vlan20` (*WAN*)
|
||||
3. `vlan13` *(User)*
|
||||
4. `vlan37` *(IoT)*
|
||||
5. `vlan44` *(pfSync)*
|
||||
6. `vlan55` *(DMZ)*
|
||||
7. `vlan66` *(Lab)*
|
||||
|
||||

|
||||
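For the record, here is a hedged CLI sketch of that VM definition with `qm`. The VM ID 122 matches the HA section below, but the storage name `ceph-vm` is a placeholder for my actual Ceph pool.

```bash
# Hedged sketch: same VM defined from the CLI instead of the GUI wizard.
# "ceph-vm" is a placeholder storage name; adjust to your Ceph pool.
qm create 122 --name cerbere-head1 --ostype l26 --machine q35 --bios ovmf \
  --cores 2 --memory 4096 --balloon 0 \
  --efidisk0 ceph-vm:1 --scsi0 ceph-vm:20 --scsihw virtio-scsi-single \
  --net0 virtio,bridge=vmbr0,firewall=0 \
  --net1 virtio,bridge=vlan20,firewall=0 \
  --net2 virtio,bridge=vlan13,firewall=0 \
  --net3 virtio,bridge=vlan37,firewall=0 \
  --net4 virtio,bridge=vlan44,firewall=0 \
  --net5 virtio,bridge=vlan55,firewall=0 \
  --net6 virtio,bridge=vlan66,firewall=0
```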
|
||||
ℹ️ Now I clone that VM to create `cerbere-head2`, then I proceed with the OPNsense installation. I don't want to go into much detail about it here; I already documented it in the [proof of concept]({{< ref "post/12-opnsense-virtualization-highly-available" >}}).
|
||||
|
||||
After installing both OPNsense instances, I give each of them its IP in the *Mgmt* network:
|
||||
- `cerbere-head1`: `192.168.88.2/24`
|
||||
- `cerbere-head2`: `192.168.88.3/24`
|
||||
|
||||
I give them the other OPNsense node as gateway (`192.168.88.1`) to allow me to reach them from my laptop in another VLAN.
|
||||
|
||||
---
|
||||
## Configure OPNsense
|
||||
|
||||
Initially, I considered restoring my existing OPNsense configuration and adapting it to the new setup.
|
||||
|
||||
Then I decided to start over so I could document and share it. This part was getting so long that I preferred to create a dedicated post instead.
|
||||
|
||||
📖 You can find the details of the full OPNsense configuration in that [article]({{< ref "post/13-opnsense-full-configuration" >}}), covering HA, DNS, DHCP, VPN and reverse proxy.
|
||||
|
||||
---
|
||||
## Proxmox VM High Availability
|
||||
|
||||
Resources (VM or LXC) in Proxmox VE can be tagged as highly available, let's see how to set it up.
|
||||
|
||||
### Proxmox HA Requirements
|
||||
|
||||
First, your Proxmox cluster must allow it. There are some requirements:
|
||||
- At least 3 nodes to have quorum
|
||||
- Shared storage for your resources
|
||||
- Time synchronization across nodes
|
||||
- Reliable network
|
||||
|
||||
A fencing mechanism must be enabled. Fencing is the process of isolating a failed cluster node to ensure it no longer accesses shared resources. This prevents split-brain situations and allows Proxmox HA to safely restart affected VMs on healthy nodes. By default, it uses the Linux software watchdog, *softdog*, which is good enough for me.
|
||||
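A few quick commands to sanity-check these requirements on a node (a sketch, not an exhaustive validation):

```bash
# Quick HA sanity checks on a Proxmox node
pvecm status                      # cluster membership and quorum
ha-manager status                 # current HA manager / CRM state
cat /etc/default/pve-ha-manager   # watchdog module; softdog is used when nothing is set
```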
|
||||
In Proxmox VE 8, it was possible to create HA groups to control on which nodes resources would run. In Proxmox VE 9, this has been replaced by HA affinity rules. This is actually the main reason behind my Proxmox VE cluster upgrade, which I've detailed in that [post]({{< ref "post/14-proxmox-cluster-upgrade-8-to-9-ceph" >}}).
|
||||
|
||||
### Configure VM HA
|
||||
|
||||
The Proxmox cluster is able to provide HA for the resources, but you need to define the rules.
|
||||
|
||||
In `Datacenter` > `HA`, you can see the status and manage the resources. In the `Resources` panel, I click on `Add`. I pick the resource to configure as HA from the list, here `cerbere-head1` with ID 122. Then, in the dialog, I can define the maximum number of restarts and relocations; I keep `Failback` enabled and the requested state set to `started`:
|
||||

|
||||
|
||||
The Proxmox cluster will now make sure this VM is started. I do the same for the other OPNsense VM, `cerbere-head2`.
|
||||
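For reference, the same thing can be done from the shell with `ha-manager`. This is only a sketch: the ID 123 for `cerbere-head2` and the restart/relocate values are placeholders.

```bash
# Hedged CLI equivalent of the GUI steps above.
# 122 is cerbere-head1; 123 is a placeholder ID for cerbere-head2.
ha-manager add vm:122 --state started --max_restart 1 --max_relocate 1
ha-manager add vm:123 --state started --max_restart 1 --max_relocate 1
ha-manager status
```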
|
||||
### HA Affinity Rules
|
||||
|
||||
Great, but I don't want them on the same node. This is where the new HA affinity rules of Proxmox VE 9 come in. Proxmox allows creating node affinity and resource affinity rules. I don't mind which node they run on, but I don't want them together. I need a resource affinity rule.
|
||||
|
||||
In `Datacenter` > `HA` > `Affinity Rules`, I add a new HA resource affinity rule. I select both VMs and pick the option `Keep Separate`:
|
||||

|
||||
|
||||
✅ My OPNsense VMs are now fully ready!
|
||||
|
||||
---
|
||||
## Migration
|
||||
|
||||
🚀 Time to make it real!
|
||||
|
||||
I'm not gonna lie, I'm quite excited. I've been working toward this moment for days.
|
||||
|
||||
### The Migration Plan
|
||||
|
||||
I have my physical OPNsense box directly connected to my ISP box. I want to swap it for the VM cluster. (To avoid writing the word OPNsense on each line, I'll simply call them "the box" and "the VM".)
|
||||
|
||||
Here is the plan:
|
||||
1. Backup of the box configuration.
|
||||
2. Disable DHCP server on the box.
|
||||
3. Change IP addresses of the box.
|
||||
4. Change VIP on the VM.
|
||||
5. Disable gateway on VM.
|
||||
6. Configure DHCP on both VMs.
|
||||
7. Enable mDNS repeater on VM.
|
||||
8. Replicate services on VM.
|
||||
9. Move of the Ethernet cable.
|
||||
### Rollback Strategy
|
||||
|
||||
None. 😎
|
||||
|
||||
I'm kidding, the rollback consists of restoring the box configuration, shutting down the OPNsense VMs and plugging the Ethernet cable back into the box.
|
||||
|
||||
### Verification Plan
|
||||
|
||||
To validate the migration, I'm drawing up a checklist:
|
||||
1. WAN DHCP lease in the VM.
|
||||
2. Ping from my PC to the VIP of the User VLAN.
|
||||
3. Ping cross VLAN.
|
||||
4. SSH into my machines.
|
||||
5. Renew DHCP lease.
|
||||
6. Check `ipconfig`
|
||||
7. Test internet website.
|
||||
8. Check firewall logs.
|
||||
9. Check my webservices.
|
||||
10. Verify that my internal webservices are not accessible from outside.
|
||||
11. Test VPN.
|
||||
12. Check all IoT devices.
|
||||
13. Check Home Assistant features.
|
||||
14. Check if the TV works.
|
||||
15. Test the Chromecast.
|
||||
16. Print something.
|
||||
17. Verify DNS blocklist.
|
||||
18. Speedtest.
|
||||
19. Switchover.
|
||||
20. Failover.
|
||||
21. Disaster Recovery.
|
||||
22. Champagne!
|
||||
|
||||
Will it work? Let's find out!
|
||||
|
||||
### Migration Steps
|
||||
|
||||
1. **Backup of the box configuration.**
|
||||
|
||||
On my physical OPNsense instance, in `System` > `Configuration` > `Backups`, I click the `Download configuration` button which gives me the precious XML file. The one that saved my ass the [last time]({{< ref "post/10-opnsense-crash-disk-panic" >}}).
|
||||
|
||||
2. **Disable DHCP server on the box.**
|
||||
|
||||
In `Services` > `ISC DHCPv4`, and for all my interfaces, I disable the DHCP server. I only serve DHCPv4 in my network.
|
||||
|
||||
3. **Change IP addresses of the box.**
|
||||
|
||||
In `Interfaces`, and for all my interfaces, I modify the IP of the firewall, from `.1` to `.253`. I want to reuse the same IP address as VIP, and have this instance still reachable if needed.
|
||||
|
||||
As soon as I click on `Apply`, I lose communication, which is expected.
|
||||
|
||||
4. **Change VIP on the VM.**
|
||||
|
||||
On my master VM, in `Interfaces` > `Virtual IPs` > `Settings`, I change the VIP address for each interface and set it to `.1`.
|
||||
|
||||
5. **Disable gateway on VM.**
|
||||
|
||||
In `System` > `Gateways` > `Configuration`, I disable the `LAN_GW` which is not needed anymore.
|
||||
|
||||
6. **Configure DHCP on both VMs.**
|
||||
|
||||
On both VMs, in `Services` > `Dnsmasq DNS & DHCP`, I enable the service on my 5 interfaces.
|
||||
|
||||
7. **Enable mDNS repeater on VM.**
|
||||
|
||||
In `Services` > `mDNS Repeater`, I enable the service and also enable CARP Failover.
|
||||
|
||||
The service does not start. I'll see that problem later.
|
||||
|
||||
8. **Replicate services on VM.**
|
||||
|
||||
In `System` > `High Availability` > `Status`, I click the button to `Synchronize and reconfigure all`.
|
||||
|
||||
9. **Move of the Ethernet cable.**
|
||||
|
||||
Physically in my rack, I unplug the Ethernet cable from the WAN port (`igc0`) of my physical OPNsense box and plug it into the port 15 of my UniFi switch.
|
||||
|
||||
---
|
||||
## Verification
|
||||
|
||||
😮💨 I take a deep breath and start the verification phase.
|
||||
|
||||
### Checklist
|
||||
|
||||
- ✅ WAN DHCP lease in the VM.
|
||||
- ✅ Ping from my PC to the VIP of the User VLAN.
|
||||
- ⚠️ Ping cross VLAN.
|
||||
Pings are working, but I observe some drops, about 10%.
|
||||
- ✅ SSH into my machines.
|
||||
- ✅ Renew DHCP lease.
|
||||
- ✅ Check `ipconfig`
|
||||
- ❌ Test internet website. → ✅
|
||||
A few websites are working, everything is incredibly slow... It must be the DNS. I try to look up a random domain, it works. But I can't resolve `google.com`. I restart the Unbound DNS service, and everything works now. It is always the DNS...
|
||||
- ⚠️ Check firewall logs.
|
||||
A few flows are blocked, nothing critical.
|
||||
- ✅ Check my webservices.
|
||||
- ✅ Verify that my internal webservices are not accessible from outside.
|
||||
- ✅ Test VPN.
|
||||
- ✅ Check all IoT devices.
|
||||
- ✅ Check Home Assistant features.
|
||||
- ✅ Check if the TV works.
|
||||
- ❌ Test the Chromecast.
|
||||
It is related to the mDNS service being unable to start. I can start it if I uncheck the `CARP Failover` option. The Chromecast is visible now. → ⚠️
|
||||
- ✅ Print something.
|
||||
- ✅ Verify DNS blocklist.
|
||||
- ✅ Speedtest.
|
||||
I observe a roughly 15% decrease in bandwidth (from 940 Mbps to 825 Mbps).
|
||||
- ❌ Switchover.
|
||||
The switchover barely works: a lot of packets are dropped during the switch. The resulting service is not great: no more internet access and my webservices are not reachable.
|
||||
- ⌛ Failover.
|
||||
- ⌛ Disaster Recovery.
|
||||
To be tested later.
|
||||
|
||||
📝 Well, the results are pretty good, not perfect, but satisfying!
|
||||
### Problem Solving
|
||||
|
||||
I focus on resolving remaining problems experienced during the tests.
|
||||
|
||||
1. **DNS**
|
||||
|
||||
During the switchover, the internet connection is not working. No DNS, it is always DNS.
|
||||
|
||||
It's because the backup node does not have a gateway while passive. Without a gateway, DNS cannot resolve. After the switchover, it still has unresolved domains in its cache. This problem also leads to another issue: while passive, I can't update the system.
|
||||
|
||||
**Solution**: Create a gateway pointing to the other node, with a higher priority number than the WAN gateway (higher number means lower priority). This way, that gateway is not active while the node is master.
|
||||
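To double-check the fix from the OPNsense shell of the passive node, something like this should show which default route is in use and whether resolution works again (a rough sketch; `opnsense.org` is just an example domain):

```bash
# Which default route is the node currently using?
netstat -rn -f inet | grep default
# Can Unbound resolve again? (drill is available on OPNsense)
drill opnsense.org
```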
|
||||
2. **Reverse Proxy**
|
||||
|
||||
During the switchover, every webservice I host (reverse proxy/layer 4 proxy) gives this error: `SSL_ERROR_INTERNAL_ERROR_ALERT`. After checking the services synchronized through XMLRPC Sync, I found that Caddy and the mDNS repeater were not selected. It is because these services were installed after the initial HA configuration.
|
||||
|
||||
**Solution**: Add Caddy to XMLRPC Sync.
|
||||
|
||||
3. **Packet Drops**
|
||||
|
||||
I observe about 10% packet drops for pings from any VLAN to the *Mgmt* VLAN. I don't have this problem for the other VLANs.
|
||||
|
||||
The *Mgmt* VLAN is the native one in my network, which might be the reason behind this issue. This is the only network not defined in the Proxmox SDN. I don't want to have to tag this VLAN.
|
||||
|
||||
**Solution**: Disable the Proxmox firewall on this interface for the VM. I actually disabled it on all of them and updated the documentation above. I'm not sure why this caused that kind of problem, but disabling it fixed my issue (I could reproduce the behavior by enabling the firewall again).
|
||||
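For reference, the same change can be made from the Proxmox shell. This is only a sketch: the MAC address below is a placeholder for the one already assigned to the VM, which must be kept.

```bash
# Hedged sketch: disable the Proxmox firewall on a NIC while keeping the existing MAC.
# BC:24:11:AA:BB:CC is a placeholder; reuse the MAC currently assigned to net0.
qm set 122 --net0 virtio=BC:24:11:AA:BB:CC,bridge=vmbr0,firewall=0
```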
|
||||
4. **CARP Script**
|
||||
|
||||
During a switchover, the CARP event script is triggered as many times as there are interfaces. I have 5 virtual IPs, so the script reconfigures my WAN interface 5 times.
|
||||
|
||||
**Solution**: Rework the script to get the WAN interface state and only reconfigure the interface when needed:
|
||||
```php
|
||||
#!/usr/local/bin/php
|
||||
<?php
|
||||
/**
|
||||
* OPNsense CARP event script
|
||||
* - Enables/disables the WAN interface only when needed
|
||||
* - Avoids reapplying config when CARP triggers multiple times
|
||||
*/
|
||||
|
||||
require_once("config.inc");
|
||||
require_once("interfaces.inc");
|
||||
require_once("util.inc");
|
||||
require_once("system.inc");
|
||||
|
||||
// Read CARP event arguments
|
||||
$subsystem = !empty($argv[1]) ? $argv[1] : '';
|
||||
$type = !empty($argv[2]) ? $argv[2] : '';
|
||||
|
||||
// Accept only MASTER/BACKUP events
|
||||
if (!in_array($type, ['MASTER', 'BACKUP'])) {
|
||||
// Ignore CARP INIT, DEMOTED, etc.
|
||||
exit(0);
|
||||
}
|
||||
|
||||
// Validate subsystem name format, expected pattern: <ifname>@<vhid>
|
||||
if (!preg_match('/^[a-z0-9_]+@\S+$/i', $subsystem)) {
|
||||
log_error("Malformed subsystem argument: '{$subsystem}'.");
|
||||
exit(0);
|
||||
}
|
||||
|
||||
// Interface key to manage
|
||||
$ifkey = 'wan';
|
||||
// Determine whether WAN interface is currently enabled
|
||||
$ifkey_enabled = !empty($config['interfaces'][$ifkey]['enable']) ? true : false;
|
||||
|
||||
// MASTER event
|
||||
if ($type === "MASTER") {
|
||||
// Enable WAN only if it's currently disabled
|
||||
if (!$ifkey_enabled) {
|
||||
log_msg("CARP event: switching to '$type', enabling interface '$ifkey'.", LOG_WARNING);
|
||||
$config['interfaces'][$ifkey]['enable'] = '1';
|
||||
write_config("enable interface '$ifkey' due CARP event '$type'", false);
|
||||
interface_configure(false, $ifkey, false, false);
|
||||
} else {
|
||||
log_msg("CARP event: already '$type' for interface '$ifkey', nothing to do.");
|
||||
}
|
||||
|
||||
// BACKUP event
|
||||
} else {
|
||||
// Disable WAN only if it's currently enabled
|
||||
if ($ifkey_enabled) {
|
||||
log_msg("CARP event: switching to '$type', disabling interface '$ifkey'.", LOG_WARNING);
|
||||
unset($config['interfaces'][$ifkey]['enable']);
|
||||
write_config("disable interface '$ifkey' due CARP event '$type'", false);
|
||||
interface_configure(false, $ifkey, false, false);
|
||||
} else {
|
||||
log_msg("CARP event: already '$type' for interface '$ifkey', nothing to do.");
|
||||
}
|
||||
}
|
||||
```
|
||||
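For completeness, here is how I would deploy and dry-run the script, assuming OPNsense picks up CARP event hooks from `/usr/local/etc/rc.syshook.d/carp/`; `10-wan` is just a hypothetical filename and `vtnet1@1` a made-up subsystem argument.

```bash
# Deploy the hook script and make it executable (assumed CARP syshook directory)
install -m 0755 10-wan /usr/local/etc/rc.syshook.d/carp/10-wan
# Manually simulate CARP events to test the enable/disable logic
/usr/local/etc/rc.syshook.d/carp/10-wan vtnet1@1 BACKUP
/usr/local/etc/rc.syshook.d/carp/10-wan vtnet1@1 MASTER
```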
|
||||
5. **mDNS Repeater**
|
||||
|
||||
The mDNS repeater does not want to start when I select the option for `CARP Failover`.
|
||||
|
||||
**Solution**: The machine requires a reboot before this service will start in CARP-aware mode.
|
||||
|
||||
6. **IPv6 Address**
|
||||
|
||||
My `cerbere-head1` node is crying in the log file while the other one is not. Here is the message spit out every second while it is master:
|
||||
```plaintext
|
||||
Warning rtsold <interface_up> vtnet1 is disabled. in the logs (OPNsense)
|
||||
```
|
||||
|
||||
Another one I get several times after a switchback:
|
||||
```plaintext
|
||||
Error dhcp6c transmit failed: Can't assign requested address
|
||||
```
|
||||
|
||||
This is related to IPv6. I observe that my main node does not have a global IPv6 address, only a link-local one. It also does not have an IPv6 gateway. My secondary node, on the other hand, has both the global address and the gateway.
|
||||
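The commands I use to compare both nodes, nothing fancy; `vtnet1` is the WAN NIC according to the rtsold message above:

```bash
# Compare IPv6 state on both OPNsense nodes
ifconfig vtnet1 | grep inet6          # global address present, or only link-local (fe80::)?
netstat -rn -f inet6 | grep default   # is there an IPv6 default route?
```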
|
||||
I'm no IPv6 expert; after searching for a couple of hours, I gave up on IPv6. If someone out there can help, it would be really appreciated!
|
||||
|
||||
**Workaround**: Remove DHCPv6 for my WAN interface.
|
||||
|
||||
### Confirmation
|
||||
|
||||
Now that everything is fixed, I can evaluate the failover performance.
|
||||
|
||||
1. **Switchover**
|
||||
|
||||
When manually entering CARP maintenance mode from the WebGUI, no packet drop is observed. Impressive.
|
||||
|
||||
2. **Failover**
|
||||
|
||||
To simulate a failover, I kill the active OPNsense VM. Here I observe only one packet dropped. Awesome.
|
||||
|
||||

|
||||
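For reference, a rough sketch of how this test looks from the command line; the Mgmt VIP and the VM ID come from earlier sections, adjust them to your own setup.

```bash
# Hedged sketch of the failover test, run from two terminals.
# Terminal 1, on a client: continuous ping against the Mgmt VIP
ping 192.168.88.1
# Terminal 2, on the Proxmox node hosting the active OPNsense VM: hard-stop it
qm stop 122
```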
|
||||
3. **Disaster Recovery**
|
||||
|
||||
Disaster recovery is what would happen after a full Proxmox cluster stop, for example after a power outage. I didn't have the time (or the courage) to test it yet; I'd rather prepare a bit better to avoid collateral damage. But this kind of scenario definitely has to be evaluated.
|
||||
|
||||
### Extra Benefits
|
||||
|
||||
Leaving aside the fact that this new setup is more resilient, I noticed a few more benefits.
|
||||
|
||||
My rack is tiny and the space is tight. The whole thing heats up quite a lot, exceeding 40°C at the top of the rack in summer. Reducing the number of powered-up machines lowers the temperature. I've gained **1.5°C** after shutting down the old OPNsense box, cool!
|
||||
|
||||
Power consumption is also a concern: my tiny datacenter was drawing 85 W on average. Here again I observed a small decrease, about 8 W. Considering that this runs 24/7, it is not negligible.
|
||||
|
||||
Finally, I also removed the box itself and its power cable. Slots are very limited, so that's another good point.
|
||||
|
||||
---
|
||||
## Conclusion
|
||||
|
||||
🎉 I did it guys! I'm very proud of the results, proud of myself.
|
||||
|
||||
From my [first OPNsense box crash]({{< ref "post/10-opnsense-crash-disk-panic" >}}), through thinking about a solution and the HA [proof of concept]({{< ref "post/12-opnsense-virtualization-highly-available" >}}), to this migration, this has been quite a long project, but an extremely interesting one.
|
||||
|
||||
🎯 Setting objectives is great, but reaching them is even better.
|
||||
|
||||
Now I'm going to leave OPNsense aside for a bit, to refocus on my Kubernetes journey!
|
||||
|
||||
As always, if you have questions, remarks or a solution for my IPv6 problem, I'll be really happy to hear from you.
|
||||
@@ -15,7 +15,7 @@ categories:
|
||||
|
||||
L’un des aspects les plus satisfaisant de la création de mon homelab, c’est de pouvoir y appliquer des outils production-grade. J’ai voulu définir toute mon infrastructure as code, et la première étape que j’ai abordée est le déploiement de Machines Virtuelles avec **Terraform** sur **Proxmox**.
|
||||
|
||||
Dans cet article, je vous guide pas à pas pour créer une simple VM sur Proxmox en utilisant Terraform, basée sur un template **cloud-init** que j’ai détaillé dans [cet article]({{< ref "post/1-proxmox-cloud-init-vm-template" >}}). L’exécution se fait depuis un conteneur LXC dédié qui centralise toute la gestion de mon infrastructure.
|
||||
Dans cet article, je vous guide pas à pas pour créer une simple VM sur Proxmox VE 8 en utilisant Terraform, basée sur un template **cloud-init** que j’ai détaillé dans [cet article]({{< ref "post/1-proxmox-cloud-init-vm-template" >}}). L’exécution se fait depuis un conteneur LXC dédié qui centralise toute la gestion de mon infrastructure.
|
||||
|
||||
📝 Le code complet utilisé dans cet article est disponible dans mon [dépôt GitHub Homelab](https://github.com/Vezpi/Homelab)
|
||||
|
||||
@@ -102,6 +102,43 @@ pveum role add TerraformUser -privs "\
|
||||
SDN.Use"
|
||||
```
|
||||
|
||||
⚠️ La liste des privilèges disponibles a été modifiée dans PVE 9.0, utilisez cette commande:
|
||||
```bash
|
||||
pveum role add TerraformUser -privs "\
|
||||
Datastore.Allocate \
|
||||
Datastore.AllocateSpace \
|
||||
Datastore.Audit \
|
||||
Pool.Allocate \
|
||||
Pool.Audit \
|
||||
Sys.Audit \
|
||||
Sys.Console \
|
||||
Sys.Modify \
|
||||
Sys.Syslog \
|
||||
VM.Allocate \
|
||||
VM.Audit \
|
||||
VM.Clone \
|
||||
VM.Config.CDROM \
|
||||
VM.Config.Cloudinit \
|
||||
VM.Config.CPU \
|
||||
VM.Config.Disk \
|
||||
VM.Config.HWType \
|
||||
VM.Config.Memory \
|
||||
VM.Config.Network \
|
||||
VM.Config.Options \
|
||||
VM.Console \
|
||||
VM.Migrate \
|
||||
VM.GuestAgent.Audit \
|
||||
VM.GuestAgent.FileRead \
|
||||
VM.GuestAgent.FileWrite \
|
||||
VM.GuestAgent.FileSystemMgmt \
|
||||
VM.GuestAgent.Unrestricted \
|
||||
VM.PowerMgmt \
|
||||
Mapping.Audit \
|
||||
Mapping.Use \
|
||||
SDN.Audit \
|
||||
SDN.Use"
|
||||
```
|
||||
|
||||
2. **Créer l'Utilisateur `terraformer`**
|
||||
```bash
|
||||
pveum user add terraformer@pve --password <password>
|
||||
|
||||
@@ -15,7 +15,7 @@ categories:
|
||||
|
||||
One of the most satisfying parts of building a homelab is getting to apply production-grade tooling to a personal setup. I’ve been working on defining my entire infrastructure as code, and the first piece I tackled was VM deployment with **Terraform** on **Proxmox**.
|
||||
|
||||
In this article, I’ll walk you through creating a simple VM on Proxmox using Terraform, based on a **cloud-init** template I covered in [this article]({{< ref "post/1-proxmox-cloud-init-vm-template" >}}). Everything runs from a dedicated LXC container where I manage my whole infrastructure.
|
||||
In this article, I’ll walk you through creating a simple VM on Proxmox VE 8 using Terraform, based on a **cloud-init** template I covered in [this article]({{< ref "post/1-proxmox-cloud-init-vm-template" >}}). Everything runs from a dedicated LXC container where I manage my whole infrastructure.
|
||||
|
||||
📝 The full code used in this article is available in my [Homelab GitHub repository](https://github.com/Vezpi/Homelab)
|
||||
|
||||
@@ -102,6 +102,43 @@ pveum role add TerraformUser -privs "\
|
||||
SDN.Use"
|
||||
```
|
||||
|
||||
⚠️ The list of available privileges has been changed in PVE 9.0, use this command:
|
||||
```bash
|
||||
pveum role add TerraformUser -privs "\
|
||||
Datastore.Allocate \
|
||||
Datastore.AllocateSpace \
|
||||
Datastore.Audit \
|
||||
Pool.Allocate \
|
||||
Pool.Audit \
|
||||
Sys.Audit \
|
||||
Sys.Console \
|
||||
Sys.Modify \
|
||||
Sys.Syslog \
|
||||
VM.Allocate \
|
||||
VM.Audit \
|
||||
VM.Clone \
|
||||
VM.Config.CDROM \
|
||||
VM.Config.Cloudinit \
|
||||
VM.Config.CPU \
|
||||
VM.Config.Disk \
|
||||
VM.Config.HWType \
|
||||
VM.Config.Memory \
|
||||
VM.Config.Network \
|
||||
VM.Config.Options \
|
||||
VM.Console \
|
||||
VM.Migrate \
|
||||
VM.GuestAgent.Audit \
|
||||
VM.GuestAgent.FileRead \
|
||||
VM.GuestAgent.FileWrite \
|
||||
VM.GuestAgent.FileSystemMgmt \
|
||||
VM.GuestAgent.Unrestricted \
|
||||
VM.PowerMgmt \
|
||||
Mapping.Audit \
|
||||
Mapping.Use \
|
||||
SDN.Audit \
|
||||
SDN.Use"
|
||||
```
|
||||
|
||||
2. **Create the User `terraformer`**
|
||||
```bash
|
||||
pveum user add terraformer@pve --password <password>
|
||||
|
||||
@@ -13,6 +13,18 @@ I'm ==testing==
|
||||
|
||||
## Emoji
|
||||
|
||||
🚀💡🔧🔁⚙️📝📌✅⚠️🍒❌ℹ️⌛🚨🎉📖🔥
|
||||
🚀💡🔧🔁⚙️📝📌✅⚠️🍒❌ℹ️⌛🚨🎉📖🔥😈😎🎯
|
||||
|
||||
→
|
||||
|
||||
[post]({{< ref "post/0-template" >}})
|
||||
|
||||
List:
|
||||
- One
|
||||
- Two
|
||||
- Three
|
||||
|
||||
Checklist:
|
||||
- [ ] Not Checked
|
||||
- [x] Checked
|
||||
|
||||
[post]({{< ref "post/0-template" >}})
|
||||
BIN
static/img/opnsense-ping-failover.png
Normal file
|
After Width: | Height: | Size: 47 KiB |
BIN
static/img/proxmox-add-vm-ha.png
Normal file
|
After Width: | Height: | Size: 118 KiB |
BIN
static/img/proxmox-ceph-status-osd-restart.png
Normal file
|
After Width: | Height: | Size: 42 KiB |
BIN
static/img/proxmox-ceph-version-upgrade.png
Normal file
|
After Width: | Height: | Size: 73 KiB |
|
Before Width: | Height: | Size: 221 KiB After Width: | Height: | Size: 197 KiB |
BIN
static/img/proxmox-ha-resource-affinity-rule.png
Normal file
|
After Width: | Height: | Size: 88 KiB |