diff --git a/content/post/19-migrate-passive-opnsense-node-to-truenas/images/truenas-vm-add-nic.png b/content/post/19-migrate-passive-opnsense-node-to-truenas/images/truenas-vm-add-nic.png index e00c3c0..055b6d2 100644 Binary files a/content/post/19-migrate-passive-opnsense-node-to-truenas/images/truenas-vm-add-nic.png and b/content/post/19-migrate-passive-opnsense-node-to-truenas/images/truenas-vm-add-nic.png differ diff --git a/content/post/19-migrate-passive-opnsense-node-to-truenas/index.fr.md b/content/post/19-migrate-passive-opnsense-node-to-truenas/index.fr.md new file mode 100644 index 0000000..7efbc77 --- /dev/null +++ b/content/post/19-migrate-passive-opnsense-node-to-truenas/index.fr.md @@ -0,0 +1,283 @@ +--- +slug: migrate-passive-opnsense-node-to-truenas +title: Migrate my Passive OPNsense HA Node to TrueNAS +description: I migrated my passive OPNsense HA VM from Proxmox to TrueNAS to keep routing and firewalling available even when my Proxmox cluster is down. +date: 2026-05-24 +draft: true +tags: + - opnsense + - truenas + - proxmox + - high-availability +categories: + - homelab +--- +## Intro + +My homelab network is handled by an OPNsense cluster composed of two VM nodes. Both of these VMs run inside my Proxmox VE cluster. You can find the details in this [article]({{< ref "post/15-migration-opnsense-proxmox-highly-available" >}}). + +This setup works fine most of the time. The issue is more about the rare cases where the Proxmox cluster itself is down. When that happens, both OPNsense nodes are unavailable at the same time, which means I do not have any router left, so no network at all. + +Recently, I installed a TrueNAS server in the lab, which I documented in this [post]({{< ref "post/18-create-nas-server-with-truenas" >}}). It is mainly here to act as a NAS, but it could also host virtual machines. 
That gives me a good opportunity to improve the resilience of my network without changing the whole design. + +💡 The idea is simple: keep the active OPNsense node on Proxmox, but move the passive node to TrueNAS. + +This way, if the Proxmox cluster goes down, the passive OPNsense node can still take over and keep the network running. + +--- +## Prepare the OPNsense Nodes + +Before moving anything, I want to make sure the OPNsense VMs can run with less memory. + +The TrueNAS server does not have as much RAM available as the Proxmox cluster, so the first step is to reduce the memory allocation of the OPNsense nodes to the minimum. + +I start with the passive node, `cerbere-head2`: + +- Shut down the passive node +- Reduce its memory allocation from 4 GB to 2 GB +- Restart it +- Check the cluster health +- Switch the service over to the passive node +- Run network checks + +Then I repeat the same operation on the active node, `cerbere-head1`. + +Doing it one node at a time lets me keep the HA cluster healthy while validating that the reduced memory allocation is still enough for my setup. + +--- +## Prepare the TrueNAS Network + +The most important part of this migration is not the disk export or the VM creation. It is the network. + +An OPNsense VM is not a simple server with a single management interface. It needs access to several networks, including management, WAN, user networks, IoT, pfSync, DMZ and lab networks. + +On the TrueNAS side, I start from `System` > `Network` and add VLAN interfaces. + +The first one is the User VLAN: + +- Type: `VLAN` +- Name: `vlan13` +- Description: `User` +- Parent interface: `enp1s0` +- VLAN tag: `13` + +![Create the User VLAN interface in TrueNAS](images/truenas-create-new-vlan-interface.png) + +I then add the other VLANs in the same way. 
+ +TrueNAS does not apply network changes directly. It gives the option to test the changes first, with a short validation window. If the configuration is not confirmed in time, it automatically rolls back. + +This is really convenient when changing the network configuration of the machine you are currently connected to. + +![Confirm the VLAN interfaces before applying the network changes](images/truenas-network-confirm-add-vlans.png) + +For the management network, I create a bridge called `br1`. + +This bridge holds the TrueNAS management IP configuration instead of the physical interface `enp1s0`, because it also needs to be shared with the OPNsense VM. + +![Create the management bridge for TrueNAS and the OPNsense VM](images/truenas-network-mgmt-bridge.png) + +After that, I remove the IP configuration from the physical interface and keep it on the bridge. + +![Network configuration before applying the bridge changes](images/truenas-network-changes-before-apply.png) + +I initially tried to use DHCP for the management bridge after updating the MAC address in Dnsmasq, but I finally decided to keep a static IP address for TrueNAS. After some network changes, DHCP handed out another address from the pool, so static addressing was the safest and simplest option for this server. + +For the OPNsense VM, I create a bridge for each VLAN. For example, `br13` uses `vlan13`. I also move the description, such as `User`, from the VLAN interface to the bridge for clarity. + +The final TrueNAS network configuration: + +![Create one bridge per VLAN for the OPNsense VM](images/truenas-network-bridges-for-vlan.png) + +--- +## Create a Temporary Export Dataset + +To move the passive OPNsense VM disk from Proxmox to TrueNAS, I first need a place to export the disk image. 
+ +In TrueNAS, I create a dataset named `storage/vm/disk`, then create an NFS share from it. + +In the advanced options of the NFS share, I configure: + +- Maproot user: `root` +- Authorized hosts: + - `192.168.88.21` + - `192.168.88.22` + - `192.168.88.23` + +These are the Proxmox VE nodes allowed to mount the share. + +I don't manually create a zvol at this point. The VM creation process in TrueNAS handles the disk import and conversion. + +--- +## Export the VM Disk from Proxmox + +From the Proxmox VE web interface, I locate the node hosting the passive OPNsense VM `cerbere-head2`: it is running on `Zenith`. + +I log into that Proxmox node over SSH and mount the NFS share from TrueNAS: + +```bash +mount granite.mgmt.vezpi.com:/mnt/storage/vm/disk /mnt +``` + +Then I shut down the VM from the Proxmox VE interface. I don't shut it down from inside OPNsense because the VM has HA enabled. + +Once the VM is stopped, I export the main disk to qcow2. I don't export the EFI disk. + +```bash +qemu-img convert -f raw -O qcow2 -p \ + rbd:ceph-workload/vm-123-disk-1 \ + /mnt/cerbere-head2.qcow2 +``` + +The conversion took about one minute for a 20 GB disk. + +At this point, the passive OPNsense disk is available on TrueNAS and ready to be imported into a new VM. + +--- +## Recreate the OPNsense VM in TrueNAS + +The next step is to recreate the passive OPNsense VM in TrueNAS with parameters matching the original VM as closely as possible. + +From the TrueNAS web interface, I go to the `Virtual Machines` section. + +![The Virtual Machines section in TrueNAS](images/truenas-vm-menu.png) + +I create a new VM with these settings. 
+ +For the operating system: + +- Guest Operating System: `FreeBSD` +- Name: `cerberehead2` +- System Clock: `Local` +- Boot Method: `UEFI` +- Enable Secure Boot: disabled +- Enable Trusted Platform Module: disabled +- Shutdown Timeout: `90` +- Start on Boot: enabled +- Enable Display VNC: disabled + +The VM name does not use dashes because TrueNAS does not allow them there. + +For CPU and memory: + +- Virtual CPUs: `1` +- Cores: `2` +- Threads: `1` +- CPU Mode: `Custom` +- CPU Model: `qemu64` +- Memory Size: `2 GiB` + +For the disk: + +- Create new disk image +- Import Image: enabled +- Image source: `/mnt/storage/vm/files/cerbere-head2.qcow2` +- Disk Type: `VirtIO` +- Storage Location: `storage/vm` +- Size: `20 GiB` + +For the first network interface: + +- Adapter Type: `VirtIO` +- MAC Address: keep the proposed one +- Attach NIC: `br1: Mgmt` + +I skip installation media and GPU configuration, then confirm the summary. + +![Summary before creating the OPNsense VM in TrueNAS](images/truenas-vm-create-new-summary.png) + +After confirmation, TrueNAS converts the imported qcow2 image into a zvol. + +![TrueNAS converting the imported disk image into a zvol](images/truenas-vm-disk-image-conversion.png) + +Once the VM is created, I open the VM details and add the remaining NICs. + +![The VM devices in TrueNAS](images/truenas-vm-details.png) + +For each additional NIC, I use VirtIO as the adapter type and attach it to the corresponding bridge. + +For the WAN NIC, I copy the old MAC address because I use a single WAN IP address trick. I also increment the Device Order value for each NIC so the interfaces keep the same order as in Proxmox. + +![An additional VirtIO network interface for the OPNsense VM](images/truenas-vm-add-nic.png) + +🎉 Finally, I can start the OPNsense VM in TrueNAS. 
+ +![OPNsense booting successfully as a TrueNAS VM](images/truenas-vm-opnsense-start-shell.png) + +--- +## Validate the HA Cluster + +Once the passive node is running on TrueNAS, I need to validate that the OPNsense HA cluster is still behaving correctly. + +I start with basic checks on the passive node: + +- Management interface ping from the bastion: `192.168.88.3` +- User interface ping from a laptop: `192.168.13.3` +- IoT interface ping: `192.168.37.3` +- pfSync ping from the other node: `192.168.44.2` +- DMZ interface ping: `192.168.55.3` +- Lab interface ping from DockerVM: `192.168.66.3` + +I also check that the node is accessible over SSH from my laptop using `192.168.13.3`, and that the web interface is reachable at: + +```text +https://192.168.13.3:4443 +``` + +Then I validate the OPNsense HA state: + +- CARP VIP status must be `BACKUP` on all VIPs +- HA status page must show that the active node can log in to the passive node +- Services must be running as expected +- HA service synchronization must work +- Firmware update checks must be accessible + +From the active node, I use the HA status page and force a full synchronization with `Synchronize and reconfigure all`. + +--- +## Switchover Tests + +Before testing failover, I start an SSH session to `dockerVM` to confirm that firewall states are preserved across nodes. I also start a ping from a laptop to `192.168.37.120`. + +For the switchover test, I gracefully enable maintenance mode on the master node. 
+ +The passive node becomes `MASTER`, and I validate the important services: + +- Extra VLAN routing with ping to `192.168.37.120` +- WAN access with ping to `8.8.8.8` +- Firewall states by keeping the SSH session alive +- External DNS resolution with `host redhat.com` +- Internal DNS resolution with `host SLZB-06M.mgmt.vezpi.com` +- Access to a random internet page +- Caddy reverse proxy +- Caddy layer4 proxy +- Wireguard access from outside +- mDNS by checking if the printer showed up + +✅ The switchover is successful. + +--- +## Failover Tests + +After the graceful switchover test, I test a more direct failover scenario by forcing a poweroff of the active node. + +I repeat the same validation checklist. + +✅ The failover is successful. + +Finally, I restart the active OPNsense VM. + +🎯 At that point, the OPNsense HA cluster is operational again, with the passive node now running on TrueNAS instead of Proxmox. + +--- +## Conclusion + +This migration is a small but important improvement for my homelab. + +Before, both OPNsense nodes depended on the Proxmox VE cluster. If the cluster was down, my whole network routing layer was down with it. + +Now, the active node still runs on Proxmox, but the passive node runs on TrueNAS. This gives me a better separation between the virtualization cluster and the network failover layer. + +A small disclaimer: while TrueNAS offers virtualization features, it is not comparable to Proxmox VE in terms of clustering and infrastructure management capabilities. + +A note about the QEMU Guest Agent: the OPNsense VM already had it installed before the export. In this setup, it does not seem useful because TrueNAS does not implement it as a hypervisor feature. I kept it installed anyway, because it is harmless. 
\ No newline at end of file diff --git a/content/post/19-migrate-passive-opnsense-node-to-truenas/index.md b/content/post/19-migrate-passive-opnsense-node-to-truenas/index.md index abbe5eb..1925500 100644 --- a/content/post/19-migrate-passive-opnsense-node-to-truenas/index.md +++ b/content/post/19-migrate-passive-opnsense-node-to-truenas/index.md @@ -8,16 +8,17 @@ tags: - opnsense - truenas - proxmox + - high-availability categories: - homelab --- ## Intro -My homelab network is handled by an OPNsense cluster composed of two VM nodes. Both of these VMs are running inside my Proxmox VE cluster. +My homelab network is handled by an OPNsense cluster composed of two VM nodes. Both of these VMs are running inside my Proxmox VE cluster. You can find details in this [article]({{< ref "post/15-migration-opnsense-proxmox-highly-available" >}}). This setup works fine most of the time. The issue is more about the rare cases where the Proxmox cluster itself is down. When that happens, both OPNsense nodes are unavailable at the same time, which means I do not have any router left, so no network at all. -Recently, I installed a TrueNAS server in the lab. You can find the infos in that [post]({{< ref "post/18-create-nas-server-with-truenas" >}}). It is mainly here to act as a NAS, but it could also host virtual machines. That give me a good opportunity to improve the resilience of my network without changing the whole design. +Recently, I installed a TrueNAS server in the lab, which I documented in this [post]({{< ref "post/18-create-nas-server-with-truenas" >}}). It is mainly here to act as a NAS, but it could also host virtual machines. That gives me a good opportunity to improve the resilience of my network without changing the whole design. 💡 The idea is simple: keep the active OPNsense node on Proxmox, but move the passive node to TrueNAS. 
@@ -60,7 +61,7 @@ The first one is the User VLAN: - Parent interface: `enp1s0` - VLAN tag: `13` -![Creating the User VLAN interface in TrueNAS](images/truenas-create-new-vlan-interface.png) +![Create the User VLAN interface in TrueNAS](images/truenas-create-new-vlan-interface.png) I then add the other VLANs in the same way. @@ -68,13 +69,13 @@ TrueNAS does not apply network changes directly. It gives the option to test the This is really convenient when changing the network configuration of the machine you are currently connected to. -![Confirming the VLAN interfaces before applying the network changes](images/truenas-network-confirm-add-vlans.png) +![Confirm the VLAN interfaces before applying the network changes](images/truenas-network-confirm-add-vlans.png) For the management network, I created a bridge called `br1`. This bridge holds the TrueNAS management IP configuration instead of the physical interface `enp1s0`, because it also needs to be shared with the OPNsense VM. -![Creating the management bridge for TrueNAS and the OPNsense VM](images/truenas-network-mgmt-bridge.png) +![Create the management bridge for TrueNAS and the OPNsense VM](images/truenas-network-mgmt-bridge.png) After that, I remove the IP configuration from the physical interface and keep it on the bridge. @@ -86,7 +87,7 @@ For the OPNsense VM, I create a bridge for each VLAN. For example, `br13` uses ` The final TrueNAS network configuration: -![Creating one bridge per VLAN for the OPNsense VM](images/truenas-network-bridges-for-vlan.png) +![Create one bridge per VLAN for the OPNsense VM](images/truenas-network-bridges-for-vlan.png) --- ## Create a Temporary Export Dataset @@ -107,6 +108,7 @@ These are the Proxmox VE nodes allowed to mount the share. I don't manually create a zvol at that point. The VM creation process in TrueNAS handle the disk import and conversion. 
+--- ## Export the VM Disk from Proxmox From the Proxmox VE web interface, I locate the node hosting the passive OPNsense VM `cerbere-head2`, it is running on `Zenith`. @@ -131,13 +133,14 @@ The conversion took about one minute for a 20 GB disk. At this point, the passive OPNsense disk is available on TrueNAS and ready to be imported into a new VM. +--- ## Recreate the OPNsense VM in TrueNAS The next step is to recreate the passive OPNsense VM in TrueNAS with parameters matching the original VM as closely as possible. From the TrueNAS web interface, I go to the `Virtual Machines` section. -![Opening the Virtual Machines section in TrueNAS](images/truenas-vm-menu.png) +![The Virtual Machines section in TrueNAS](images/truenas-vm-menu.png) I create a new VM with these settings. @@ -189,23 +192,24 @@ After confirmation, TrueNAS convert the imported qcow2 image into a zvol. Once the VM is created, I open the VM details and add the remaining NICs. -![Accessing the VM devices in TrueNAS](images/truenas-vm-details.png) +![The VM devices in TrueNAS](images/truenas-vm-details.png) For each additional NIC, I used VirtIO as the adapter type and attach it to the corresponding bridge. -For the WAN NIC, I copy the old MAC address because I use a single WAN IP address trick. I also increment the digit in the MAC address for the following NICs to keep the order clear. +For the WAN NIC, I copy the old MAC address because I use a single WAN IP address trick. I also increment the Device Order value for each NIC so the interfaces keep the same order as in Proxmox. -![Adding an additional VirtIO network interface to the OPNsense VM](images/truenas-vm-add-nic.png) +![An additional VirtIO network interface for the OPNsense VM](images/truenas-vm-add-nic.png) -After moving the VM NICs to the VLAN bridges, the passive OPNsense VM started correctly in TrueNAS. +🎉 Finally, I can start the OPNsense VM in TrueNAS. 
![OPNsense booting successfully as a TrueNAS VM](images/truenas-vm-opnsense-start-shell.png) -## Validating the HA cluster +--- +## Validate the HA Cluster -Once the passive node was running on TrueNAS, I needed to validate that the OPNsense HA cluster was still behaving correctly. +Once the passive node is running on TrueNAS, I need to validate that the OPNsense HA cluster is still behaving correctly. -I started with basic checks on the passive node: +I start with basic checks on the passive node: - Management interface ping from the bastion: `192.168.88.3` - User interface ping from a laptop: `192.168.13.3` @@ -214,13 +218,13 @@ I started with basic checks on the passive node: - DMZ interface ping: `192.168.55.3` - Lab interface ping from DockerVM: `192.168.66.3` -I also checked that the node was accessible over SSH from Termius using `192.168.13.3`, and that the web interface was reachable at: +I also check that the node is accessible over SSH from my laptop using `192.168.13.3`, and that the web interface is reachable at: ```text https://192.168.13.3:4443 ``` -Then I validated the OPNsense HA state: +Then I validate the OPNsense HA state: - CARP VIP status must be `BACKUP` on all VIPs - HA status page must show that the active node can log in to the passive node @@ -228,15 +232,16 @@ Then I validated the OPNsense HA state: - HA service synchronization must work - Firmware update checks must be accessible -From the active node, I used the HA status page and forced a full synchronization with `Synchronize and reconfigure all`. +From the active node, I use the HA status page and force a full synchronization with `Synchronize and reconfigure all`. -## Switchover tests +--- +## Switchover Tests -Before testing failover, I started an SSH session to DockerVM to confirm that firewall states were preserved across nodes. I also started a ping from a laptop to `192.168.37.120`. 
+Before testing failover, I start an SSH session to `dockerVM` to confirm that firewall states are preserved across nodes. I also start a ping from a laptop to `192.168.37.120`. -For the switchover test, I gracefully enabled maintenance mode on the master node. +For the switchover test, I gracefully enable maintenance mode on the master node. -The passive node became `MASTER`, and I validated the important services: +The passive node becomes `MASTER`, and I validate the important services: - Extra VLAN routing with ping to `192.168.37.120` - WAN access with ping to `8.8.8.8` @@ -249,59 +254,30 @@ The passive node became `MASTER`, and I validated the important services: - Wireguard access from outside - mDNS by checking if the printer showed up -The switchover was successful. +✅ The switchover is successful. -I also tested the switchback. It required entering maintenance mode and leaving it again to return to the expected state, but the cluster behavior was validated. +--- +## Failover Tests -## Failover tests +After the graceful switchover test, I test a more direct failover scenario by forcing a poweroff of the active node. -After the graceful switchover test, I tested a more direct failover scenario by forcing a poweroff of the active node. +I repeat the same validation checklist. -I repeated the same validation checklist: +✅ The failover is successful. -- Extra VLAN routing -- WAN access -- Firewall states -- DNS resolution -- Caddy reverse proxy -- Caddy layer4 proxy -- Wireguard -- mDNS +Finally, I restart the active OPNsense VM. -For DNS, I tested an external domain with: - -```text -host microsoft.com -``` - -And I also checked the internal host: - -```text -host SLZB-06M.mgmt.vezpi.com -``` - -The failover was successful. - -Finally, I restarted the active OPNsense VM. - -At that point, the OPNsense HA cluster was operational again, with the passive node now running on TrueNAS instead of Proxmox. 
- -## A note about QEMU Guest Agent - -The OPNsense VM already had the QEMU Guest Agent installed. - -In this setup, it does not seem useful because TrueNAS does not have it implemented as a hypervisor feature in the way I would need here. I kept it installed anyway, because it is harmless. +🎯 At that point, the OPNsense HA cluster is operational again, with the passive node now running on TrueNAS instead of Proxmox. +--- ## Conclusion -This migration was a small but important improvement for my homelab. +This migration is a small but important improvement for my homelab. Before, both OPNsense nodes depended on the Proxmox VE cluster. If the cluster was down, my whole network routing layer was down with it. Now, the active node still runs on Proxmox, but the passive node runs on TrueNAS. This gives me a better separation between the virtualization cluster and the network failover layer. -The most important part of the project was the TrueNAS networking model. Creating VLAN interfaces was not enough for the VM use case. The working design was to create one bridge per VLAN and attach the OPNsense VM NICs to those bridges. +A small disclaimer: while TrueNAS offers virtualization features, it is not comparable to Proxmox VE in terms of clustering and infrastructure management capabilities. -After validating CARP, HA sync, routing, DNS, Caddy, Wireguard, mDNS and firewall states, the cluster is working as expected. - -The passive OPNsense node is now outside of Proxmox, and that is exactly what I wanted: keeping network abilities even when the Proxmox VE cluster is unavailable. \ No newline at end of file +A note about the QEMU Guest Agent: the OPNsense VM already had it installed before the export. In this setup, it does not seem useful because TrueNAS does not implement it as a hypervisor feature. I kept it installed anyway, because it is harmless. 
\ No newline at end of file diff --git a/content/post/19-migrate-passive-opnsense-node-to-truenas/old.md b/content/post/19-migrate-passive-opnsense-node-to-truenas/old.md deleted file mode 100644 index d3e8351..0000000 --- a/content/post/19-migrate-passive-opnsense-node-to-truenas/old.md +++ /dev/null @@ -1,298 +0,0 @@ ---- -slug: migrate-passive-opnsense-node-to-truenas -title: -description: -date: 2026-03-12 -draft: true -tags: - - opnsense - - truenas - - proxmox -categories: - - homelab ---- - -## Intro - -My router is the heart of my homelab. When it’s down, everything is down: internet, DNS, VLAN firewall, reverse proxy… the whole stack. - -I’m running an [[OPNsense]] HA cluster made of **two virtual machines** inside my [[Proxmox]] VE cluster. It works great… except for one annoying edge case: when the Proxmox cluster is down (rare, but it happens), I suddenly have **no router left**. - -Recently I installed a [[TrueNAS]] server ([[Build my NAS with TrueNAS]]), and TrueNAS can host virtual machines. So I decided to move **only the passive OPNsense node** to TrueNAS, so that if Proxmox goes dark, I still have a node alive that can take over and keep the network running. - -The objective of this post is simple: explain what I migrated, why I did it, and what configuration choices made it work reliably. - ---- - -## The Plan: Split the HA Pair Across Two Hypervisors - -The goal was: - -- Keep the **active** OPNsense node running on Proxmox VE (where it already lives). -- Migrate the **passive** node to TrueNAS. -- Validate that the HA cluster still behaves properly (CARP VIPs, sync, services, failover). - -This way, a Proxmox outage no longer means “no routing at all”. 
- ---- - -## What I Used - -Quick overview of the pieces involved: - -- **OPNsense**: https://opnsense.org/ -- **Proxmox VE** (current home of both OPNsense VMs): https://www.proxmox.com/en/proxmox-virtual-environment/overview -- **TrueNAS** (new home of the passive node, and storage to transfer the VM disk): https://www.truenas.com/ - ---- - -## Step 1 — Make OPNsense Lighter (RAM Reduction) - -TrueNAS on my side doesn’t have “infinite RAM”, so the first step was to reduce memory usage to something more reasonable. - -I reduced the memory allocation of both OPNsense nodes in Proxmox: - -- Shutdown passive node `cerbere-head2` -- Reduce RAM, restart, verify HA -- Swap services to the passive temporarily and test networking -- Shutdown active node `cerbere-head1` -- Reduce RAM, restart, verify HA again - -This kept the cluster healthy while ensuring the VM would fit comfortably on the NAS. - -(Details: [[Reduce the memory allocation of OPNsense nodes]]) - ---- - -## Step 2 — Prepare Networking on TrueNAS (Trunk + VLAN Strategy) - -To host an OPNsense VM properly, TrueNAS must be able to present the right networks to the VM (Mgmt, VLANs, etc.). In my case, I needed a trunk configuration. - -In TrueNAS, I went to `System` > `Network` and created VLAN interfaces (example with VLAN 13): - -![[truenas-create-new-vlan-interface.png]] - -TrueNAS is nice here: changes aren’t applied blindly. 
You can **test** them and you get a rollback window, which is exactly what you want when you’re touching the network config remotely: - -![[truenas-network-confirm-add-vlans.png]] - -### Management bridge - -I created a bridge `br1` for the management interface, shared between: - -- TrueNAS itself -- the future OPNsense VM - -And moved the IP configuration to the bridge: - -![[truenas-network-mgmt-bridge.png]] - -Final view before apply: - -![[truenas-network-changes-before-apply.png]] - -### Static IP vs DHCP (and why I stayed static) - -I initially tried switching the management bridge to DHCP by updating the MAC address in OPNsense (Dnsmasq override): - -![[opnsense-update-dnsmasq-override-truenas-bridge.png]] - -Then I attempted to flip TrueNAS from static to DHCP: - -![[truenas-network-bridge-switch-static-to-dhcp.png]] - -But DHCP didn’t behave as I expected: it kept receiving random IPs from the pool. I suspected existing leases played a role. I even tried manually editing leases and restarting the service, but after another change, it still ended up with a random address again. - -In the end, I gave up and kept **a static IP** for TrueNAS. It’s boring, but it’s predictable. - -### The key decision: bridge VLANs (not just VLAN interfaces) - -This became important later: I originally planned to attach VLAN interfaces directly to the OPNsense VM, but it didn’t behave well. - -So I created **one bridge per VLAN** (ex: `br13` with `vlan13` as the only member), and used those bridges for the VM NICs: - -![[truenas-network-bridges-for-vlan.png]] - -That ended up being the difference between “split-brain chaos” and “stable HA”. - -(Full notes: [[Configure the trunk in TrueNAS]]) - ---- - -## Step 3 — Move the VM Disk From Proxmox to TrueNAS - -To migrate the VM cleanly, I exported the Proxmox disk to TrueNAS. 
- -### Create a dataset and export it via NFS - -I created a dataset (initially called `disk`) and exported it with NFS, restricting access to my three Proxmox nodes (by IP): - -- 192.168.88.21 -- 192.168.88.22 -- 192.168.88.23 - -(Notes: [[Create a new dataset in TrueNAS to export Proxmox VM disk]]) - -### Export the passive OPNsense disk - -On the Proxmox node hosting the passive VM (`cerbere-head2`), I mounted the NFS share: - -```bash -mount granite.mgmt.vezpi.com:/mnt/storage/disk /mnt -``` - -Then I shut down the VM from Proxmox (HA enabled, so I didn’t do it from inside OPNsense), and converted/exported the main disk (not the EFI disk) from Ceph RBD to a qcow2 file: - -```bash -qemu-img convert -f raw -O qcow2 -p \ - rbd:ceph-workload/vm-123-disk-1 \ - /mnt/cerbere-head2.qcow2 -``` - -The conversion took around a minute for a 20GB disk. - -(Notes: [[Export the passive OPNsense VM disk from Proxmox]]) - -### Dataset reorg (cleaner layout) - -I reorganized datasets on TrueNAS side to something more VM-oriented: - -- created `storage/vm` -- renamed `storage/disk` to `storage/vm/files` - -Commands used: - -```bash -zfs list -sudo zfs create storage/vm -sudo zfs rename storage/disk storage/vm/files -``` - -(Notes: [[Reorganize the dataset in TrueNAS]]) - ---- - -## Step 4 — Create the OPNsense VM on TrueNAS (Import Disk + Rebuild NICs) - -Now the fun part: recreating the VM on TrueNAS with the same “spirit” as the Proxmox VM. 
- -From `Virtual Machines`: - -![[truenas-vm-menu.png]] - -### VM settings I used - -I created a new VM with: - -**Operating System** -- Guest: FreeBSD -- Name: `cerberehead2` (TrueNAS doesn’t like dashes) -- Boot: UEFI -- Secure Boot: Disabled -- TPM: Disabled -- Start on Boot: Enabled -- VNC: Disabled - -**CPU & Memory** -- Virtual CPUs: 1 -- Cores: 2 -- Threads: 1 -- CPU Mode: Custom -- CPU Model: `qemu64` -- Memory: 2 GiB - -**Disk** -- Import image enabled -- Source: `/mnt/storage/vm/files/cerbere-head2.qcow2` -- Disk Type: VirtIO -- Location: `storage/vm` -- Size: 20 GiB - -**Network** -- Adapter: VirtIO -- Attached to `br1` (Mgmt) -- MAC: kept the generated one here - -Summary screen: - -![[truenas-vm-create-new-summary.png]] - -After saving, TrueNAS converted the imported image into a Zvol: - -![[truenas-vm-disk-image-conversion.png]] - -### Adding the additional NICs - -After the VM was created, I added the additional NICs in the VM device list: - -![[truenas-vm-details.png]] - -At first, I attached VLAN interfaces directly and started the VM… and instantly broke my network (great success). - -The VM itself booted fine though, and seeing OPNsense come up cleanly on TrueNAS was a good sign: - -![[truenas-vm-opnsense-start-shell.png]] - -But HA-wise, it was a mess: split-brain symptoms, with the TrueNAS-hosted node thinking it was MASTER on almost everything except Mgmt. - -The fix was the VLAN bridging approach mentioned earlier: once I switched the VM NICs to attach to **bridges (`br13`, `br20`, etc.) instead of VLAN interfaces**, the cluster came back to a healthy state. - -Second try: stable. ✅ - -(Notes: [[Create the OPNsense VM in TrueNAS]]) - ---- - -## Step 5 — Validate HA: CARP, Sync, Services, Switchover and Failover - -Once everything was in place, I validated the new setup with a proper checklist. I wanted to be sure the cluster worked exactly as before. 
- -### Basic checks - -- Ping each interface as relevant (Mgmt/User/IoT/pfSync/DMZ/Lab) -- SSH access -- Web UI access -- CARP VIP status must be `BACKUP` on the passive node -- HA status (active must be able to log into passive) -- Services state + “Synchronize and reconfigure all” -- Check updates availability (`System` > `Firmware` > `Check for updates`) - -### Switchover test (graceful) - -I started: -- a SSH session to DockerVM (to check state keeping) -- a ping to an IoT host from a laptop - -Then tested: -- CARP role switch -- inter-VLAN routing -- WAN ping to `8.8.8.8` -- firewall state (SSH session stays alive) -- DNS resolution (external + internal) -- Caddy reverse proxy + layer4 proxy checks -- Wireguard access from outside -- mDNS discovery (printer visibility) - -✅ Switchover successful. - -### Failover test (hard) - -Then I forced power off of the active node and repeated the same functional tests. - -✅ Failover successful. - -At the end: restarted the active VM, and the HA pair returned to normal operation. - -One note: QEMU Guest Agent doesn’t bring value here because TrueNAS doesn’t implement it as a hypervisor (I still left it installed since it’s harmless). - -(Full checklist and validation steps: [[Validate the new OPNsense VM and cluster state]]) - ---- - -## Conclusion - -This project solved a real weakness in my homelab: my “highly available” router cluster was still depending on a single platform (Proxmox). By moving only the **passive OPNsense node** to **TrueNAS**, I now have a router that can survive a full Proxmox outage. - -The biggest takeaway for me was networking on TrueNAS: attaching VLAN interfaces directly to the VM was not reliable in my setup, but bridging each VLAN (`br13`, `br20`, etc.) made the HA behavior stable and predictable. - -Next step is to monitor the cluster for a few days before doing the cleanup of the migration on the Proxmox side. \ No newline at end of file