Auto-update blog content from Obsidian: 2026-05-24 20:12:49

Gitea Actions
2026-05-24 20:12:49 +00:00
parent 8e4e4601d7
commit 40ec16e974
4 changed files with 321 additions and 360 deletions

@@ -8,16 +8,17 @@ tags:
- opnsense
- truenas
- proxmox
- high-availability
categories:
- homelab
---
## Intro
My homelab network is handled by an OPNsense cluster composed of two VM nodes. Both of these VMs are running inside my Proxmox VE cluster.
My homelab network is handled by an OPNsense cluster composed of two VM nodes. Both of these VMs are running inside my Proxmox VE cluster. You can find details in this [article]({{< ref "post/15-migration-opnsense-proxmox-highly-available" >}}).
This setup works fine most of the time. The issue is more about the rare cases where the Proxmox cluster itself is down. When that happens, both OPNsense nodes are unavailable at the same time, which means I do not have any router left, so no network at all.
Recently, I installed a TrueNAS server in the lab. You can find the infos in that [post]({{< ref "post/18-create-nas-server-with-truenas" >}}). It is mainly here to act as a NAS, but it could also host virtual machines. That give me a good opportunity to improve the resilience of my network without changing the whole design.
Recently, I installed a TrueNAS server in the lab, which I document in this [post]({{< ref "post/18-create-nas-server-with-truenas" >}}). It is mainly here to act as a NAS, but it can also host virtual machines. That gives me a good opportunity to improve the resilience of my network without changing the whole design.
💡 The idea is simple: keep the active OPNsense node on Proxmox, but move the passive node to TrueNAS.
@@ -60,7 +61,7 @@ The first one is the User VLAN:
- Parent interface: `enp1s0`
- VLAN tag: `13`
![Creating the User VLAN interface in TrueNAS](images/truenas-create-new-vlan-interface.png)
![Create the User VLAN interface in TrueNAS](images/truenas-create-new-vlan-interface.png)
I then add the other VLANs in the same way.
@@ -68,13 +69,13 @@ TrueNAS does not apply network changes directly. It gives the option to test the
This is really convenient when changing the network configuration of the machine you are currently connected to.
![Confirming the VLAN interfaces before applying the network changes](images/truenas-network-confirm-add-vlans.png)
![Confirm the VLAN interfaces before applying the network changes](images/truenas-network-confirm-add-vlans.png)
For the management network, I created a bridge called `br1`.
This bridge holds the TrueNAS management IP configuration instead of the physical interface `enp1s0`, because it also needs to be shared with the OPNsense VM.
![Creating the management bridge for TrueNAS and the OPNsense VM](images/truenas-network-mgmt-bridge.png)
![Create the management bridge for TrueNAS and the OPNsense VM](images/truenas-network-mgmt-bridge.png)
After that, I remove the IP configuration from the physical interface and keep it on the bridge.
@@ -86,7 +87,7 @@ For the OPNsense VM, I create a bridge for each VLAN. For example, `br13` uses `
The final TrueNAS network configuration:
![Creating one bridge per VLAN for the OPNsense VM](images/truenas-network-bridges-for-vlan.png)
![Create one bridge per VLAN for the OPNsense VM](images/truenas-network-bridges-for-vlan.png)
---
## Create a Temporary Export Dataset
@@ -107,6 +108,7 @@ These are the Proxmox VE nodes allowed to mount the share.
I don't manually create a zvol at this point. The VM creation process in TrueNAS handles the disk import and conversion.
---
## Export the VM Disk from Proxmox
From the Proxmox VE web interface, I locate the node hosting the passive OPNsense VM `cerbere-head2`. It is running on `Zenith`.
@@ -131,13 +133,14 @@ The conversion took about one minute for a 20 GB disk.
At this point, the passive OPNsense disk is available on TrueNAS and ready to be imported into a new VM.
---
## Recreate the OPNsense VM in TrueNAS
The next step is to recreate the passive OPNsense VM in TrueNAS with parameters matching the original VM as closely as possible.
From the TrueNAS web interface, I go to the `Virtual Machines` section.
![Opening the Virtual Machines section in TrueNAS](images/truenas-vm-menu.png)
![The Virtual Machines section in TrueNAS](images/truenas-vm-menu.png)
I create a new VM with these settings.
@@ -189,23 +192,24 @@ After confirmation, TrueNAS convert the imported qcow2 image into a zvol.
Once the VM is created, I open the VM details and add the remaining NICs.
![Accessing the VM devices in TrueNAS](images/truenas-vm-details.png)
![The VM devices in TrueNAS](images/truenas-vm-details.png)
For each additional NIC, I use VirtIO as the adapter type and attach it to the corresponding bridge.
For the WAN NIC, I copy the old MAC address because I use a single WAN IP address trick. I also increment the digit in the MAC address for the following NICs to keep the order clear.
For the WAN NIC, I copy the old MAC address because I use a single WAN IP address trick. I also increment the Device Order for each following NIC to keep the same order as in Proxmox.
![Adding an additional VirtIO network interface to the OPNsense VM](images/truenas-vm-add-nic.png)
![Add an additional VirtIO network interface to the OPNsense VM](images/truenas-vm-add-nic.png)
After moving the VM NICs to the VLAN bridges, the passive OPNsense VM started correctly in TrueNAS.
🎉 Finally I can start the OPNsense VM in TrueNAS.
![OPNsense booting successfully as a TrueNAS VM](images/truenas-vm-opnsense-start-shell.png)
## Validating the HA cluster
---
## Validate the HA cluster
Once the passive node was running on TrueNAS, I needed to validate that the OPNsense HA cluster was still behaving correctly.
Once the passive node is running on TrueNAS, I need to validate that the OPNsense HA cluster is still behaving correctly.
I started with basic checks on the passive node:
I start with basic checks on the passive node:
- Management interface ping from the bastion: `192.168.88.3`
- User interface ping from a laptop: `192.168.13.3`
@@ -214,13 +218,13 @@ I started with basic checks on the passive node:
- DMZ interface ping: `192.168.55.3`
- Lab interface ping from DockerVM: `192.168.66.3`
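The ping checks above can be scripted. The following is a minimal sketch, not the procedure used in this post: it assumes a Linux host with the iputils `ping` binary, and the dictionary labels are purely descriptive. Only the addresses visible in the checklist are included. In the post, the pings are sent from different machines (bastion, laptop, DockerVM), so a single host may not reach every interface depending on the firewall rules.

```python
import subprocess

# Addresses taken from the checklist above; the labels are only descriptive.
PASSIVE_NODE_ADDRESSES = {
    "management": "192.168.88.3",
    "user": "192.168.13.3",
    "dmz": "192.168.55.3",
    "lab": "192.168.66.3",
}


def ping_command(address: str, count: int = 2, timeout_s: int = 2) -> list[str]:
    """Build a Linux iputils ping command (-c: probe count, -W: reply timeout in seconds)."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), address]


def is_reachable(address: str) -> bool:
    """Return True when ping exits with status 0, meaning at least one reply was received."""
    result = subprocess.run(
        ping_command(address),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_all(addresses: dict[str, str], probe=is_reachable) -> dict[str, bool]:
    """Probe every address and return a label -> reachable mapping."""
    return {label: probe(address) for label, address in addresses.items()}


if __name__ == "__main__":
    for label, ok in check_all(PASSIVE_NODE_ADDRESSES).items():
        print(f"{label:<12} {'OK' if ok else 'FAILED'}")
```

The `probe` parameter is injectable so the logic can be exercised without sending real packets.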
I also checked that the node was accessible over SSH from Termius using `192.168.13.3`, and that the web interface was reachable at:
I also check that the node is accessible over SSH from my laptop using `192.168.13.3`, and that the web interface is reachable at:
```text
https://192.168.13.3:4443
```
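A quick way to confirm that both services are listening is a plain TCP connection test. This is only a sketch: port `4443` comes from the URL above, while port `22` for SSH is an assumption (the post does not state the SSH port). It checks that the port accepts connections, not that login or the web UI actually works.

```python
import socket


def tcp_port_open(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    host = "192.168.13.3"
    # 4443 is the web interface port from the URL above; 22 is an assumed SSH port.
    for label, port in (("ssh", 22), ("web interface", 4443)):
        state = "open" if tcp_port_open(host, port) else "closed or filtered"
        print(f"{label} ({host}:{port}): {state}")
```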
Then I validated the OPNsense HA state:
Then I validate the OPNsense HA state:
- CARP VIP status must be `BACKUP` on all VIPs
- HA status page must show that the active node can log in to the passive node
@@ -228,15 +232,16 @@ Then I validated the OPNsense HA state:
- HA service synchronization must work
- Firmware update checks must be accessible
From the active node, I used the HA status page and forced a full synchronization with `Synchronize and reconfigure all`.
From the active node, I use the HA status page and force a full synchronization with `Synchronize and reconfigure all`.
## Switchover tests
---
## Switchover Tests
Before testing failover, I started an SSH session to DockerVM to confirm that firewall states were preserved across nodes. I also started a ping from a laptop to `192.168.37.120`.
Before testing failover, I start an SSH session to `dockerVM` to confirm that firewall states are preserved across nodes. I also start a ping from a laptop to `192.168.37.120`.
For the switchover test, I gracefully enabled maintenance mode on the master node.
For the switchover test, I gracefully enable maintenance mode on the master node.
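To put a number on the interruption seen by the continuous ping during a switchover, the probes can be timestamped and the longest run of lost replies measured. This is a hedged sketch, not what was used for this post: it assumes a Linux host with iputils `ping`, and the function names are hypothetical.

```python
import subprocess
import time


def probe_once(address: str) -> bool:
    """Send one ping (Linux iputils syntax) and report whether a reply came back."""
    cmd = ["ping", "-c", "1", "-W", "1", address]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def collect(address: str, duration_s: float, interval_s: float = 1.0) -> list[tuple[float, bool]]:
    """Probe the address repeatedly and return (seconds_since_start, reply_received) samples."""
    samples = []
    start = time.monotonic()
    while (now := time.monotonic()) - start < duration_s:
        samples.append((now - start, probe_once(address)))
        time.sleep(interval_s)
    return samples


def longest_outage(samples: list[tuple[float, bool]]) -> float:
    """Longest time in seconds between the first lost probe of a run and the next reply.

    A run still in progress at the end is measured up to the last sample.
    """
    longest = 0.0
    outage_start = None
    for timestamp, ok in samples:
        if not ok and outage_start is None:
            outage_start = timestamp
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    if outage_start is not None:
        longest = max(longest, samples[-1][0] - outage_start)
    return longest


if __name__ == "__main__":
    # Start this, then trigger the switchover while it runs.
    results = collect("192.168.37.120", duration_s=60)
    print(f"Longest interruption: {longest_outage(results):.1f} s")
```

The resolution is bounded by the probe interval, so a result of `0.0` only means that no probe was lost.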
The passive node became `MASTER`, and I validated the important services:
The new passive node becomes `MASTER`, and I validate the important services:
- Extra VLAN routing with ping to `192.168.37.120`
- WAN access with ping to `8.8.8.8`
@@ -249,59 +254,30 @@ The passive node became `MASTER`, and I validated the important services:
- WireGuard access from outside
- mDNS by checking if the printer shows up
The switchover was successful.
The switchover is successful.
I also tested the switchback. It required entering maintenance mode and leaving it again to return to the expected state, but the cluster behavior was validated.
---
## Failover Tests
## Failover tests
After the graceful switchover test, I test a more direct failover scenario by forcing a poweroff of the active node.
After the graceful switchover test, I tested a more direct failover scenario by forcing a poweroff of the active node.
I repeat the same validation checklist.
I repeated the same validation checklist:
✅ The failover is successful.
- Extra VLAN routing
- WAN access
- Firewall states
- DNS resolution
- Caddy reverse proxy
- Caddy layer4 proxy
- Wireguard
- mDNS
Finally, I restart the active OPNsense VM.
For DNS, I tested an external domain with:
```text
host microsoft.com
```
And I also checked the internal host:
```text
host SLZB-06M.mgmt.vezpi.com
```
The failover was successful.
Finally, I restarted the active OPNsense VM.
At that point, the OPNsense HA cluster was operational again, with the passive node now running on TrueNAS instead of Proxmox.
## A note about QEMU Guest Agent
The OPNsense VM already had the QEMU Guest Agent installed.
In this setup, it does not seem useful because TrueNAS does not have it implemented as a hypervisor feature in the way I would need here. I kept it installed anyway, because it is harmless.
🎯 At that point, the OPNsense HA cluster is operational again, with the passive node now running on TrueNAS instead of Proxmox.
---
## Conclusion
This migration was a small but important improvement for my homelab.
This migration is a small but important improvement for my homelab.
Before, both OPNsense nodes depended on the Proxmox VE cluster. If the cluster was down, my whole network routing layer was down with it.
Now, the active node still runs on Proxmox, but the passive node runs on TrueNAS. This gives me a better separation between the virtualization cluster and the network failover layer.
The most important part of the project was the TrueNAS networking model. Creating VLAN interfaces was not enough for the VM use case. The working design was to create one bridge per VLAN and attach the OPNsense VM NICs to those bridges.
A small disclaimer: while TrueNAS offers virtualization features, it is not comparable to Proxmox VE in terms of clustering and infrastructure management capabilities.
After validating CARP, HA sync, routing, DNS, Caddy, WireGuard, mDNS and firewall states, the cluster is working as expected.
The passive OPNsense node is now outside of Proxmox, and that is exactly what I wanted: keeping the network available even when the Proxmox VE cluster is unavailable.
A note about the QEMU Guest Agent: the OPNsense VM already had it installed before the export. In this setup, it does not seem useful because TrueNAS does not implement it as a hypervisor feature. I kept it installed anyway, because it is harmless.