Auto-update blog content from Obsidian: 2026-05-24 12:30:51

## Intro
My homelab network is handled by an OPNsense cluster composed of two VM nodes. Both of these VMs were running inside my Proxmox VE cluster.
This setup works fine most of the time. The problem lies in the rare cases where the Proxmox cluster itself is down. When that happens, both OPNsense nodes are unavailable at the same time, which means I have no router left.
Recently, I installed a TrueNAS server in the lab. It is mainly here to act as a NAS, but it can also host virtual machines. That gave me a good opportunity to improve the resilience of my network without changing the whole design.
The idea was simple: keep the active OPNsense node on Proxmox, but move the passive node to TrueNAS.
This way, if the Proxmox cluster goes down, the passive OPNsense node can still take over and keep the network alive.
---
## Preparing the OPNsense nodes
Before moving anything, I wanted to make sure the OPNsense VMs could run with less memory.
The TrueNAS server does not have as much RAM available as the Proxmox cluster, so the first step was to reduce the memory allocation of the OPNsense nodes to the minimum I wanted to use.
I started with the passive node, `cerbere-head2`:
- Shut down the passive node
- Reduced its memory allocation
- Restarted it
- Verified the cluster health
- Switched the service over to the passive node
- Ran network checks
Then I repeated the same operation on the active node, `cerbere-head1`.
Doing it one node at a time allowed me to keep the HA cluster healthy while validating that the reduced memory allocation was still enough for my setup.
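On the Proxmox side, the per-node steps above could be scripted with the `qm` CLI. This is only a minimal sketch under assumptions: the helper name `resize_vm_memory` is made up for illustration, VM ID `123` is inferred from the disk name used later in this post, and `2048` MiB is an assumed target. By default it only prints the commands instead of running them. For a VM managed by Proxmox HA, stopping and starting may have to go through the HA stack instead.

```bash
# Hypothetical helper: print (default) or run the qm commands
# that shut down a VM, change its memory, and start it again.
# Usage: resize_vm_memory <vmid> <memory_mib> [runner]
resize_vm_memory() {
  local vmid="$1" mem_mib="$2" runner="${3:-echo}"
  "$runner" qm shutdown "$vmid" &&
    "$runner" qm set "$vmid" --memory "$mem_mib" &&
    "$runner" qm start "$vmid"
}

# Dry run: only prints the three qm commands (values are assumptions).
resize_vm_memory 123 2048
```

Passing `command` as the third argument (`resize_vm_memory 123 2048 command`) would execute the commands for real on a Proxmox node. The cluster health check and the switchover between the two OPNsense nodes stay manual.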
When I later recreated the VM in TrueNAS, I used 2 GiB of memory for the passive node.
## Preparing the TrueNAS network
The most important part of this migration was not the disk export or the VM creation. It was the network.
An OPNsense VM is not a simple server with one management interface. It needs access to several networks, including management, WAN, user networks, IoT, pfSync, DMZ and lab networks.
On the TrueNAS side, I started from `System` > `Network` and added VLAN interfaces.
The first one was the User VLAN:
- Type: `VLAN`
- Name: `vlan13`
- Description: `User`
- Parent interface: `enp1s0`
- VLAN tag: `13`
![[truenas-create-new-vlan-interface.png]]Creating the User VLAN interface in TrueNAS
I then added the other VLANs in the same way.
TrueNAS does not apply network changes immediately. It gives the option to test the changes first, with a short validation window. If the configuration is not confirmed in time, it rolls back automatically.
This is really convenient when changing the network configuration of the machine you are currently connected to.
![[truenas-network-confirm-add-vlans.png]]Confirming the VLAN interfaces before applying the network changes
For the management network, I created a bridge called `br1`.
This bridge holds the TrueNAS management IP configuration instead of the physical interface `enp1s0`, because it also needs to be shared with the OPNsense VM.
![[truenas-network-mgmt-bridge.png]]Creating the management bridge for TrueNAS and the OPNsense VM
After that, I removed the IP configuration from the physical interface and kept it on the bridge.
![[truenas-network-changes-before-apply.png]]Network configuration before applying the bridge changes
I initially tried to use DHCP for the management bridge after updating the MAC address in Dnsmasq, but in the end I decided to keep a static IP address for TrueNAS. After some network changes, DHCP handed out a different address from the pool, so static addressing was the safer and simpler option for this server.
The final TrueNAS network configuration had the management bridge and the VLAN interfaces.
![[truenas-network-network-trunk-config.png]]TrueNAS network configuration with VLAN interfaces and the management bridge
One important lesson from this migration: attaching the VM NICs directly to the VLAN interfaces was not the design I kept in the end.
For the OPNsense VM, I created a bridge for each VLAN and attached the VM NICs to these bridges instead. For example, `br13` uses `vlan13` as its only member and has no IP address. I also moved the description, like `User`, from the VLAN interface to the bridge for clarity.
![[truenas-network-bridges-for-vlan.png]]Creating one bridge per VLAN for the OPNsense VM
This bridge-per-VLAN design is the configuration that worked correctly for the OPNsense VM in TrueNAS.
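To keep track of the naming convention, the bridge-per-VLAN layout can be written down as a small table. The sketch below only prints a plan and does not configure anything on TrueNAS. Only the `User` VLAN with tag `13` is stated above. The other tags are assumptions inferred from the subnets used later in this post, and the function name `print_bridge_plan` is made up for illustration.

```bash
# Hypothetical helper: print one "bridge <- VLAN" row per name:tag pair.
# Usage: print_bridge_plan <parent_interface> <name:tag>...
print_bridge_plan() {
  local parent="$1"
  shift
  local pair name tag
  printf '%-6s %-8s %-8s %-4s %s\n' BRIDGE MEMBER PARENT TAG DESCRIPTION
  for pair in "$@"; do
    name="${pair%%:*}" # text before the colon
    tag="${pair##*:}"  # text after the colon
    printf '%-6s %-8s %-8s %-4s %s\n' "br${tag}" "vlan${tag}" "$parent" "$tag" "$name"
  done
}

# User:13 comes from the article; the other tags are assumptions.
print_bridge_plan enp1s0 User:13 IoT:37 pfSync:44 DMZ:55 Lab:66
```

Each row reads as: bridge `brNN` has `vlanNN` as its only member, carries the description, and has no IP address.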
## Creating a temporary export dataset
To move the passive OPNsense VM disk from Proxmox to TrueNAS, I first needed a place to export the disk image.
In TrueNAS, I created a dataset named `disk`, then created an NFS share from it.
In the advanced options of the NFS share, I configured:
- Maproot user: `root`
- Authorized hosts:
- `192.168.88.21`
- `192.168.88.22`
- `192.168.88.23`
These are the Proxmox VE nodes allowed to mount the share.
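Later, I reorganized the dataset layout. I created a parent dataset called `storage/vm` and renamed the original export dataset from `storage/disk` to `storage/vm/files`. This happened after the export, which is why the mount command in the next section still uses the original `storage/disk` path.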
From the TrueNAS shell, this was done with ZFS commands:
```zsh
sudo zfs create storage/vm
```
```zsh
sudo zfs rename storage/disk storage/vm/files
```
I did not manually create a zvol at that point. The VM creation process in TrueNAS handled the disk import and conversion.
## Exporting the VM disk from Proxmox
From the Proxmox VE web interface, I located the node hosting the passive OPNsense VM `cerbere-head2`.
It was running on `Zenith`.
I logged into that Proxmox node over SSH and mounted the NFS share from TrueNAS:
```bash
mount granite.mgmt.vezpi.com:/mnt/storage/disk /mnt
```
Then I shut down the VM from the Proxmox VE interface. I did not shut it down from inside OPNsense because the VM had HA enabled in Proxmox, which would have restarted it.
Once the VM was stopped, I exported the main disk to qcow2. I did not export the EFI disk.
```bash
qemu-img convert -f raw -O qcow2 -p \
rbd:ceph-workload/vm-123-disk-1 \
/mnt/cerbere-head2.qcow2
```
The conversion took about one minute for a 20 GB disk.
At this point, the passive OPNsense disk was available on TrueNAS and ready to be imported into a new VM.
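Before importing the image, it can be worth a quick sanity check. The sketch below assumes `qemu-img` is available on the machine that has the file mounted. The function name `verify_qcow2` and the `QEMU_IMG` override variable are made up for illustration.

```bash
# Binary to use; can be overridden, for example for testing.
QEMU_IMG="${QEMU_IMG:-qemu-img}"

# Hypothetical helper: confirm the file is a qcow2 image,
# then let qemu-img check its internal consistency.
verify_qcow2() {
  local image="$1"
  if ! "$QEMU_IMG" info "$image" | grep -q '^file format: qcow2$'; then
    echo "not a qcow2 image: $image" >&2
    return 1
  fi
  "$QEMU_IMG" check "$image"
}
```

On the Proxmox node, with the NFS share still mounted, this would be called as `verify_qcow2 /mnt/cerbere-head2.qcow2`.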
## Recreating the OPNsense VM in TrueNAS
The next step was to recreate the passive OPNsense VM in TrueNAS with parameters matching the original VM as closely as possible.
From the TrueNAS web interface, I went to the `Virtual Machines` section.
![[truenas-vm-menu.png]]Opening the Virtual Machines section in TrueNAS
I created a new VM with these settings.
For the operating system:
- Guest Operating System: `FreeBSD`
- Name: `cerberehead2`
- System Clock: `Local`
- Boot Method: `UEFI`
- Enable Secure Boot: disabled
- Enable Trusted Platform Module: disabled
- Shutdown Timeout: `90`
- Start on Boot: enabled
- Enable Display VNC: disabled
The VM name does not use dashes because TrueNAS did not allow them there.
For CPU and memory:
- Virtual CPUs: `1`
- Cores: `2`
- Threads: `1`
- CPU Mode: `Custom`
- CPU Model: `qemu64`
- Memory Size: `2 GiB`
For the disk:
- Create new disk image
- Import Image: enabled
- Image source: `/mnt/storage/vm/files/cerbere-head2.qcow2`
- Disk Type: `VirtIO`
- Storage Location: `storage/vm`
- Size: `20 GiB`
For the first network interface:
- Adapter Type: `VirtIO`
- MAC Address: keep the proposed one
- Attach NIC: `br1: Mgmt`
I skipped installation media and GPU configuration, then confirmed the summary.
![[truenas-vm-create-new-summary.png]]Summary before creating the OPNsense VM in TrueNAS
After confirmation, TrueNAS converted the imported qcow2 image into a zvol.
![[truenas-vm-disk-image-conversion.png]]TrueNAS converting the imported disk image into a zvol
Once the VM was created, I opened the VM details and added the remaining NICs.
![[truenas-vm-details.png]]Accessing the VM devices in TrueNAS
For each additional NIC, I used VirtIO as the adapter type and attached it to the corresponding bridge.
For the WAN NIC, I copied the old MAC address because of the single WAN IP address trick I use. For each following NIC, I incremented one digit of the MAC address to keep the order clear.
![[truenas-vm-add-nic.png]]Adding an additional VirtIO network interface to the OPNsense VM
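The sequential MAC addresses can be generated instead of typed by hand. This is a small sketch, assuming the increment is applied to the last octet. The base address `02:00:00:00:00:10` is a made-up placeholder, not the real WAN MAC, and the function name `next_macs` is hypothetical.

```bash
# Hypothetical helper: print <count> MAC addresses, starting at <base>
# and incrementing the last octet by one each time.
# The last octet wraps around to 00 after ff.
next_macs() {
  local base="$1" count="$2"
  local prefix="${base%:*}" # everything before the last colon
  local last="${base##*:}"  # last octet, in hex
  local i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s:%02x\n' "$prefix" $(( (0x$last + i) % 256 ))
    i=$((i + 1))
  done
}

# Placeholder base address; replace it with the copied WAN MAC.
next_macs 02:00:00:00:00:10 3
```

The first line of the output is the base address itself, so it corresponds to the WAN NIC, and the following lines to the next NICs in order.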
After moving the VM NICs to the VLAN bridges, the passive OPNsense VM started correctly in TrueNAS.
![[truenas-vm-opnsense-start-shell.png]]OPNsense booting successfully as a TrueNAS VM
## Validating the HA cluster
Once the passive node was running on TrueNAS, I needed to validate that the OPNsense HA cluster was still behaving correctly.
I started with basic checks on the passive node:
- Management interface ping from the bastion: `192.168.88.3`
- User interface ping from a laptop: `192.168.13.3`
- IoT interface ping: `192.168.37.3`
- pfSync ping from the other node: `192.168.44.2`
- DMZ interface ping: `192.168.55.3`
- Lab interface ping from DockerVM: `192.168.66.3`
I also checked that the node was accessible over SSH from Termius using `192.168.13.3`, and that the web interface was reachable at:
```text
https://192.168.13.3:4443
```
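The ping checks above were run from different machines. From a single host that can reach all of these networks (an assumption), they could be grouped into one helper. In this sketch, the function name `check_addresses`, the `PING_CMD` variable and the short labels are made up, and the default ping options assume Linux `ping`.

```bash
# Ping command and options; -W 2 is a 2 second timeout on Linux ping.
PING_CMD="${PING_CMD:-ping -c 1 -W 2}"

# Hypothetical helper: ping each label=address pair once and
# print OK or FAIL. Returns non-zero if at least one address failed.
check_addresses() {
  local failed=0 entry name ip
  for entry in "$@"; do
    name="${entry%%=*}"
    ip="${entry#*=}"
    # PING_CMD is intentionally unquoted so its options are split.
    if $PING_CMD "$ip" >/dev/null 2>&1; then
      printf 'OK   %-7s %s\n' "$name" "$ip"
    else
      printf 'FAIL %-7s %s\n' "$name" "$ip"
      failed=1
    fi
  done
  return "$failed"
}
```

With the addresses listed above, the call would be `check_addresses mgmt=192.168.88.3 user=192.168.13.3 iot=192.168.37.3 pfsync=192.168.44.2 dmz=192.168.55.3 lab=192.168.66.3`.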
Then I validated the OPNsense HA state:
- CARP VIP status must be `BACKUP` on all VIPs
- HA status page must show that the active node can log in to the passive node
- Services must be running as expected
- HA service synchronization must work
- Firmware update checks must be accessible
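The CARP state can also be read from the command line, since `ifconfig` on FreeBSD prints a `carp:` line with the state for each VIP. The sketch below reads `ifconfig` output on standard input and succeeds only if every VIP is in the wanted state. The function name `all_carp_in_state` is made up for illustration.

```bash
# Hypothetical helper: succeed only if at least one "carp:" line is
# present on stdin and all of them report the wanted state.
# Usage: ifconfig | all_carp_in_state BACKUP
all_carp_in_state() {
  local want="$1"
  awk -v want="$want" '
    $1 == "carp:" { seen++; if ($2 != want) bad++ }
    END { exit ((seen > 0 && bad == 0) ? 0 : 1) }
  '
}
```

Assuming root SSH access to the passive node, it could be used as `ssh root@192.168.13.3 ifconfig | all_carp_in_state BACKUP && echo "all VIPs are BACKUP"`.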
From the active node, I used the HA status page and forced a full synchronization with `Synchronize and reconfigure all`.
## Switchover tests
Before testing failover, I started an SSH session to DockerVM to confirm that firewall states were preserved across nodes. I also started a ping from a laptop to `192.168.37.120`.
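One way to put a number on the interruption is to count the pings that went unanswered while the roles changed. This is an optional sketch: the function name `ping_lost` is made up, and it only parses the summary line that `ping` prints at the end of a run.

```bash
# Hypothetical helper: read ping output on stdin and print the
# number of lost packets, computed from the final summary line.
# Handles both "9 received" and "9 packets received" wordings.
ping_lost() {
  awk '/packets transmitted/ {
    tx = $1
    for (i = 2; i <= NF; i++)
      if ($i ~ /^received/)
        rx = ($(i - 1) == "packets") ? $(i - 2) : $(i - 1)
    print tx - rx
  }'
}
```

For example, `ping -c 300 192.168.37.120 | ping_lost` started just before the switchover prints how many of the 300 requests got no reply. The count of 300 is an arbitrary value.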
For the switchover test, I triggered a graceful switchover by enabling maintenance mode on the master node.
The passive node became `MASTER`, and I validated the important services:
- Extra VLAN routing with ping to `192.168.37.120`
- WAN access with ping to `8.8.8.8`
- Firewall states by keeping the SSH session alive
- External DNS resolution with `host redhat.com`
- Internal DNS resolution with `host SLZB-06M.mgmt.vezpi.com`
- Access to a random internet page
- Caddy reverse proxy
- Caddy layer4 proxy
- WireGuard access from outside
- mDNS by checking if the printer showed up
The switchover was successful.
I also tested the switchback. I had to enter maintenance mode and leave it again for the cluster to return to the expected state, but the cluster behavior was validated.
## Failover tests
After the graceful switchover test, I tested a harsher failover scenario by forcing a power-off of the active node.
I repeated the same validation checklist:
- Extra VLAN routing
- WAN access
- Firewall states
- DNS resolution
- Caddy reverse proxy
- Caddy layer4 proxy
- WireGuard
- mDNS
For DNS, I tested an external domain with:
```text
host microsoft.com
```
And I also checked the internal host:
```text
host SLZB-06M.mgmt.vezpi.com
```
The failover was successful.
Finally, I restarted the active OPNsense VM.
At that point, the OPNsense HA cluster was operational again, with the passive node now running on TrueNAS instead of Proxmox.
## A note about QEMU Guest Agent
The OPNsense VM already had the QEMU Guest Agent installed.
In this setup, it does not seem useful, because TrueNAS, as a hypervisor, does not make use of it in the way I would need here. I kept it installed anyway, because it is harmless.
## Conclusion
This migration was a small but important improvement for my homelab.
Before, both OPNsense nodes depended on the Proxmox VE cluster. If the cluster was down, my whole network routing layer was down with it.
Now, the active node still runs on Proxmox, but the passive node runs on TrueNAS. This gives me a better separation between the virtualization cluster and the network failover layer.
The most important part of the project was the TrueNAS networking model. Creating VLAN interfaces was not enough for the VM use case. The working design was to create one bridge per VLAN and attach the OPNsense VM NICs to those bridges.
After validating CARP, HA sync, routing, DNS, Caddy, WireGuard, mDNS and firewall states, the cluster is working as expected.
The passive OPNsense node is now outside of Proxmox, and that is exactly what I wanted: keeping the network up even when the Proxmox VE cluster is unavailable.