| slug | title | description | date | draft | tags | categories |
|---|---|---|---|---|---|---|
| opnsense-virtualizaton-highly-available | Template | | | true | | |
## Intro
I recently encountered my first real problem: my physical OPNsense box crashed because of a kernel panic. I detailed what happened in that [post]({{< ref "post/10-opnsense-crash-disk-panic" >}}).
After this event, I came up with an idea to enhance the stability of the lab: Virtualize OPNsense.
The idea is pretty simple on paper: create an OPNsense VM on the Proxmox cluster and replace the current physical box with that VM. The challenge is to carry both the LAN and the WAN on the same physical link, which requires proper network segregation.
Having only one OPNsense VM would not solve my problem, though. I want to implement High Availability: the bare minimum is two OPNsense instances running as active/passive.
## Current Setup
Currently, I have my ISP box, a Freebox, in bridge mode, connected to the port `igc0` of my OPNsense box: the WAN. On `igc1`, my LAN is connected to my main switch on a trunk port with VLAN 1 as native, my management network.
Connected to that switch are my 3 Proxmox nodes, also on trunk ports with the same native VLAN. Each of my Proxmox nodes has 2 NICs, but the second one is dedicated to the Ceph storage network, on a dedicated 2.5 Gbps switch.
The layout changed a little since the OPNsense crash: I dropped the LACP link, which was not adding any value:
## Target Layout
As I said in the intro, the plan is simple: replace the OPNsense box with a couple of VMs in Proxmox. Basically, I will plug my ISP box directly into the main switch, but the native VLAN will have to change; I will create a VLAN dedicated to my WAN traffic.
The real challenge will be the Proxmox networking: with only one NIC to carry the LAN, the WAN and even the cluster traffic, all on a 1 Gbps port, I'm not sure of the outcome.
## Proxmox Networking
My Proxmox networking was quite dumb until really recently: initially I had only configured the network on each node individually. In that [article]({{< ref "post/11-proxmox-cluster-networking-sdn" >}}), I configured my VLANs in the Proxmox SDN.
Additionally, I have to add extra VLANs for this project, one for the WAN and the other for pfSync.
## Proof of Concept
Before rushing into a migration, I want to experiment with the high availability setup for OPNsense. The idea is to:
- Add some VLANs in my Homelab
- Create Fake ISP box
- Build two OPNsense VMs
- Configure high availability
- Create another client VM
- Shutdown the active OPNsense node
- See what happens!
### Add VLANs in my Homelab
For this experiment, I add extra VLANs:
- 101: POC WAN
- 102: POC LAN
- 103: POC pfSync
In the Proxmox UI, I navigate to `Datacenter` > `SDN` > `VNets` and click `Create`:
Once the 3 new VLANs have been created, I apply the configuration.
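For reference, the same VNets can also be created and applied from the CLI with `pvesh`. This is only a sketch: it assumes a VLAN-aware SDN zone named `lab` already exists, and the VNet names are arbitrary.

```sh
# Create the three POC VNets in the existing SDN zone (zone name "lab" is an assumption)
pvesh create /cluster/sdn/vnets --vnet pocwan --zone lab --tag 101
pvesh create /cluster/sdn/vnets --vnet poclan --zone lab --tag 102
pvesh create /cluster/sdn/vnets --vnet pocsync --zone lab --tag 103

# Apply the pending SDN configuration (same as the Apply button in the UI)
pvesh set /cluster/sdn
```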
Additionally, I add these 3 VLANs in my UniFi controller; there, only a name and the VLAN ID are needed to broadcast the VLANs on the network. All declared VLANs pass through the trunk ports where my Proxmox VE nodes are connected.
### Create Fake ISP Box VM
For this experiment, I will simulate my current ISP box with a VM, `fake-freebox`, which will route traffic between the POC WAN network and my lab network. This VM will run a DHCP server with only one lease, as my ISP box does. I clone my cloud-init template:
I add another NIC, then I edit the Netplan configuration to have:
- `eth0` (POC WAN, VLAN 101): static IP address 10.101.0.254/24
- `enp6s19` (Lab VLAN 66): DHCP address given by my current OPNsense router
```yaml
network:
  version: 2
  ethernets:
    eth0:
      addresses:
        - 10.101.0.254/24
    enp6s19:
      dhcp4: true
```
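I then apply the new network configuration with the standard Netplan command:

```sh
sudo netplan apply
```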
I enable packet forwarding to allow this VM to route traffic:
echo "net.ipv4.ip_forward=1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
I set up masquerading on that interface to avoid packets from the POC network being dropped by the OPNsense router on my real network:
```sh
sudo iptables -t nat -A POSTROUTING -o enp6s19 -j MASQUERADE
sudo apt install iptables-persistent -y
sudo netfilter-persistent save
```
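A quick, optional sanity check to confirm that forwarding and the NAT rule are in place:

```sh
# Should print: net.ipv4.ip_forward = 1
sysctl net.ipv4.ip_forward

# Should list the MASQUERADE rule on enp6s19 with its packet counters
sudo iptables -t nat -L POSTROUTING -n -v
```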
I install `dnsmasq`, a small DHCP server:
```sh
sudo apt install dnsmasq -y
```
I edit the file `/etc/dnsmasq.conf` to configure `dnsmasq` to serve only one lease, 10.101.0.150, with DNS pointing to the OPNsense IP:
```ini
interface=eth0
bind-interfaces
dhcp-range=10.101.0.150,10.101.0.150,255.255.255.0,12h
dhcp-option=3,10.101.0.254 # default gateway = this VM
dhcp-option=6,192.168.66.1 # DNS server
```
I restart the dnsmasq service to apply the configuration:
```sh
sudo systemctl restart dnsmasq
```
The `fake-freebox` VM is now ready to serve DHCP on VLAN 101, with only one lease.
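Later, once a client has requested an address, the single lease can be verified on the VM; the lease file path below is the Debian/Ubuntu default:

```sh
# dnsmasq stores its active leases here by default on Debian/Ubuntu
cat /var/lib/misc/dnsmasq.leases
```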
### Build OPNsense VMs
First, I download the OPNsense ISO from the official website and upload it to the storage of one of my Proxmox VE nodes:
I create the first VM from that node, which I name `poc-opnsense-1`:
- I keep the OS type as Linux, even though OPNsense is based on FreeBSD
- I select the `q35` machine type and the `OVMF (UEFI)` BIOS setting, with the EFI storage on my Ceph pool
- For the disk, I set the disk size to 20 GiB
- 2 vCPU with 2048 MB of RAM
- I select the VLAN 101 (POC WAN) for the NIC
- Once the VM creation wizard is finished, I add a second NIC in the VLAN 102 (POC LAN) and a third in the VLAN 103 (POC pfSync)
Before starting it, I clone this VM to prepare the next one: `poc-opnsense-2`.
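The whole VM creation can also be scripted with `qm` from a node shell. This is only a sketch: the VM IDs, storage name, VNet names and ISO file name are assumptions to adapt to your setup.

```sh
# Create the first OPNsense VM: q35 machine, OVMF (UEFI) with an EFI disk on Ceph,
# 2 vCPUs, 2048 MB of RAM, a 20 GiB disk and the first NIC on the POC WAN VNet
qm create 9101 --name poc-opnsense-1 \
  --machine q35 --bios ovmf --efidisk0 ceph-vm:1 \
  --cores 2 --memory 2048 \
  --scsihw virtio-scsi-pci --scsi0 ceph-vm:20 \
  --net0 virtio,bridge=pocwan \
  --cdrom local:iso/OPNsense-dvd-amd64.iso   # adjust to the uploaded ISO name

# Add the POC LAN and POC pfSync NICs
qm set 9101 --net1 virtio,bridge=poclan
qm set 9101 --net2 virtio,bridge=pocsync

# Clone it before first boot to prepare the second instance
qm clone 9101 9102 --name poc-opnsense-2 --full
```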
Now I can start the VM, but the boot fails with an access denied error. I enter the BIOS, navigate to Device Manager > Secure Boot Configuration, uncheck the Attempt Secure Boot option and restart the VM:
Now the VM boots on the ISO; I touch nothing until I reach this screen:
I enter the installation mode using the user `installer` and the password `opnsense`. I select the French keyboard and the `Install (UFS)` mode. I get a warning about the RAM size but I proceed anyway.
I select the QEMU hard disk of 20GB as destination and launch the installation:
Once the installation is finished, I skip the root password change, remove the ISO from the drive and select the reboot option at the end of the installation wizard.
When the VM has rebooted, I log in as `root` with the default password `opnsense` and land in the CLI menu:
I select the option 1 to assign interfaces, as the installer inverted them for my setup:
Now my WAN interface gets the IP address 10.101.0.150/24 from my `fake-freebox` VM. Then I configure the LAN interface with 10.102.0.2/24 and a DHCP pool from 10.102.0.10 to 10.102.0.99:
✅ The first VM is ready. I repeat the process for the second OPNsense VM, `poc-opnsense-2`, which will have the IP 10.102.0.3.
### Configure OPNsense High Availability
Now that both OPNsense VMs are operational, I want to configure the instances from their WebGUI. To do that, I need access to the OPNsense interfaces from within the POC LAN VLAN. The simplest way is to connect a Windows VM to that VLAN and browse to the OPNsense IP address on port 443:
#### Add pfSync Interface
The first thing I do is assign the third NIC, `vtnet2`, to the pfSync interface:
I enable the interface on each instance and configure it with a static IP address:
- `poc-opnsense-1`: 10.103.0.2/24
- `poc-opnsense-2`: 10.103.0.3/24
On both instances, I create a firewall rule to allow communication coming from this network on that pfSync interface:
#### Setup High Availability
Then I configure the HA in System > High Availability > Settings. On the master (`poc-opnsense-1`) I configure both the General Settings and the Synchronization Settings. On the backup (`poc-opnsense-2`) I only configure the General Settings:
Once applied, I can verify that everything is OK on the Status page:
#### Create Virtual IP Address
Now I need to create the VIP for the LAN interface, an IP address shared across the cluster. The master node claims that IP, which is the gateway given to the clients. The VIP relies on CARP (Common Address Redundancy Protocol) for failover. To create it, I navigate to Interfaces > Virtual IPs > Settings:
To replicate the config to the backup node, I go to System > High Availability > Status and click the Synchronize and reconfigure all button. To verify, on both nodes I navigate to Interfaces > Virtual IPs > Status. The master node should have the VIP active with the status MASTER, and the backup node with the status BACKUP.
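The same information is visible from a shell on each node: the CARP state shows up in `ifconfig` on the interface carrying the VIP. A quick check, assuming the LAN NIC is `vtnet1`:

```sh
# Master should report "carp: MASTER vhid 1 ...", backup "carp: BACKUP vhid 1 ..."
# (the vhid depends on the Virtual IP settings)
ifconfig vtnet1 | grep carp
```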
#### Reconfigure DHCP
I need to reconfigure DHCP for HA. Dnsmasq does not support DHCP lease synchronization, so I have to configure the two instances independently; they will both serve DHCP leases at the same time.
On the master node, in Services > Dnsmasq DNS & DHCP > General, I tick the Disable HA sync box. Then in DHCP ranges, I edit the current one and also tick the Disable HA sync box. In DHCP options, I add the option router [3] with the value 10.102.0.1 to advertise the VIP address:
I clone that rule for the option dns-server [6] with the same address.
On the backup node, in Services > Dnsmasq DNS & DHCP > General, I also tick the Disable HA sync box, and I set the DHCP reply delay to 5 seconds. This gives the master node enough time to provide a DHCP lease before the backup node answers. In DHCP ranges, I edit the current one and define a smaller pool, different from the master's. Here I also tick the Disable HA sync box.
Now I can safely sync my services as described above; this will only propagate the DHCP options, which are meant to be the same on both nodes.
#### WAN Interface
The last thing I need to configure is the WAN interface. My ISP box hands out only one IP address over DHCP, and I don't want my two VMs to compete to claim it. To handle that, I will give both VMs the same MAC address on the WAN interface, and then I need a way to enable the WAN interface only on the master node.
In the Proxmox WebGUI, I copy the MAC address of the net0 interface (POC WAN) from `poc-opnsense-1` and paste it into the same interface on `poc-opnsense-2`.
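The same change can be made from the CLI; a sketch assuming the VMs have the IDs 9101 and 9102 and that net0 sits on the `pocwan` VNet:

```sh
# Read the MAC address of net0 on poc-opnsense-1
qm config 9101 | grep ^net0

# Re-create net0 on poc-opnsense-2 with that exact MAC (paste it after "virtio=")
qm set 9102 --net0 virtio=BC:24:11:XX:XX:XX,bridge=pocwan
```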
To enable the WAN interface on the master node while disabling it on the backup, I can use a script: on CARP events, the scripts located in `/usr/local/etc/rc.syshook.d/carp` are executed. I found this Gist which does exactly what I want.
I copy this script to `/usr/local/etc/rc.syshook.d/carp/10-wan` on both nodes:
```php
#!/usr/local/bin/php
<?php

require_once("config.inc");
require_once("interfaces.inc");
require_once("util.inc");
require_once("system.inc");

$subsystem = !empty($argv[1]) ? $argv[1] : '';
$type = !empty($argv[2]) ? $argv[2] : '';

if ($type != 'MASTER' && $type != 'BACKUP') {
    log_error("Carp '$type' event unknown from source '{$subsystem}'");
    exit(1);
}

if (!strstr($subsystem, '@')) {
    log_error("Carp '$type' event triggered from wrong source '{$subsystem}'");
    exit(1);
}

$ifkey = 'wan';

if ($type === "MASTER") {
    log_error("enable interface '$ifkey' due CARP event '$type'");
    $config['interfaces'][$ifkey]['enable'] = '1';
    write_config("enable interface '$ifkey' due CARP event '$type'", false);
    interface_configure(false, $ifkey, false, false);
} else {
    log_error("disable interface '$ifkey' due CARP event '$type'");
    unset($config['interfaces'][$ifkey]['enable']);
    write_config("disable interface '$ifkey' due CARP event '$type'", false);
    interface_configure(false, $ifkey, false, false);
}
```
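To deploy it, I copy the file to both nodes and make it executable so the CARP hook runner can launch it. A sketch assuming SSH access is enabled on both firewalls:

```sh
# Copy the hook script to both OPNsense nodes and make it executable
for host in 10.102.0.2 10.102.0.3; do
  scp 10-wan root@$host:/usr/local/etc/rc.syshook.d/carp/10-wan
  ssh root@$host chmod +x /usr/local/etc/rc.syshook.d/carp/10-wan
done
```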
### Test Failover
Time for testing! OPNsense provides a way to enter CARP maintenance mode. Before pushing the button, my master has its WAN interface enabled and the backup doesn't:
Once I enter CARP maintenance mode, the master node becomes backup and vice versa; the WAN interface gets disabled on one node while it comes up on the other. I was pinging outside of the network while switching and did not experience a single drop!
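To spot any drop during the switchover, I keep a timestamped ping running from a client in the POC LAN; any external IP works, 1.1.1.1 is just an example:

```sh
# -D prefixes each reply with a timestamp (iputils ping on Linux), making gaps easy to spot
ping -D 1.1.1.1
```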
Finally, I simulate a crash by powering off the master, and the magic happens! I lose only one packet and, thanks to the firewall state synchronization, I can even keep my SSH connection alive.
## Conclusion
One remaining quirk: the backup node does not get a gateway on its own, so I add a gateway on the LAN with the IP of the master node.