Auto-update blog content from Obsidian: 2025-08-24 07:40:43
All checks were successful
Blog Deployment / Check-Rebuild (push) Successful in 6s
Blog Deployment / Build (push) Has been skipped
Blog Deployment / Deploy-Staging (push) Successful in 10s
Blog Deployment / Test-Staging (push) Successful in 2s
Blog Deployment / Merge (push) Successful in 7s
Blog Deployment / Deploy-Production (push) Successful in 10s
Blog Deployment / Test-Production (push) Successful in 2s
Blog Deployment / Clean (push) Has been skipped
Blog Deployment / Notify (push) Successful in 2s
224
content/post/opnsense-crash.fr.md
Normal file
@@ -0,0 +1,224 @@
---
slug:
title: Template
description:
date: 2025-08-22
draft: true
tags:
- opnsense
categories:
- homelab
---
## Intro

This week, I ran into my first real problem in my homelab, one that took down my entire home network.

My OPNsense router crashed and, after several failed recovery attempts, I finally had to reinstall it from scratch. Luckily, almost all of the configuration came back thanks to a single XML file. In this story, I will tell you what happened, what I did to get out of it, and also what I should not have done.

This kind of exercise is the last thing you want to happen, because it is never fun to watch everything blow up. But it is by far the best way to learn.

## The Calm Before the Storm

My OPNsense box had been running flawlessly for months. Router, firewall, DNS, DHCP, VLANs, VPN, reverse proxy and even UniFi controller: every piece of my homelab goes through it. And not only that: it also provides Internet to the whole house.

This box is the heart of my network; without it, I can hardly do anything. I detailed how it works in my [Homelab]({{< ref "page/homelab" >}}) section. Everything “just worked”, and I wasn't worried about it. I felt confident; its backup lived only inside the machine…

Maybe too confident.

## The Unexpected Reboot

Without warning, the box rebooted on its own, just before midnight. By chance, I was walking past my rack on my way to bed. I knew it had rebooted because I heard its little startup beep.

I wondered why the router had restarted without my consent. From my bed, I quickly checked whether the Internet was working: it was. But none of my services were available, neither my home automation nor this blog. I was tired; I would deal with it the next day…

In the morning, looking at the logs, I found the culprit:
```
panic: double fault
```

A kernel panic: my router had literally crashed at the kernel level. A double fault means the CPU raised an exception while trying to handle a previous one, which usually points to a kernel bug or failing hardware.

## First Troubleshooting Attempts

At first, the impact seemed minor. Only one service would not restart: Caddy, my reverse proxy. That explained why my services were unreachable.

Digging through the logs, I found the error:
```
caching certificate: decoding certificate metadata: unexpected end of JSON input
```

One of the cached certificates had been corrupted during the crash. Once I deleted its cache folder, Caddy came back up and, all at once, all my HTTPS services were back.

I thought I had dodged the bullet. I did not dig any further into the real cause: the kernel logs were polluted by a “flapping” interface, so I assumed it was just a bug. Instead, I jumped into an upgrade: my first mistake.

My OPNsense instance was on version 25.1, and 25.7 had just been released. Let's go!

The upgrade went fine, but something was off. While checking for further updates, I noticed that the database of `pkg`, the package manager, was corrupted:
```
pkg: sqlite error while executing iterator in file pkgdb_iterator.c:1110: database disk image is malformed
```

🚨 My internal alarm went off. I thought about backups and immediately downloaded the latest one:

By clicking the `Download configuration` button, I retrieved the `config.xml` currently in use. I thought that would be enough.

## Filesystem Corruption

I tried to repair the `pkg` database in the worst possible way: I backed up the `/var/db/pkg` folder and then tried to re-run a `bootstrap`:
```bash
# Keep a copy of the current (possibly corrupted) package database
cp -a /var/db/pkg /var/db/pkg.bak
# Force pkg to fetch and reinstall itself
pkg bootstrap -f
```
```
The package management tool is not yet installed on your system.
Do you want to fetch and install it now? [y/N]: y
Bootstrapping pkg from https://pkg.opnsense.org/FreeBSD:14:amd64/25.7/latest, please wait...
[...]
pkg-static: Fail to extract /usr/local/lib/libpkg.a from package: Write error
Failed to install the following 1 package(s): /tmp//pkg.pkg.scQnQs
[...]
A pre-built version of pkg could not be found for your system.
```

I noticed a `Write error` and suspected a disk problem. I ran `fsck` and got a flood of inconsistencies:
```bash
# -n: report problems but answer "no" to every repair prompt (read-only check)
fsck -n
```
```
[...]
INCORRECT BLOCK COUNT I=13221121 (208384 should be 208192)
INCORRECT BLOCK COUNT I=20112491 (8 should be 0)
INCORRECT BLOCK COUNT I=20352874 (570432 should be 569856)
[...]
FREE BLK COUNT(S) WRONG IN SUPERBLK
[...]
SUMMARY INFORMATION BAD
[...]
BLK(S) MISSING IN BIT MAPS
[...]
***** FILE SYSTEM IS LEFT MARKED AS DIRTY *****
```

The root filesystem was in bad shape.
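
Reading a long `fsck -n` report by eye makes it easy to shrug off, as I did. The following is a minimal sketch that assumes you pipe `fsck -n` (or `dmesg`) output into it. The hypothetical helper `disk_red_flags` counts the lines that match the error messages quoted in this post. The pattern list is my own choice and does not cover every FreeBSD error. The function sticks to POSIX `sh` because `bash` is not part of the FreeBSD base system.

```bash
#!/bin/sh
# disk_red_flags (hypothetical helper): read fsck/dmesg-style text on stdin
# and print how many lines match a known sign of filesystem or I/O trouble.
# The patterns are an assumption based on the errors quoted in this post.
disk_red_flags() {
    # grep -c prints the count but exits 1 when it is 0;
    # "|| true" keeps callers running under "set -e" happy.
    grep -c \
        -e 'INCORRECT BLOCK COUNT' \
        -e 'WRONG IN SUPERBLK' \
        -e 'SUMMARY INFORMATION BAD' \
        -e 'MISSING IN BIT MAPS' \
        -e 'MARKED AS DIRTY' \
        -e 'Input/output error' \
        -e 'Write error' \
        -e 'mangled entry' || true
}
```

On the router, something like `fsck -n 2>&1 | disk_red_flags` would do. Any non-zero count is a good reason to stop and check the drive before touching the system, let alone upgrading it.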

With only SSH access and no console, I forced an `fsck` on the next reboot:
```bash
# If the boot-time preen fails, rerun fsck with -y (answer "yes" to every repair)
sysrc fsck_y_enable="YES"
# Check filesystems in the foreground during boot, not in the background
sysrc background_fsck="NO"
reboot
```

After the reboot, the system was repaired enough to run `pkg bootstrap` again. But half of the system packages had disappeared. My earlier upgrade on a corrupted disk had left me with a broken system: half installed, half missing.

## When Things Get Worse

I discovered the `opnsense-bootstrap` utility, which is supposed to reset the system:
- Remove all installed packages
- Download and install a fresh 25.7 kernel and base system
- Reinstall the standard packages

Perfect!
```bash
opnsense-bootstrap
```
```
This utility will attempt to turn this installation into the latest OPNsense 25.7 release. All packages will be deleted, the base system and kernel will be replaced, and if all went well the system will automatically reboot. Proceed with this action? [y/N]:
```

I answered `y`. It started well, then… nothing. No signal. No Internet. I thought this bootstrap would save me. Instead, it buried me.

🙈 Oops.

After a while, I tried to reboot it, but I could not reconnect over SSH. I had no choice: I pulled the router out of the rack, put it on my desk, plugged in a screen and a keyboard, and looked at what was going on.

## Starting From Scratch

That was a bad sign:
```
Fatal error: Uncaught Error: Class "OPNsense\Core\Config" not found
in /usr/local/etc/inc/config.inc:143
```

And the bootstrap logs were even worse:
```
bad dir ino … mangled entry
Input/output error
```

The disk was in bad shape, and I could no longer save anything. Time to start from scratch. Luckily, I had a backup… right?

I downloaded the OPNsense 25.7 ISO, created a bootable USB stick, and reinstalled over the existing installation, keeping the default settings.

## The Savior: `config.xml`

OPNsense keeps its entire configuration in a single file: `/conf/config.xml`. That file was my lifeline.

I had copied the previously downloaded `config.xml` onto my USB stick. Once I plugged the stick into the freshly installed machine, I replaced the file:
```bash
# Mount the FAT-formatted USB stick (first slice of da0)
mount -t msdosfs /dev/da0s1 /mnt
# Replace the freshly installed default configuration with the saved one
cp /mnt/config.xml /conf/config.xml
```
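
Before overwriting `/conf/config.xml`, check that the file on the stick really is an OPNsense configuration, and keep the freshly installed default aside. The following minimal sketch assumes the stick is mounted on `/mnt` as above. `restore_config` and the `.default` suffix are hypothetical names. The only check is for the `<opnsense>` root element; this is not a real XML validation.

```bash
#!/bin/sh
# restore_config SRC DEST (hypothetical helper): copy a saved OPNsense config
# over DEST, keeping the previous DEST as DEST.default.
restore_config() {
    src=$1
    dest=$2
    # A non-empty file containing the <opnsense> root element is the minimum
    # we expect from a real OPNsense config.xml (not a full XML validation).
    if [ ! -s "$src" ] || ! grep -q '<opnsense>' "$src"; then
        echo "restore_config: $src does not look like an OPNsense config" >&2
        return 1
    fi
    # Keep the default configuration of the fresh install, just in case.
    if [ -f "$dest" ]; then
        cp -p "$dest" "$dest.default"
    fi
    cp "$src" "$dest"
}
```

For example, run `restore_config /mnt/config.xml /conf/config.xml && umount /mnt`, then reboot so the restored configuration is applied at boot.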

I put the router back in the rack and crossed my fingers… *beep!* 🎉

DHCP gave me an address: a good sign. I could reach the web interface: great. My configuration was there, more or less everything except the plugins, as expected. I could not install them right away because they required another update. Let's update!

This XML file alone let me rebuild my router without losing my mind.

With no DNS (AdGuard Home was not installed), I temporarily pointed the system DNS to `1.1.1.1`.

## The Last Breath

During the next update, same story: errors, reboot, crash. The machine was unreachable again…

I could officially declare my NVMe drive dead.

🪦 Rest in peace, and thank you for your loyal service.
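
In hindsight, the drive's SMART data would have given a verdict much faster than all my `fsck` attempts. `smartctl` comes from smartmontools. On OPNsense, I believe the `os-smart` plugin provides it; treat that as an assumption. The following minimal sketch uses a hypothetical `nvme_verdict` helper. It reads `smartctl -a` output for an NVMe drive and flags three fields of its health log. The 90% wear threshold is an arbitrary assumption, not a vendor value.

```bash
#!/bin/sh
# nvme_verdict (hypothetical helper): read "smartctl -a" output for an NVMe
# drive on stdin; print a WARN line per suspicious field, or "OK".
# Field names follow smartctl's NVMe health log; thresholds are assumptions.
nvme_verdict() {
    awk -F: '
        # Drop spaces and "%" so values can be compared.
        function val(s) { gsub(/[ %]/, "", s); return s }
        /^Critical Warning:/ {
            if (val($2) != "0x00") { print "WARN critical warning " val($2); bad = 1 }
        }
        /^Percentage Used:/ {
            if (val($2) + 0 >= 90) { print "WARN wear " val($2) "%"; bad = 1 }
        }
        /^Media and Data Integrity Errors:/ {
            if (val($2) + 0 > 0) { print "WARN media errors " val($2); bad = 1 }
        }
        END { if (!bad) print "OK" }
    '
}
```

For example: `smartctl -a /dev/nvme0 | nvme_verdict`. The device name is an assumption; `nvmecontrol devlist` lists the NVMe controllers actually present.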

Luckily, I had a brand-new 512 GB Kingston NVMe drive that came with this machine. I had never used it, because I preferred to reuse the one from my *Vertex* server.

I reinstalled OPNsense on it, and this time everything worked: I upgraded to 25.7.1 and reinstalled the official plugins I was using.

For the custom plugins (AdGuard Home and UniFi), I had to add the third-party repository in `/usr/local/etc/pkg/repos/mimugmail.conf` (documentation [here](https://www.routerperformance.net/opnsense-repo/)):
```json
mimugmail: {
  url: "https://opn-repo.routerperformance.net/repo/${ABI}",
  priority: 5,
  enabled: yes
}
```
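
If you create this file from the shell, watch out for one detail: `${ABI}` must reach `pkg` literally, because `pkg` expands it itself and the shell must not. The following minimal sketch uses a quoted here-document, which disables shell expansion. `write_repo_conf` is a hypothetical helper, and the file content is the repository definition shown above.

```bash
#!/bin/sh
# write_repo_conf DIR (hypothetical helper): write the mimugmail repository
# definition to DIR/mimugmail.conf. The quoted 'EOF' delimiter stops the
# shell from expanding ${ABI}, which pkg substitutes itself.
write_repo_conf() {
    dir=$1
    mkdir -p "$dir"
    cat > "$dir/mimugmail.conf" <<'EOF'
mimugmail: {
  url: "https://opn-repo.routerperformance.net/repo/${ABI}",
  priority: 5,
  enabled: yes
}
EOF
}
```

Running `write_repo_conf /usr/local/etc/pkg/repos && pkg update -f` then refreshes the package catalogues, so the plugins from this repository become installable.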

After a final reboot, the router was almost ready, but I still had no DNS. That was because AdGuard Home was not configured yet.

⚠️ The configuration of third-party plugins is not saved in `config.xml`.
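
Since `config.xml` does not cover these plugins, their own files need a separate backup. The following minimal sketch archives a list of paths into a timestamped tarball. The helper name `archive_plugin_configs` is hypothetical. I have not checked where the mimugmail packages keep their data, so the paths in the usage line below are placeholders; `pkg info -l <package>` lists the files a package installed.

```bash
#!/bin/sh
# archive_plugin_configs DEST_DIR PATH... (hypothetical helper): bundle the
# given files/directories into DEST_DIR/plugins-YYYYmmdd-HHMMSS.tar.gz and
# print the archive path. Fails if any PATH does not exist.
archive_plugin_configs() {
    dest_dir=$1
    shift
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "archive_plugin_configs: missing $p" >&2
            return 1
        fi
    done
    mkdir -p "$dest_dir"
    archive="$dest_dir/plugins-$(date +%Y%m%d-%H%M%S).tar.gz"
    # tar strips the leading "/" from absolute paths, so extract with -C /
    # to put the files back where they came from.
    tar -czf "$archive" "$@"
    echo "$archive"
}
```

For example, `archive_plugin_configs /root/plugin-backups /path/to/adguardhome /path/to/unifi/data` (placeholder paths). Copy the tarball off the box like `config.xml`; `tar -xzf ARCHIVE -C /` restores it.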

Reconfiguring AdGuard Home was not very complicated. In the end, my DNS worked and everything was back to normal… except the UniFi controller.

## Lessons Learned the Hard Way

- **Backups matter**: I always catch myself thinking backups are not essential... until you need to restore and it is too late.
- **Keep backups off the machine**: I was lucky to grab the `config.xml` before my disk gave up. Otherwise, restoring everything would have been a really bad time.
- **Check the system's health after a crash**: never ignore a kernel panic.
- **I/O errors = red alert**: I lost hours fighting a doomed disk.
- **Unofficial plugins are not backed up**: the configuration of OPNsense and its official plugins is saved, but that is not the case for the others.
- **My router is a SPOF** (*single point of failure*): in my homelab, I wanted as many components as possible to be highly available, so I need to find a better solution.
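
The first two lessons can be automated from another machine. The following minimal sketch assumes SSH access to the router as root; the `root@opnsense.lan` host and the `~/opnsense-backups` folder are placeholders. It pulls `/conf/config.xml` with `scp`, then the hypothetical `store_config` helper keeps timestamped copies and prunes the oldest ones.

```bash
#!/bin/sh
# store_config SRC DEST_DIR [KEEP] (hypothetical helper): copy SRC into
# DEST_DIR as config-YYYYmmdd-HHMMSS.xml and keep only the KEEP most recent
# copies (default 10). Refuses files without an <opnsense> root element.
store_config() {
    src=$1
    dest_dir=$2
    keep=${3:-10}
    if ! grep -q '<opnsense>' "$src"; then
        echo "store_config: $src has no <opnsense> root element, not storing it" >&2
        return 1
    fi
    mkdir -p "$dest_dir"
    stamp=$(date +%Y%m%d-%H%M%S)
    cp "$src" "$dest_dir/config-$stamp.xml"
    # The timestamp makes names sort chronologically: list newest first and
    # delete everything after the first $keep entries.
    ls -1 "$dest_dir"/config-*.xml | sort -r | tail -n +"$((keep + 1))" |
        while read -r old; do
            rm -f "$old"
        done
}
```

From another machine, for example: `scp root@opnsense.lan:/conf/config.xml /tmp/config.xml && store_config /tmp/config.xml "$HOME/opnsense-backups" 10`, run from cron. Pruning by filename rather than modification time keeps the logic simple. The trade-off is that it trusts the clock of the machine doing the backups.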

## Moving Forward

I seriously need to rethink my backup strategy. I kept putting it off, until it was too late. It had been a long time since I last suffered a hardware failure. When it happens, it hurts.

At first, I thought a router running on its own dedicated hardware was safer. I was wrong. I am considering virtualizing it on Proxmox to make it highly available. A nice project ahead!

---

## Conclusion

My OPNsense router went from a simple random reboot to a dead disk, through a real rollercoaster of troubleshooting. In the end, I am almost glad it happened: I learned far more than I would have from a smooth upgrade.

If you run OPNsense (or any router), remember this:
**Keep a backup off the machine.**

Because when things break (and they eventually will), that little XML file can save your whole homelab.

Make backups. Stay safe.
@@ -2,34 +2,36 @@
slug:
title: Template
description:
date:
date: 2025-08-22
draft: true
tags:
- opnsense
categories:
- homelab
---
## Intro

This week, I ran into my first real problem in my homelab, one that took down my whole home network.

My OPNsense router crashed and after trying to recover , I finally had to reinstall it from scratch and restore almost all the configuration, thanks to a single XML file. In that story, I will tell you what happened, what I did to recover and what I shouldn't have done.
My OPNsense router crashed and, after several failed recovery attempts, I finally had to reinstall it from scratch. Luckily, almost all of the configuration came back thanks to a single XML file. In this story, I will tell you what happened, what I did to recover, and what I shouldn't have done.

This kind of exercise is the worst thing you want to happen because it's never funny to have everything go boom, but this is, by far, the best way to learn.
This kind of exercise is the last thing you want to happen, because it's never fun to watch everything go boom, but this is by far the best way to learn.

## The Calm Before the Storm

My OPNsense box had been running smoothly for months. Router, firewall, DNS, DHCP, VLANs, VPN, reverse proxy and even UniFi controller, all the pieces of my homelab network ran through it, but not only, it is also serving internet at home.
My OPNsense box had been running smoothly for months. Router, firewall, DNS, DHCP, VLANs, VPN, reverse proxy and even UniFi controller: all the pieces of my homelab ran through it. And not only that: it also provided Internet to the whole house.

This is the heart of my network, I barely can't do anything without it now. I have detailed how this is working in my [Homelab]({{< ref "page/homelab" >}}) section. It was “just working,” and I wasn’t worried about it. I felt confident, its backup was living only inside the machine...
This box is the heart of my network; without it, I can hardly do anything. I have detailed how it works in my [Homelab]({{< ref "page/homelab" >}}) section. It was “just working,” and I wasn’t worried about it. I felt confident; its backup lived only inside the machine...

Maybe too confident.

## The Unexpected Reboot

Out of nowhere, the box rebooted by itself just before midnight. By chance, I was just passing by my rack on my way to bed. I knew the box rebooted because I heard its little beep it is doing when the machine start.
Out of nowhere, the box rebooted by itself just before midnight. By chance, I was just passing by my rack on my way to bed. I knew it had rebooted because I heard its little startup beep.

I wondered why the router had restarted on its own. From my bed, I quickly checked whether the Internet was working, and it was. But none of my services were available: neither my home automation nor even this blog. I was tired; I would fix that the next day...

In the morning, looking at the logs, I found the culprit
In the morning, looking at the logs, I found the culprit:
```
panic: double fault
```
@@ -48,7 +50,7 @@ caching certificate: decoding certificate metadata: unexpected end of JSON input

It turned out that one of the cached certificates had been corrupted during the crash. Deleting its cache folder fixed Caddy, and suddenly all my HTTPS services were back online.

I thought I had dodged the bullet. I didn't investigate much on the root cause analysis, the kernel logs were polluted by one of the interfaces flapping, I thought it was just a bug, instead, I checked for any updates, my first mistake.
I thought I had dodged the bullet. I didn't dig much into the root cause: the kernel logs were polluted by one of the interfaces flapping, so I thought it was just a bug. Instead, I went ahead and checked for updates: my first mistake.

My OPNsense instance was on version 25.1, and the newer 25.7 was available. Let's upgrade it, yay!

@@ -64,7 +66,7 @@ Clicking the `Download configuration` button, I downloaded the current `config.x

## Filesystem Corruption

I decided to recover the pkg database the worst possible way, I backed up the `/var/db/pkg` folder and I tried to `bootstrap` it.
I decided to recover the `pkg` database in the worst possible way: I backed up the `/var/db/pkg` folder and tried to `bootstrap` it:
```bash
cp -a /var/db/pkg /var/db/pkg.bak
pkg bootstrap -f
@@ -73,17 +75,10 @@ pkg bootstrap -f
The package management tool is not yet installed on your system.
Do you want to fetch and install it now? [y/N]: y
Bootstrapping pkg from https://pkg.opnsense.org/FreeBSD:14:amd64/25.7/latest, please wait...
Verifying signature with trusted certificate pkg.opnsense.org.20250710... done
Installing pkg-1.19.2_5...
Extracting pkg-1.19.2_5: 13%
[...]
pkg-static: Fail to extract /usr/local/lib/libpkg.a from package: Write error
Extracting pkg-1.19.2_5: 100%

Failed to install the following 1 package(s): /tmp//pkg.pkg.scQnQs
Bootstrapping pkg from https://opn-repo.routerperformance.net/repo/FreeBSD:14:amd64, please wait...
pkg: Attempted to fetch https://opn-repo.routerperformance.net/repo/FreeBSD:14:amd64/Latest/pkg.pkg
pkg: Attempted to fetch https://opn-repo.routerperformance.net/repo/FreeBSD:14:amd64/Latest/pkg.txz
pkg: Error: Not Found
[...]
A pre-built version of pkg could not be found for your system.
```

@@ -123,6 +118,8 @@ I discovered the utility `opnsense-bootstrap`, which promises to reinstall all p
- Remove all installed packages.
- Fresh 25.7 base system and kernel will be downloaded and installed.
- All standard OPNsense packages will be reinstalled.

Wonderful!
```
opnsense-bootstrap
```
@@ -130,7 +127,7 @@ opnsense-bootstrap
This utility will attempt to turn this installation into the latest OPNsense 25.7 release. All packages will be deleted, the base system and kernel will be replaced, and if all went well the system will automatically reboot. Proceed with this action? [y/N]:
```

I pressed `y`. This started well, but then... no more signal -> no more internet.
I pressed `y`. It started well, but then... no more signal, no more Internet. I thought this bootstrap would save me. Instead, it buried me.

🙈 Oops.

@@ -150,7 +147,7 @@ bad dir ino … mangled entry
Input/output error
```

The disk is in a bad shape, I can't do anything more for that instance, I'd better start from scratch now, I have backup, haven't it? (lol)
The disk was in bad shape; at this point, I couldn’t save the install anymore. Time to start from scratch. Luckily, I had a backup… right?

I downloaded the latest OPNsense ISO (v25.7) and wrote it to a USB stick. I reinstalled OPNsense over the current installation, keeping everything as default.

@@ -158,15 +155,17 @@ I downloaded the latest OPNsense ISO (v25.7) and put it into a USB stick. I rein

OPNsense keeps the whole configuration in a single file: `/conf/config.xml`. That file was my lifeline.

I copied the `config.xml`file saved earlier into the USB stick. When plugged into the fresh OPNsense box, I overwrite the file with this one:
I copied the `config.xml` file saved earlier onto the USB stick. Once it was plugged into the fresh OPNsense box, I overwrote the file:
```bash
mount -t msdosfs /dev/da0s1 /mnt
cp /mnt/config.xml /conf/config.xml
```

I placed the router back in the rack, powered it on and crossed my fingers. beep!
I placed the router back in the rack, powered it on and crossed my fingers... *beep!* 🎉

The DHCP gave me an address, good start. I could reach its URL, awesome. My configuration is here, almost everything, but the plugins. I can't install them right away because they need another update, let's update it!
DHCP gave me an address, good start. I could reach its web interface, awesome. My configuration was there, almost everything but the plugins, as expected. I couldn't install them right away because they needed another update, so let's update!

This single XML file is the reason I could rebuild my router without losing my sanity.

DNS was down because the AdGuard Home plugin was not installed, so I temporarily set the system DNS to `1.1.1.1`.

@@ -176,13 +175,13 @@ During that upgrade, the system threw errors again… and then rebooted itself.

I can officially say that my NVMe drive is dead.

🪦 Rest in peace.
🪦 Rest in peace, and thank you for your loyal service.

By chance, I have an unused NVMe Kingston drive of 512GB which was deliver with that box. I never used it because I preferred to use the one I was using before in my Vertex server.
Luckily, I had a spare 512GB Kingston NVMe that came with that box. I never used it because I preferred to reuse the one inside my *Vertex* server.

I redo the same steps to reinstall OPNsense on that disk. I could finally update OPNsense to 25.7.1 and reinstall all the official plugins that I was using.
I repeated the same steps to reinstall OPNsense on that disk, and this time everything worked: I could finally update OPNsense to 25.7.1 and reinstall all the official plugins I was using.

To install custom plugins (AdGuard Home and Unifi), I had to add the custom repository `/usr/local/etc/pkg/repos/mimugmail.conf` (documentation [here](https://www.routerperformance.net/opnsense-repo/))
To install the custom plugins (AdGuard Home and UniFi), I had to add the custom repository in `/usr/local/etc/pkg/repos/mimugmail.conf` (documentation [here](https://www.routerperformance.net/opnsense-repo/)):
```json
mimugmail: {
url: "https://opn-repo.routerperformance.net/repo/${ABI}",
@@ -193,45 +192,32 @@ mimugmail: {

After a final reboot, the router is almost ready, but I still don't have DNS services. This is because AdGuard Home is not configured.

⚠️ Custom plugin configuration is not saved within the standard backup in `config.xml`, which makes sense. As this is the only file I saved, I don't have any backup configuration for these plugins.
⚠️ Custom plugin configuration is not saved within the backup in `config.xml`.

Reconfigure AdGuard Home is pretty straight forward, finally my DNS is working and everything is back to nominal, except the UniFi controller/
Reconfiguring AdGuard Home was pretty straightforward. Finally, my DNS was working and everything was back to normal... except the UniFi controller.

## Lessons Learned the Hard Way

OPNsense Backups

After a crash, healthcheck

- **Don’t reuse old hardware for critical services.** That NVMe was living on borrowed time.

- **Always trust but verify storage.** Run `smartctl`, run `fsck`, and don’t ignore write errors.

- **`config.xml` is the crown jewel.** With it, a full reinstall is almost painless. Without it, I would have been rebuilding from scratch.

- **Custom plugin configs are not in config.xml.** If you rely on AdGuard, UniFi, etc., back them up separately.

- **Know when to stop repairing.** I wasted hours trying to nurse a dead disk. Installing on new hardware fixed everything in minutes.

What I did wrong (and why it hurt)

What I should have done differently

The single most important file in OPNsense

Why keeping off-box backups matters
- **Backups matter**: I always find myself thinking backups are not that important... until you need to restore and it's too late.
- **Keep backups off the box**: I was lucky to grab the `config.xml` before my disk died; otherwise, I would have had a really hard time recovering fully.
- **Health check after a crash**: do not ignore a kernel panic.
- **I/O errors = red flag**: I should have stopped trying to repair. I lost hours fighting a dead disk.
- **Custom plugin configs aren’t included**: the OPNsense configuration and its official plugins are saved in the backups; this is not the case for the others.
- **My router is a SPOF** (*single point of failure*): in my homelab, I wanted most of my components to be highly available, so I need to find a better solution.

## Moving Forward

My new backup strategy
I really need to rethink my backup strategy. I'm too lazy and always put it off for later, until it is too late. It's been a long time since I was hit by a hardware failure. When it strikes, it hurts.

Plans to improve reliability in my homelab

Final thoughts: sometimes starting fresh is the cleanest fix
Initially, I wanted my router to run on its own hardware because I thought it was safer. I was dead wrong. I will look into virtualizing OPNsense in Proxmox to make it highly available, a great project ahead!

## Conclusion

How this failure taught me more than a normal upgrade ever could
My OPNsense router went from a random reboot to a dead disk, with a rollercoaster of troubleshooting in between. In the end, I'm happy with the outcome: it taught me more than any smooth upgrade ever could.

Encouragement for others to prepare before disaster strikes
If you run OPNsense (or any router), remember this:
**Keep a backup off the box.**

Because when things go wrong, and eventually they will, that backup can save your homelab. Thanks to that one little XML file, what could have been a complete disaster ended up as just a painful, but very educational, weekend.

Stay safe, make backups.
@@ -13,4 +13,4 @@ I'm ==testing==

## Emoji

🚀💡🔧🔁⚙️📝📌✅⚠️🍒❌ℹ️⌛🚨
🚀💡🔧🔁⚙️📝📌✅⚠️🍒❌ℹ️⌛🚨🎉