nftables Stale Interface Index, 2026-03-26¶
Summary¶
The Incus server (home bare-metal server running all self-hosted services) lost internet connectivity. It uses the Sakura VPS as a Tailscale exit node, and the VPS stopped forwarding traffic due to stale nftables rules referencing a defunct interface index for tailscale0 after tailscaled restarted.
Situation¶
The Sakura VPS (tsunagaru) is the public-facing reverse proxy for all self-hosted services. It runs HAProxy, Tailscale, and a custom nftables firewall via nftables-local.service. It also acts as a Tailscale exit node.
The Incus server (home bare-metal server) uses tsunagaru as its Tailscale exit node for all outbound internet traffic.
Traffic flow: Incus server sends traffic via Tailscale to tsunagaru, which should forward it from tailscale0 out ens3 to the internet.
Timeline (JST)¶
| Time | Event |
|---|---|
| 06:47 | apt-daily-upgrade.service runs, unattended-upgrades upgrades the Tailscale package |
| 06:48 | systemd restarts tailscaled as part of the upgrade. tailscale0 is recreated with a new interface index. nftables-local.service does not reload. Forwarding breaks |
| 06:48 | Uptime Kuma detects outage on the Incus server: "timeout of 3000ms exceeded" |
| 08:00 | updown.io SMS alert: Incus Pulse dead man's switch triggered, more than 2 hours since last pulse |
| 08:40 | updown.io SMS alert: uptime-kuma.benoit.jp.net returning 503 Service Unavailable |
| 09:13 | Manual restart of tailscaled attempted as a fix, but this only recreates tailscale0 with yet another index; the stale nftables problem persists |
| ~11:10 | Investigation with nft list ruleset reveals stale iif 5 / oif 5 rules. Root cause identified |
| 11:20 | nftables-local.service restarted, rules reloaded with the correct interface index |
| 11:21 | Uptime Kuma confirms full recovery: "200 - OK" |
| 12:00 | updown.io SMS: Incus Pulse recovered after ~4 hours of downtime |
Total downtime: ~4 hours 33 minutes.
Root Cause¶
apt-daily-upgrade.service ran at 06:47 and unattended-upgrades upgraded the Tailscale package. systemd restarted tailscaled as part of the upgrade, which recreated the tailscale0 TUN device with a new kernel interface index.
The `nftables-local.sh` script references `tailscale0` by name, but nftables resolves interface names to their kernel index at load time. Since `nftables-local.service` was `Type=oneshot` with only `Wants=tailscaled.service`, it did not re-run after the `tailscaled` restart. The `inet filter` forward chain still held rules matching the old index, which no longer corresponded to any interface. With a `policy drop` on the forward chain, all forwarded packets were silently dropped.
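The staleness is specific to how the rules were written. A minimal sketch of the forward chain (hypothetical rules; the actual nftables-local.sh may differ) illustrates the trap: `iif`/`oif` store the interface index captured at load time, whereas `iifname`/`oifname` match by name on every packet.

```nft
# Sketch of the relevant chain (hypothetical; the real script may differ).
table inet filter {
    chain forward {
        type filter hook forward priority filter; policy drop;
        # iif/oif are resolved to kernel interface indexes at load time.
        # If tailscale0 is recreated, these rules keep the old index.
        iif "tailscale0" oif "ens3" accept
        iif "ens3" oif "tailscale0" ct state established,related accept
    }
}
```

Once `tailscaled` recreates the device, `nft list ruleset` can only print the stored number (`iif 5`), because no interface carries that index anymore, and with `policy drop` every forwarded packet is silently discarded.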
A manual restart of tailscaled at 09:13 was attempted as a fix, but this only recreated tailscale0 with yet another index, leaving the same stale nftables problem in place.
Impact¶
- All Tailscale exit node traffic forwarded through the Sakura VPS was dropped
- The entire Incus server lost outbound internet connectivity, as all traffic routes through `tsunagaru` as the Tailscale exit node
- Public-facing services were affected: HAProxy resolves backends by Tailscale DNS name (e.g. `forgejo.incus`, `mastodon2.incus`), and DNS resolution depends on forwarding through `tailscale0`, which was blocked by the stale rules
What Went Well¶
- The monitoring stack worked as designed: Uptime Kuma caught the outage, updown.io's Pulse dead man's switch detected the Incus server going silent, and SMS alerts arrived on the phone as the last-resort notification channel
- `tcpdump` quickly confirmed packets arrived on `tailscale0` but were never forwarded to `ens3`
- `nft list ruleset` revealed the stale `iif 5` / `oif 5` rules, pointing directly at the root cause
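The two checks above can be reproduced with commands along these lines (a sketch; filters are illustrative, and the exact invocations used during the incident may have differed):

```sh
# Packets from the exit-node client are visible on tailscale0...
tcpdump -ni tailscale0

# ...but no corresponding egress (e.g. DNS or HTTPS) appears on ens3:
tcpdump -ni ens3 'port 53 or port 443'

# Inspect the live ruleset; a bare numeric index such as "iif 5"
# means the rule references an interface index that no longer exists:
nft list ruleset | grep -E 'iif|oif'
```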
What Did Not Go Well¶
- `unattended-upgrades` triggered a `tailscaled` restart without any mechanism to reload dependent services like `nftables-local`
- The initial troubleshooting attempt (restarting `tailscaled`) did not help, as it only created yet another new interface index with the same stale nftables problem
- The `nftables-local.service` unit had no dependency ensuring it would reload when `tailscaled` restarted
- The script did not flush the table before adding rules, risking duplicate rules on manual restarts
Lessons Learned¶
- nftables resolves interface names in `iif`/`oif` matches to kernel indexes at rule load time; if the interface is recreated, those rules go stale (`iifname`/`oifname` match by name on every packet and do not)
- `Wants=` only ensures a unit starts; it does not bind its lifecycle (restart/stop) to the dependency
- `BindsTo=` is the correct directive when a service must follow another service's lifecycle
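In unit-file terms, the lifecycle fix looks roughly like this (a sketch of nftables-local.service; the script path and other settings are assumptions):

```ini
[Unit]
Description=Load local nftables rules
# BindsTo= ties this unit's lifecycle to tailscaled: stopping or
# restarting tailscaled propagates to this unit as well.
BindsTo=tailscaled.service
# After= orders startup so tailscale0 exists before rules are loaded,
# and is needed for BindsTo= stop-propagation to work reliably.
After=tailscaled.service

[Service]
Type=oneshot
# RemainAfterExit keeps the unit "active" so lifecycle propagation applies.
RemainAfterExit=yes
ExecStart=/usr/local/sbin/nftables-local.sh

[Install]
WantedBy=multi-user.target
```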
Action Items¶
- Restart `nftables-local.service` to reload rules with the current interface index
- Add `nft flush table inet filter` to the script before adding rules, preventing duplicates on restarts
- Change `Wants=tailscaled.service` to `BindsTo=tailscaled.service` so the firewall rules reload whenever `tailscaled` restarts
- Update Sakura VPS documentation to reflect both fixes
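The flush fix can be applied atomically by putting the table declaration, the flush, and the rules in a single `nft -f` transaction (a sketch of nftables-local.sh; the rule bodies are illustrative, not the real script):

```sh
#!/bin/sh
set -eu

# Everything inside one `nft -f` invocation applies atomically.
# Declaring the table first makes the flush safe on the very first
# run (flushing a nonexistent table would otherwise fail).
nft -f - <<'EOF'
add table inet filter
flush table inet filter
table inet filter {
    chain forward {
        type filter hook forward priority filter; policy drop;
        iif "tailscale0" oif "ens3" accept
        iif "ens3" oif "tailscale0" ct state established,related accept
    }
}
EOF
```

Combined with the `BindsTo=` change, the script re-runs with fresh interface indexes each time `tailscaled` restarts, and the flush keeps repeated runs from stacking duplicate rules.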