
nftables Stale Interface Index, 2026-03-26

Summary

The Incus server (home bare-metal server running all self-hosted services) lost internet connectivity. It uses the Sakura VPS as a Tailscale exit node, and the VPS stopped forwarding traffic due to stale nftables rules referencing a defunct interface index for tailscale0 after tailscaled restarted.

Situation

The Sakura VPS (tsunagaru) is the public-facing reverse proxy for all self-hosted services. It runs HAProxy, Tailscale, and a custom nftables firewall via nftables-local.service. It also acts as a Tailscale exit node.

The Incus server (home bare-metal server) uses tsunagaru as its Tailscale exit node for all outbound internet traffic.

Traffic flow: Incus server sends traffic via Tailscale to tsunagaru, which should forward it from tailscale0 out ens3 to the internet.

Timeline (JST)

Time    Event
06:47   apt-daily-upgrade.service runs; unattended-upgrades upgrades the Tailscale package
06:48   systemd restarts tailscaled as part of the upgrade. tailscale0 is recreated with a new interface index; nftables-local.service does not reload, and forwarding breaks
06:48   Uptime Kuma detects the outage on the Incus server: "timeout of 3000ms exceeded"
08:00   updown.io SMS alert: Incus Pulse dead man's switch triggered, more than 2 hours since the last pulse
08:40   updown.io SMS alert: uptime-kuma.benoit.jp.net returns 503 Service Unavailable
09:13   Manual restart of tailscaled attempted as a fix; it only recreates tailscale0 with yet another index, and the stale nftables rules persist
~11:10  Investigation with nft list ruleset reveals stale iif 5 / oif 5 rules; root cause identified
11:20   nftables-local.service restarted; rules reload with the correct interface index
11:21   Uptime Kuma confirms full recovery: "200 - OK"
12:00   updown.io SMS: Incus Pulse recovered after ~4 hours of downtime

Total downtime: ~4 hours 33 minutes.

Root Cause

apt-daily-upgrade.service ran at 06:47 and unattended-upgrades upgraded the Tailscale package. systemd restarted tailscaled as part of the upgrade, which recreated the tailscale0 TUN device with a new kernel interface index.
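The "interface index" here is the kernel's numeric handle for a network device, visible under /sys/class/net. A minimal illustration using the loopback device, which always exists and is registered first at boot:

```shell
# Each network device has a kernel interface index, allocated when the
# device is created. Deleting and recreating a device (as a tailscaled
# restart does for tailscale0) yields a new index.
cat /sys/class/net/lo/ifindex   # loopback is registered first at boot
```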

The nftables-local.sh script references tailscale0 by name, but the iif/oif matches it uses are resolved to the kernel interface index when the ruleset is loaded. Since nftables-local.service was Type=oneshot with only Wants=tailscaled.service, it did not re-run after the tailscaled restart. The inet filter forward chain therefore still held rules matching the old index, which no longer corresponded to any interface. With policy drop on the forward chain, all forwarded packets were silently dropped.
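The failure mode is visible in the ruleset itself. A hypothetical excerpt of what nftables-local.sh loads (the exact chains and rules are assumptions, not the actual script): nft stores iif/oif matches as numeric indexes, so once index 5 no longer resolves to a device, nft list ruleset prints the bare number.

```nft
table inet filter {
  chain forward {
    # policy drop: anything not explicitly accepted below is discarded
    type filter hook forward priority filter; policy drop;
    # "iif"/"oif" are resolved to kernel interface indexes at load time.
    # After tailscaled recreates tailscale0, these rules keep matching
    # the old index, and "nft list ruleset" shows "iif 5" / "oif 5".
    iif "tailscale0" oif "ens3" accept
    iif "ens3" oif "tailscale0" ct state established,related accept
  }
}
```

As a side note, nftables also offers iifname/oifname, which match the interface name at packet-evaluation time rather than pinning an index at load time; rules written that way survive device recreation.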

A manual restart of tailscaled at 09:13 was attempted as a fix, but this only recreated tailscale0 with yet another index, leaving the same stale nftables problem in place.

Impact

  • All Tailscale exit node traffic forwarded through the Sakura VPS was dropped
  • The entire Incus server lost outbound internet connectivity, as all traffic routes through tsunagaru as the Tailscale exit node
  • Public-facing services were affected: HAProxy resolves backends by Tailscale DNS name (e.g. forgejo.incus, mastodon2.incus), and DNS resolution depends on forwarding through tailscale0, which was blocked by the stale rules

What Went Well

  • The monitoring stack worked as designed: Uptime Kuma caught the outage, updown.io's Pulse dead man's switch detected the Incus server going silent, and SMS alerts arrived on the phone as the last-resort notification channel
  • tcpdump quickly confirmed packets arrived on tailscale0 but were never forwarded to ens3
  • nft list ruleset revealed the stale iif 5 / oif 5 rules, pointing directly at the root cause
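The two diagnostic steps above can be reproduced with a few commands (run as root; interface names as in this incident). This is a sketch of the diagnostic session, not a verbatim transcript:

```shell
# Watch traffic arrive from the exit-node client; if packets show up here
# but never leave on ens3, forwarding is being dropped on this host.
tcpdump -ni tailscale0 -c 20
tcpdump -ni ens3 -c 20 'not port 22'   # exclude the SSH session itself

# Dump the live ruleset. A healthy ruleset prints interface names
# ("iif tailscale0"); a stale one prints bare numbers ("iif 5").
nft list ruleset | grep -E 'iif|oif'
```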

What Did Not Go Well

  • unattended-upgrades triggered a tailscaled restart without any mechanism to reload dependent services like nftables-local
  • The initial troubleshooting attempt (restarting tailscaled) did not help, as it only created yet another new interface index with the same stale nftables problem
  • The nftables-local.service unit had no dependency ensuring it would reload when tailscaled restarted
  • The script did not flush the table before adding rules, risking duplicates on manual restarts

Lessons Learned

  • nftables resolves interface names to kernel indexes at rule load time; if the interface is recreated, rules go stale
  • Wants= only ensures a unit starts, it does not bind lifecycle (restart/stop) to the dependency
  • BindsTo= is the correct directive when a service must follow another service's lifecycle
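In unit-file terms, the lesson translates to something like the following excerpt (the ExecStart path and other directives are assumptions about nftables-local.service). systemd propagates stop and restart jobs through BindsTo=, so the rules are reloaded against the fresh tailscale0 index every time tailscaled restarts; After= is still needed so the reload runs once the new device exists:

```ini
[Unit]
Description=Local nftables rules
# BindsTo= (unlike Wants=) ties this unit's lifecycle to tailscaled:
# stopping or restarting tailscaled stops/restarts this unit too.
# After= orders the reload to run once the new tailscale0 exists.
BindsTo=tailscaled.service
After=tailscaled.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/sbin/nftables-local.sh   # path is an assumption
```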

Action Items

  • Restart nftables-local.service to reload rules with the current interface index
  • Add nft flush table inet filter to the script before adding rules, preventing duplicates on restarts
  • Change Wants=tailscaled.service to BindsTo=tailscaled.service so the firewall rules reload whenever tailscaled restarts
  • Update Sakura VPS documentation to reflect both fixes
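The flush action item might look like this at the top of nftables-local.sh (a sketch; the real script is not shown here). nft add table succeeds whether or not the table already exists, and flush table empties every chain in it without deleting the chains, so re-running the script replaces rules instead of appending duplicates:

```shell
#!/bin/sh
set -eu

# Idempotent preamble: ensure the table exists, then empty it so the
# rules added below never accumulate across restarts.
nft add table inet filter
nft flush table inet filter

# ... chains and rules are (re)added below, resolving against the
# current interface indexes ...
```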