
nftables Stale Interface Index, 2026-03-26

Summary

The Incus server (home bare-metal server running all self-hosted services) lost internet connectivity. It uses the Sakura VPS as a Tailscale exit node, and the VPS stopped forwarding traffic due to stale nftables rules referencing a defunct interface index for tailscale0 after tailscaled restarted.

Situation

The Sakura VPS (tsunagaru) is the public-facing reverse proxy for all self-hosted services. It runs HAProxy, Tailscale, and a custom nftables firewall via nftables-local.service. It also acts as a Tailscale exit node.

The Incus server (home bare-metal server) uses tsunagaru as its Tailscale exit node for all outbound internet traffic.

Traffic flow: Incus server sends traffic via Tailscale to tsunagaru, which should forward it from tailscale0 out ens3 to the internet.

Timeline (JST)

Time    Event
06:47   apt-daily-upgrade.service runs; unattended-upgrades upgrades the Tailscale package
06:48   systemd restarts tailscaled as part of the upgrade. tailscale0 is recreated with a new interface index; nftables-local.service does not reload, and forwarding breaks
06:48   Uptime Kuma detects the outage on the Incus server: "timeout of 3000ms exceeded"
08:00   updown.io SMS alert: Incus Pulse dead man's switch triggered, more than 2 hours since the last pulse
08:40   updown.io SMS alert: uptime-kuma.benoit.jp.net returns 503 Service Unavailable
09:13   Manual restart of tailscaled attempted as a fix; it only recreates tailscale0 with yet another index, and the stale nftables rules persist
~11:10  Investigation with nft list ruleset reveals stale iif 5 / oif 5 rules; root cause identified
11:20   nftables-local.service restarted; rules reload with the correct interface index
11:21   Uptime Kuma confirms full recovery: "200 - OK"
12:00   updown.io SMS: Incus Pulse recovered after ~4 hours of downtime

Total downtime: ~4 hours 33 minutes.

Root Cause

apt-daily-upgrade.service ran at 06:47 and unattended-upgrades upgraded the Tailscale package. systemd restarted tailscaled as part of the upgrade, which recreated the tailscale0 TUN device with a new kernel interface index.
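The "interface index" here is the kernel's numeric handle for a network device, visible under /sys/class/net. A minimal illustration using the loopback device, which always exists and is registered first at boot:

```shell
# Each network device has a kernel interface index, allocated when the
# device is created. Deleting and recreating a device (as a tailscaled
# restart does for tailscale0) yields a new index.
cat /sys/class/net/lo/ifindex   # loopback is registered first at boot
```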

The nftables-local.sh script references tailscale0 by name, but the iif/oif matches it uses are resolved to the kernel interface index when the ruleset is loaded. Since nftables-local.service was Type=oneshot with only Wants=tailscaled.service, it did not re-run after the tailscaled restart. The inet filter forward chain therefore still held rules matching the old index, which no longer corresponded to any interface. With policy drop on the forward chain, all forwarded packets were silently dropped.
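The failure mode is visible in the ruleset itself. A hypothetical excerpt of what nftables-local.sh loads (the exact chains and rules are assumptions, not the actual script): nft stores iif/oif matches as numeric indexes, so once index 5 no longer resolves to a device, nft list ruleset prints the bare number.

```nft
table inet filter {
  chain forward {
    # policy drop: anything not explicitly accepted below is discarded
    type filter hook forward priority filter; policy drop;
    # "iif"/"oif" are resolved to kernel interface indexes at load time.
    # After tailscaled recreates tailscale0, these rules keep matching
    # the old index, and "nft list ruleset" shows "iif 5" / "oif 5".
    iif "tailscale0" oif "ens3" accept
    iif "ens3" oif "tailscale0" ct state established,related accept
  }
}
```

As a side note, nftables also offers iifname/oifname, which match the interface name at packet-evaluation time rather than pinning an index at load time; rules written that way survive device recreation.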

A manual restart of tailscaled at 09:13 was attempted as a fix, but this only recreated tailscale0 with yet another index, leaving the same stale nftables problem in place.

Impact

  • All Tailscale exit node traffic forwarded through the Sakura VPS was dropped
  • The entire Incus server lost outbound internet connectivity, as all traffic routes through tsunagaru as the Tailscale exit node
  • Public-facing services were affected: HAProxy resolves backends by Tailscale DNS name (e.g. forgejo.incus, mastodon2.incus), and DNS resolution depends on forwarding through tailscale0, which was blocked by the stale rules

What Went Well

  • The monitoring stack worked as designed: Uptime Kuma caught the outage, updown.io's Pulse dead man's switch detected the Incus server going silent, and SMS alerts arrived on the phone as the last-resort notification channel
  • tcpdump quickly confirmed packets arrived on tailscale0 but were never forwarded to ens3
  • nft list ruleset revealed the stale iif 5 / oif 5 rules, pointing directly at the root cause
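The two diagnostic steps above can be reproduced with a few commands (run as root; interface names as in this incident). This is a sketch of the diagnostic session, not a verbatim transcript:

```shell
# Watch traffic arrive from the exit-node client; if packets show up here
# but never leave on ens3, forwarding is being dropped on this host.
tcpdump -ni tailscale0 -c 20
tcpdump -ni ens3 -c 20 'not port 22'   # exclude the SSH session itself

# Dump the live ruleset. A healthy ruleset prints interface names
# ("iif tailscale0"); a stale one prints bare numbers ("iif 5").
nft list ruleset | grep -E 'iif|oif'
```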

What Did Not Go Well

  • unattended-upgrades triggered a tailscaled restart without any mechanism to reload dependent services like nftables-local
  • The initial troubleshooting attempt (restarting tailscaled) did not help, as it only created yet another new interface index with the same stale nftables problem
  • The nftables-local.service unit had no dependency ensuring it would reload when tailscaled restarted
  • The script did not flush the table before adding rules, risking duplicates on manual restarts

Lessons Learned

  • nftables resolves interface names to kernel indexes at rule load time; if the interface is recreated, rules go stale
  • Wants= only ensures a unit starts, it does not bind lifecycle (restart/stop) to the dependency
  • BindsTo= is the correct directive when a service must follow another service's lifecycle
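In unit-file terms, the lesson translates to something like the following excerpt (the ExecStart path and other directives are assumptions about nftables-local.service). systemd propagates stop and restart jobs through BindsTo=, so the rules are reloaded against the fresh tailscale0 index every time tailscaled restarts; After= is still needed so the reload runs once the new device exists:

```ini
[Unit]
Description=Local nftables rules
# BindsTo= (unlike Wants=) ties this unit's lifecycle to tailscaled:
# stopping or restarting tailscaled stops/restarts this unit too.
# After= orders the reload to run once the new tailscale0 exists.
BindsTo=tailscaled.service
After=tailscaled.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/sbin/nftables-local.sh   # path is an assumption
```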

Action Items

  • Restart nftables-local.service to reload rules with the current interface index
  • Add nft flush table inet filter to the script before adding rules, preventing duplicates on restarts
  • Change Wants=tailscaled.service to BindsTo=tailscaled.service so the firewall rules reload whenever tailscaled restarts
  • Update Sakura VPS documentation to reflect both fixes
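The flush action item might look like this at the top of nftables-local.sh (a sketch; the real script is not shown here). nft add table succeeds whether or not the table already exists, and flush table empties every chain in it without deleting the chains, so re-running the script replaces rules instead of appending duplicates:

```shell
#!/bin/sh
set -eu

# Idempotent preamble: ensure the table exists, then empty it so the
# rules added below never accumulate across restarts.
nft add table inet filter
nft flush table inet filter

# ... chains and rules are (re)added below, resolving against the
# current interface indexes ...
```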