Tailscale Throw Route Lost After needrestart Restarted networkd, 2026-05-02

Summary

All Incus containers became unreachable from tailnet peers after unattended-upgrades upgraded kmod and needrestart restarted systemd-networkd. The networkd restart wiped routing table 52, including the throw 10.10.10.0/24 entry that Tailscale uses to bypass the exit node for the Incus subnet. Tailscale detected the deletion but failed to re-add the route, leaving only a bare default dev tailscale0 in table 52. Packets destined for 10.10.10.x containers were silently routed through the tsunagaru exit node instead of being forwarded locally, causing HAProxy on tsunagaru to return 503 for all proxied services. The fix was systemctl restart tailscaled.

This incident shares its root mechanism with the 2026-04-10 incident. The ManageForeignRoutingPolicyRules=no fix applied after that incident protects Tailscale's ip rules (priorities 5210–5270) but does not protect routing table entries inside table 52 itself.

Situation

The Incus server acts as a Tailscale subnet router advertising 10.10.10.0/24 (all Incus containers) and several 192.168.1.x physical LAN hosts into the tailnet.

The Incus server also uses the Sakura VPS (tsunagaru) as a Tailscale exit node for all outbound internet traffic. When an exit node is active, Tailscale installs:

default dev tailscale0 in table 52, catching all outbound traffic and routing it to the exit node
throw <subnet> in table 52 for each advertised subnet, making lookups for those destinations fall through to the main routing table so packets reach local interfaces instead

Without the throw 10.10.10.0/24 entry, any packet destined for a container is caught by the default entry and exits through tailscale0 toward tsunagaru, which has no path to 10.10.10.x and drops it. This includes forwarded traffic from HAProxy on tsunagaru, which proxies services via the Tailscale subnet route.
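
The presence of the throw entry can be checked mechanically. The following is a minimal sketch, assuming the output of `ip route show table 52` is piped in on stdin; the function name `has_throw_route` is hypothetical and not part of Tailscale or iproute2.

```shell
#!/bin/sh
# has_throw_route SUBNET  (hypothetical helper name)
# Reads routing-table text (e.g. from `ip route show table 52`) on stdin.
# Exits 0 if a line of the form "throw SUBNET ..." is present, 1 otherwise.
has_throw_route() {
    awk -v subnet="$1" '
        $1 == "throw" && $2 == subnet { found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

On the Incus server this would be run as `ip route show table 52 | has_throw_route 10.10.10.0/24`. A non-zero exit status corresponds to the broken state described above, where only the default entry remains.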

Timeline (JST)

The Incus server runs in UTC. All times below are JST (UTC+9). Server log times are exact; investigation times are approximate.

| Time (JST) | Event |
|---|---|
| 2026-05-02 15:36:54 | `apt-daily-upgrade.service` runs; `unattended-upgrades` upgrades `kmod` |
| 2026-05-02 15:37:15 | `needrestart` restarts `systemd-networkd`. networkd flushes routing table 52. `tailscaled` logs `monitor: RTM_DELROUTE: src=, dst=10.10.10.0/24, gw=, outif=0, table=52` |
| 2026-05-02 15:37:16 | `tailscaled` logs "allowing exit node access to local IPs: [10.10.10.0/24 ...]" but fails to re-add the throw route: networkd is still mid-teardown when Tailscale tries to write back |
| 2026-05-02 15:40:31 | Uptime Kuma detects 503 on all proxied services |
| 2026-05-02 ~19:00 | Incident noticed while traveling; investigation begins from hotel. `ip route get 10.10.10.196` confirms routing via `tailscale0` table 52 instead of `incusbr0`. `ip route show table 52` shows `throw 10.10.10.0/24` missing |
| 2026-05-02 ~19:05 | `journalctl -u tailscaled` reviewed; the 15:37 JST networkd flush and Tailscale's failed recovery confirmed |
| 2026-05-02 19:07:47 | `systemctl restart tailscaled`. `throw 10.10.10.0/24` restored |
| 2026-05-02 19:08:31 | Uptime Kuma confirms 200 OK |
| 2026-05-02 ~19:10 | `/etc/needrestart/conf.d/incus.conf` updated to blacklist `systemd-networkd` and `tailscaled` from auto-restart |

Total downtime: 3 hours 28 minutes (2026-05-02 15:40:31 to 19:08:31 JST).
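
The interval arithmetic can be reproduced with a few lines of shell. This is only a sketch: the helper names `to_seconds` and `elapsed` are hypothetical, and it assumes both timestamps are zero-padded HH:MM:SS values on the same day.

```shell
#!/bin/sh
# to_seconds HH:MM:SS -> seconds since midnight.
# One leading zero is stripped from each field so that values such as "08"
# are not rejected as invalid octal by shell arithmetic.
to_seconds() {
    IFS=: read -r h m s <<EOF
$1
EOF
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# elapsed START END -> "Xh Ym Zs" (assumes END is later than START, same day).
elapsed() {
    d=$(( $(to_seconds "$2") - $(to_seconds "$1") ))
    printf '%dh %dm %ds\n' $(( d / 3600 )) $(( d % 3600 / 60 )) $(( d % 60 ))
}

elapsed 15:40:31 19:08:31   # first 503 alert to confirmed recovery
elapsed 15:37:15 15:40:31   # route deletion to first 503 alert
```

The first call prints `3h 28m 0s`, matching the stated downtime; the second prints `0h 3m 16s`, the detection latency.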

Root Cause

unattended-upgrades upgraded the kmod package, which is a purely userspace tool (provides modprobe, depmod, etc.). Despite no kernel module being loaded or unloaded, needrestart detected the upgrade and restarted systemd-networkd.

When systemd-networkd restarts, it performs a full interface teardown and rebuild. As part of that teardown it flushes all routes it considers unmanaged, including entries in table 52 that Tailscale installed. This fires a stream of RTM_DELROUTE netlink events.

Tailscale's netlink monitor detected the deletion of throw 10.10.10.0/24 from table 52 and its recovery path logged "allowing exit node access to local IPs". However, setRoutes() was called while networkd was still mid-teardown and flushing. Either the ip route add call failed silently, or networkd flushed the route again milliseconds after Tailscale re-added it. By the time networkd stabilized, Tailscale had already finished its recovery attempt and was no longer watching.

The ManageForeignRoutingPolicyRules=no setting applied after the 2026-04-10 incident prevents networkd from flushing Tailscale's ip policy rules (table 52 rule, priorities 5210–5270). It does not prevent networkd from flushing routes inside table 52 itself. The ip rules survived this incident; the table 52 routes did not.

The direct trigger, needrestart restarting systemd-networkd after a kmod upgrade, is unnecessary. kmod is not a daemon and does not need a service restart to take effect.
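
To see which table 52 routes were deleted during a networkd restart, the tailscaled journal can be filtered for the relevant netlink events. The sketch below assumes the log lines have the same shape as the RTM_DELROUTE line quoted in the timeline; the function name `deleted_table52_routes` is hypothetical.

```shell
#!/bin/sh
# deleted_table52_routes  (hypothetical helper name)
# Reads tailscaled log lines on stdin and prints the dst= value of every
# RTM_DELROUTE event that refers to table 52.
# Assumes the "dst=<prefix>, ... table=<n>" layout seen in the timeline entry.
deleted_table52_routes() {
    grep -E 'RTM_DELROUTE:.*table=52([^0-9]|$)' |
        sed -n 's/.*dst=\([^,]*\),.*/\1/p'
}
```

Because the server logs in UTC, the 15:37 JST event falls at 06:37 UTC, so one possible invocation is `journalctl -u tailscaled --since "2026-05-02 06:30" --until "2026-05-02 06:45" | deleted_table52_routes`. Any prefix printed here that is absent from `ip route show table 52` afterwards was not restored.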

Impact

All Incus containers on 10.10.10.0/24 became unreachable from any tailnet peer, including tsunagaru
HAProxy on tsunagaru returned 503 for all proxied services (Forgejo, Mastodon, Miniflux, etc.) for the duration of the outage, since it reaches containers via the Tailscale subnet route
The Incus server itself could not reach its own containers: ip route get 10.10.10.x returned dev tailscale0 table 52 instead of dev incusbr0
Physical LAN hosts advertised as 192.168.1.x were unaffected: their throw routes were also deleted but packets for those hosts reach them via the LAN interface regardless
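
The routing decision for a single container address can be reduced to the chosen interface name. The helper below is a minimal sketch that parses `ip route get` output from stdin; the name `route_dev` is hypothetical.

```shell
#!/bin/sh
# route_dev  (hypothetical helper name)
# Reads `ip route get <addr>` output on stdin and prints the interface name
# that follows the first "dev" keyword.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}
```

Run as `ip route get 10.10.10.196 | route_dev`, this would be expected to print `incusbr0` in the healthy state and `tailscale0` in the broken state described above.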

What Went Well

Uptime Kuma caught the outage about 3 minutes after the route deletion (15:37:15 to 15:40:31)
ip route get 10.10.10.196 immediately pinpointed the wrong routing path
ip route show table 52 made the missing throw entry obvious
journalctl -u tailscaled gave a precise timestamp and confirmed the networkd flush as the trigger, enabling full timeline reconstruction from logs alone
systemctl restart tailscaled was sufficient to recover; no manual route surgery required

What Did Not Go Well

Response was slow: Uptime Kuma caught the outage at 15:40 but the alert was not acted on until ~19:00, roughly 3 hours 20 minutes later, when investigation began from a hotel while traveling
The 2026-04-10 action items did not prevent recurrence. ManageForeignRoutingPolicyRules=no only covers ip rules, not table 52 routes, so the class of bug was not fully closed
needrestart restarting systemd-networkd after a kmod upgrade is unnecessary and causes cascading damage to unrelated services. This is a needrestart configuration gap, not a kernel requirement

Lessons Learned

ManageForeignRoutingPolicyRules=no in networkd.conf protects ip policy rules but not routing table entries. Tailscale's table 52 throw routes remain vulnerable to any networkd restart
Tailscale's recovery path for RTM_DELROUTE events has a race condition with networkd mid-restart. The only reliable recovery is a full systemctl restart tailscaled once networkd has stabilized
needrestart will restart systemd-networkd (and potentially tailscaled) when packages like kmod are upgraded, even when no restart is necessary. Services that own kernel routing state should be in a needrestart blacklist
Tailnet subnet route health should be monitored from the tailnet consumer side, not just from the advertising side. tailscale status showing a route as "advertised and approved" does not mean packets are actually being forwarded correctly
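
A consumer-side check could be as small as an HTTP probe run from another tailnet peer against a service behind the subnet route. This is a sketch under assumptions: the function name `probe_service` is hypothetical, and the target URL (for example a container address such as 10.10.10.196 on whatever port the service listens on) must be supplied by the operator.

```shell
#!/bin/sh
# probe_service URL  (hypothetical helper name)
# Requests URL with curl and succeeds only for a 2xx or 3xx response.
# Intended to run on a tailnet peer (the consumer side), not on the subnet
# router itself, so that it exercises the advertised route end to end.
probe_service() {
    code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$1")
    case "$code" in
        2??|3??) return 0 ;;
        *)       echo "unhealthy: $1 returned ${code:-no response}" >&2
                 return 1 ;;
    esac
}
```

A probe like this fails in the state described in this incident even while the route is still shown as advertised and approved, because it depends on packets actually reaching the container.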

Action Items

Restart tailscaled to restore the missing throw route in table 52
Add systemd-networkd and tailscaled to the needrestart blacklist in /etc/needrestart/conf.d/incus.conf to prevent auto-restart of either service during unattended upgrades (see Incus Server: needrestart Configuration)
File a Tailscale issue describing the race condition in the RTM_DELROUTE recovery path that leaves table 52 routes unrestored when networkd is still tearing down

post-mortem self-hosting networking tailscale systemd incus