Tailscale Throw Route Lost After needrestart Restarted networkd, 2026-05-02
Summary
All Incus containers became unreachable from tailnet peers after unattended-upgrades upgraded kmod and needrestart restarted systemd-networkd. The networkd restart wiped routing table 52, including the throw 10.10.10.0/24 entry that Tailscale uses to bypass the exit node for the Incus subnet. Tailscale detected the deletion but failed to re-add the route, leaving a bare default dev tailscale0 in table 52. Packets destined for 10.10.10.x containers were silently routed through the tsunagaru exit node instead of forwarded locally, causing HAProxy on tsunagaru to return 503 for all proxied services. The fix was systemctl restart tailscaled.
This incident shares its root mechanism with the 2026-04-10 incident. The ManageForeignRoutingPolicyRules=no fix applied after that incident protects Tailscale's ip rules (priorities 5210–5270) but does not protect routing table entries inside table 52 itself.
Situation
The Incus server acts as a Tailscale subnet router advertising 10.10.10.0/24 (all Incus containers) and several 192.168.1.x physical LAN hosts into the tailnet.
The Incus server also uses the Sakura VPS (tsunagaru) as a Tailscale exit node for all outbound internet traffic. When an exit node is active, Tailscale installs:
- `default dev tailscale0` in table 52, catching all outbound traffic and routing it to the exit node
- `throw <subnet>` in table 52 for each advertised subnet, making lookups for those destinations fall through to the main routing table so packets reach local interfaces instead
Without the throw 10.10.10.0/24 entry, any packet destined for a container is caught by the default entry and exits through tailscale0 toward tsunagaru, which has no path to 10.10.10.x and drops it. This includes forwarded traffic from HAProxy on tsunagaru, which proxies services via the Tailscale subnet route.
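The fall-through behaviour described above can be illustrated with a small model. This is a minimal sketch, not the kernel's implementation: it assumes table 52 is consulted before the main table, and the main-table contents (including the `eth0` default route) are hypothetical placeholders.

```python
from ipaddress import ip_address, ip_network

# Each table is a list of (prefix, action) pairs. The action is either an
# outgoing interface name or the string "throw".

def lookup(table, dst):
    """Longest-prefix match within one table; returns the action or None."""
    addr = ip_address(dst)
    matches = [
        (ip_network(prefix), action)
        for prefix, action in table
        if addr in ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def route(tables, dst):
    """Walk the tables in rule order; a throw or a miss moves on to the next table."""
    for table in tables:
        action = lookup(table, dst)
        if action is not None and action != "throw":
            return action
    return "unreachable"

# Hypothetical main table: the Incus bridge plus an assumed eth0 default route.
MAIN = [("10.10.10.0/24", "incusbr0"), ("0.0.0.0/0", "eth0")]
TABLE_52_HEALTHY = [("0.0.0.0/0", "tailscale0"), ("10.10.10.0/24", "throw")]
TABLE_52_BROKEN = [("0.0.0.0/0", "tailscale0")]

print(route([TABLE_52_HEALTHY, MAIN], "10.10.10.196"))  # incusbr0
print(route([TABLE_52_BROKEN, MAIN], "10.10.10.196"))   # tailscale0
```

With the `throw` entry present, the more specific /24 match wins inside table 52 and pushes the lookup on to the main table. Without it, the default entry captures the packet.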
Timeline (JST)
The Incus server runs in UTC. All times below are JST (UTC+9). Server log times are exact; investigation times are approximate.
| Time (JST) | Event |
|---|---|
| 2026-05-02 15:36:54 | apt-daily-upgrade.service runs; unattended-upgrades upgrades kmod |
| 2026-05-02 15:37:15 | needrestart restarts systemd-networkd. networkd flushes routing table 52. tailscaled logs monitor: RTM_DELROUTE: src=, dst=10.10.10.0/24, gw=, outif=0, table=52 |
| 2026-05-02 15:37:16 | tailscaled logs "allowing exit node access to local IPs: [10.10.10.0/24 ...]" but fails to re-add the throw route: networkd is still mid-teardown when Tailscale tries to write back |
| 2026-05-02 15:40:31 | Uptime Kuma detects 503 on all proxied services |
| 2026-05-02 ~19:00 | Incident noticed while traveling; investigation begins from hotel. ip route get 10.10.10.196 confirms routing via tailscale0 table 52 instead of incusbr0. ip route show table 52 shows throw 10.10.10.0/24 missing |
| 2026-05-02 ~19:05 | journalctl -u tailscaled reviewed; the 15:37 JST networkd flush and Tailscale's failed recovery confirmed |
| 2026-05-02 19:07:47 | systemctl restart tailscaled. throw 10.10.10.0/24 restored |
| 2026-05-02 19:08:31 | Uptime Kuma confirms 200 OK |
| 2026-05-02 ~19:10 | /etc/needrestart/conf.d/incus.conf updated to blacklist systemd-networkd and tailscaled from auto-restart |
Total downtime: 3 hours 28 minutes (2026-05-02 15:40:31 to 19:08:31 JST).
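The downtime figure can be checked directly from the timeline timestamps. A minimal sketch, using only values from the table above; the variable names are illustrative:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the timeline (all JST).
route_deleted = datetime.strptime("2026-05-02 15:37:15", FMT)
outage_detected = datetime.strptime("2026-05-02 15:40:31", FMT)
service_restored = datetime.strptime("2026-05-02 19:08:31", FMT)

downtime = service_restored - outage_detected
detection_delay = outage_detected - route_deleted

print(downtime)         # 3:28:00
print(detection_delay)  # 0:03:16
```

Downtime is measured from the first monitored 503, so the real outage window is about three minutes longer, starting at the route deletion.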
Root Cause
unattended-upgrades upgraded the kmod package, which is a purely userspace tool (provides modprobe, depmod, etc.). Despite no kernel module being loaded or unloaded, needrestart detected the upgrade and restarted systemd-networkd.
When systemd-networkd restarts, it performs a full interface teardown and rebuild. As part of that teardown it flushes all routes it considers unmanaged, including entries in table 52 that Tailscale installed. This fires a stream of RTM_DELROUTE netlink events.
Tailscale's netlink monitor detected the deletion of throw 10.10.10.0/24 from table 52 and its recovery path logged "allowing exit node access to local IPs". However, setRoutes() was called while networkd was still mid-teardown and flushing. Either the ip route add call failed silently, or networkd flushed the route again milliseconds after Tailscale re-added it. By the time networkd stabilized, Tailscale had already finished its recovery attempt and was no longer watching.
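To see which routes networkd removed during a restart, the `RTM_DELROUTE` lines in the tailscaled journal can be filtered by table. The sketch below assumes the log format matches the line quoted in the timeline; the second sample line is hypothetical and only shows the filter handling more than one prefix.

```python
import re

# Pattern modelled on the tailscaled log line quoted in the timeline.
DELROUTE = re.compile(
    r"RTM_DELROUTE: src=(?P<src>[^,]*), dst=(?P<dst>[^,]*), "
    r"gw=(?P<gw>[^,]*), outif=(?P<outif>\d+), table=(?P<table>\d+)"
)

def deleted_routes(log_lines, table=52):
    """Return the destination prefixes deleted from the given routing table."""
    found = []
    for line in log_lines:
        match = DELROUTE.search(line)
        if match and int(match.group("table")) == table:
            found.append(match.group("dst"))
    return found

sample = [
    "monitor: RTM_DELROUTE: src=, dst=10.10.10.0/24, gw=, outif=0, table=52",
    # Hypothetical line for a LAN host route, not taken from the real logs.
    "monitor: RTM_DELROUTE: src=, dst=192.168.1.10/32, gw=, outif=0, table=52",
    "unrelated log line",
]

print(deleted_routes(sample))  # ['10.10.10.0/24', '192.168.1.10/32']
```

The function takes any iterable of lines, so it can be fed the output of `journalctl -u tailscaled` line by line.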
The ManageForeignRoutingPolicyRules=no setting applied after the 2026-04-10 incident prevents networkd from flushing Tailscale's ip policy rules (table 52 rule, priorities 5210–5270). It does not prevent networkd from flushing routes inside table 52 itself. The ip rules survived this incident; the table 52 routes did not.
The direct trigger, needrestart restarting systemd-networkd after a kmod upgrade, is unnecessary. kmod is not a daemon and does not need a service restart to take effect.
Impact
- All Incus containers on `10.10.10.0/24` became unreachable from any tailnet peer, including tsunagaru
- HAProxy on tsunagaru returned 503 for all proxied services (Forgejo, Mastodon, Miniflux, etc.) for the duration of the outage, since it reaches containers via the Tailscale subnet route
- The Incus server itself could not reach its own containers: `ip route get 10.10.10.x` returned `dev tailscale0 table 52` instead of `dev incusbr0`
- Physical LAN hosts advertised as `192.168.1.x` were unaffected: their `throw` routes were also deleted, but packets for those hosts reach them via the LAN interface regardless
What Went Well
- Uptime Kuma caught the outage about 3 minutes after the route deletion
- `ip route get 10.10.10.196` immediately pinpointed the wrong routing path
- `ip route show table 52` made the missing `throw` entry obvious
- `journalctl -u tailscaled` gave a precise timestamp and confirmed the networkd flush as the trigger, enabling full timeline reconstruction from logs alone
- `systemctl restart tailscaled` was sufficient to recover; no manual route surgery required
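The `ip route get` check used during diagnosis can be scripted. A minimal sketch follows; the sample outputs are illustrative reconstructions, and the `src` addresses in them are assumptions rather than values from this server.

```python
def parse_route_get(output):
    """Extract the outgoing device and routing table from `ip route get` output.

    Only the first line is inspected. The "table" key is absent when the
    lookup was resolved by the main table.
    """
    lines = output.strip().splitlines()
    tokens = lines[0].split() if lines else []
    fields = {}
    for key in ("dev", "table"):
        # The keyword must be followed by a value, so ignore it in last position.
        if key in tokens[:-1]:
            fields[key] = tokens[tokens.index(key) + 1]
    return fields

# Illustrative outputs; the src addresses are assumed.
broken = "10.10.10.196 dev tailscale0 table 52 src 100.64.0.1 uid 0\n    cache"
healthy = "10.10.10.196 dev incusbr0 src 10.10.10.1 uid 0\n    cache"

print(parse_route_get(broken))   # {'dev': 'tailscale0', 'table': '52'}
print(parse_route_get(healthy))  # {'dev': 'incusbr0'}
```

A result whose `dev` is `tailscale0` for a `10.10.10.x` address is the signature of this incident.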
What Did Not Go Well
- Response time was roughly 3 hours 20 minutes: Uptime Kuma caught the outage at 15:40, but the alert was not acted on until ~19:00, when investigation began from a hotel while traveling
- The 2026-04-10 action items did not prevent recurrence. `ManageForeignRoutingPolicyRules=no` only covers ip rules, not table 52 routes, so the class of bug was not fully closed
- `needrestart` restarting `systemd-networkd` after a `kmod` upgrade is unnecessary and causes cascading damage to unrelated services. This is a `needrestart` configuration gap, not a kernel requirement
Lessons Learned
- `ManageForeignRoutingPolicyRules=no` in `networkd.conf` protects ip policy rules but not routing table entries. Tailscale's table 52 `throw` routes remain vulnerable to any networkd restart
- Tailscale's recovery path for `RTM_DELROUTE` events has a race condition with networkd mid-restart. The only reliable recovery is a full `systemctl restart tailscaled` once networkd has stabilized
- `needrestart` will restart `systemd-networkd` (and potentially `tailscaled`) when packages like `kmod` are upgraded, even when no restart is necessary. Services that own kernel routing state should be in a needrestart blacklist
- Tailnet subnet route health should be monitored from the tailnet consumer side, not just from the advertising side. `tailscale status` showing a route as "advertised and approved" does not mean packets are actually being forwarded correctly
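Since restarting tailscaled is the only reliable recovery, the check for a missing `throw` route could be automated. The sketch below is one possible approach, not something deployed on this server. It assumes `ip route show table 52` prints each throw route as a line starting with `throw <prefix>`, and it must run as root after networkd has settled (for example from a periodic timer). The function names are hypothetical.

```python
import subprocess

def missing_throw_routes(table_dump, expected):
    """Return the expected prefixes that have no `throw` entry in the dump."""
    present = set()
    for line in table_dump.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "throw":
            present.add(tokens[1])
    return [prefix for prefix in expected if prefix not in present]

def check_and_repair(expected=("10.10.10.0/24",)):
    """Dump table 52 and restart tailscaled if an expected throw route is gone."""
    dump = subprocess.run(
        ["ip", "route", "show", "table", "52"],
        capture_output=True, text=True, check=True,
    ).stdout
    missing = missing_throw_routes(dump, expected)
    if missing:
        # Same recovery step that resolved this incident.
        subprocess.run(["systemctl", "restart", "tailscaled"], check=True)
    return missing

healthy_dump = "default dev tailscale0\nthrow 10.10.10.0/24\n"
broken_dump = "default dev tailscale0\n"

print(missing_throw_routes(healthy_dump, ["10.10.10.0/24"]))  # []
print(missing_throw_routes(broken_dump, ["10.10.10.0/24"]))   # ['10.10.10.0/24']
```

This check runs on the advertising side, so it complements rather than replaces an end-to-end probe from a tailnet peer.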
Action Items
- Restart `tailscaled` to restore the missing `throw` route in table 52
- Add `systemd-networkd` and `tailscaled` to the needrestart blacklist in `/etc/needrestart/conf.d/incus.conf` to prevent auto-restart of either service during unattended upgrades (see Incus Server: needrestart Configuration)
- File a Tailscale issue describing the race condition in the `RTM_DELROUTE` recovery path that leaves table 52 routes unrestored when networkd is still tearing down