Tailscale Subnet Route Lost After networkd Restart, 2026-04-10

Summary

The Incus server acts as a Tailscale subnet router, advertising a handful of 192.168.1.0/24 host addresses into the tailnet. After a systemd-networkd restart wiped the ip rules earlier in the day, Tailscale's recovery path restored only the ip rules and skipped a full route reconfiguration, leaving the throw 192.168.1.0/24 entry missing from routing table 52. Traffic from any tailnet peer to a 192.168.1.x host hit a routing loop on the Incus server instead of being forwarded to the physical LAN.

Situation

The Incus server runs Tailscale (1.96.4) as a subnet router, advertising:

  • 10.10.10.0/24 (Incus internal subnet)
  • 192.168.1.3/32, 192.168.1.4/32, 192.168.1.90/32 (physical LAN hosts)
  • fd42:10:10:10::/64
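
In CLI terms the advertisement looks roughly like this (a sketch; the exact invocation depends on how the node was originally brought up, but tailscale set works on 1.96.x):

  $ tailscale set --advertise-routes=10.10.10.0/24,192.168.1.3/32,192.168.1.4/32,192.168.1.90/32,fd42:10:10:10::/64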

Tailscale manages its own policy routing via table 52. For each advertised subnet it installs a throw route in table 52, which makes lookups for those destinations fall through to the main routing table, so the packet is forwarded out the physical interface enp1s0 instead of being caught by the default dev tailscale0 entry in table 52.
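
For reference, the healthy state looks roughly like this (reconstructed from this incident's addresses; the rule priorities are the 5210 to 5270 range tailscaled logs, but exact output varies by Tailscale version and kernel):

  $ ip rule show
  ...
  5210:  from all fwmark 0x80000/0xff0000 lookup main
  5230:  from all fwmark 0x80000/0xff0000 lookup default
  5250:  from all fwmark 0x80000/0xff0000 unreachable
  5270:  from all lookup 52
  $ ip route show table 52
  throw 10.10.10.0/24
  throw 192.168.1.0/24
  default dev tailscale0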

The Incus server also uses the Sakura VPS as a Tailscale exit node for outbound traffic, which continued to work correctly throughout this incident.

Timeline (JST)

Time      Event
15:47:13  apt-daily-upgrade.service runs; unattended-upgrades starts
15:47:25  unattended-upgrades upgrades the incus package and triggers incus.service to stop
15:52:25  incus.service finishes restarting; gracefully stopping all running VMs and QEMU processes took around 5 minutes
15:52:46  Incus brings incusbr0 and other virtual interfaces back up. systemd-networkd restarts in response and flushes ip rules. tailscaled logs "router: somebody (likely systemd-networkd) deleted ip rules; restoring Tailscale's". The ip rules are restored, but the throw 192.168.1.0/24 entry in table 52 is not re-added; throw 10.10.10.0/24 remains present
15:53:34  Monitoring flags the 192.168.1.x hosts as unreachable from the tailnet, about 48 seconds after the networkd restart
~21:30    Investigation starts. tailscale status on the subnet router shows the routes still advertised. ping from a tailnet peer (lavie, 100.119.140.15) to 192.168.1.90 fails
~21:35    tcpdump -i tailscale0 shows the ICMP echo request arriving from 100.119.140.15, immediately followed by a looped copy to the same destination sourced from the Incus server's own Tailscale IP, 100.98.98.126. tcpdump -i enp1s0 shows nothing. Routing loop confirmed
~21:38    ip route show table 52 shows throw 10.10.10.0/24 and default dev tailscale0 but no throw 192.168.1.0/24. Root cause identified
~21:40    journalctl -u tailscaled reviewed, confirming the systemd-networkd flush at 15:52 and Tailscale's partial recovery
21:42:34  systemctl restart tailscaled forces a full Reconfig, which reinstalls all throw routes in table 52. Monitoring recovers

Total downtime: 5 hours 49 minutes (15:53:34 to 21:42:34).
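
The journal check at ~21:40 is reproducible after the fact; trimmed here to the relevant line (hostname and pid elided, formatting reconstructed):

  $ journalctl -u tailscaled --since "15:52" --until "15:53"
  Apr 10 15:52:46 ... tailscaled[...]: router: somebody (likely systemd-networkd) deleted ip rules; restoring Tailscale's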

Root Cause

The trigger was unattended-upgrades upgrading the incus package at 15:47:25, which restarted incus.service. When Incus came back up at 15:52:46 and recreated its virtual interfaces (incusbr0 and friends), systemd-networkd restarted in response and flushed routing policy rules it considered unmanaged.

The actual bug sits in Tailscale's recovery path. Tailscale's route monitor detects the flush and logs "router: somebody (likely systemd-networkd) deleted ip rules; restoring Tailscale's", which shows it knows what just happened and is meant to put things back. In practice it only reinstalls the ip rules at priorities 5210 to 5270 and does not re-sync the routes inside table 52. A full reconfiguration (logged as "wgengine: Reconfig: configuring router") would reinstall both, but the recovery path skips that step.

In this case the throw 192.168.1.0/24 entry was lost and never put back. With the ip rules in place but the throw route missing, any packet destined for a 192.168.1.x host hit the table 52 default dev tailscale0 route instead of falling through to the main table. The packet was then re-emitted on tailscale0 with the Incus Tailscale IP as source, looped back into Tailscale's forwarding path, and never reached enp1s0.
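
Concretely, the broken state at ~21:38 and what a route lookup on the server then resolved to (the ip route get line is a hedged reconstruction, not captured output):

  $ ip route show table 52
  throw 10.10.10.0/24
  default dev tailscale0
  $ ip route get 192.168.1.90
  192.168.1.90 dev tailscale0 table 52 src 100.98.98.126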

The throw 10.10.10.0/24 entry happened to survive, which is why the Incus internal subnet kept working and masked the severity of the problem until a tailnet peer tried to reach a physical LAN host. Why one throw route survived and the other did not is unclear from the logs and is part of what makes this worth reporting upstream.

Because the trigger is any incus package upgrade, this chain will repeat on every future Incus update unless a link in the chain is deliberately broken.
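Until then, a minimal check along these lines would catch the bad state quickly (a hypothetical sketch, not part of any existing tooling here; the prefix list mirrors the throw routes this host needs):

  for net in 10.10.10.0/24 192.168.1.0/24; do
    ip route show table 52 | grep -q "^throw $net" \
      || echo "ALERT: throw $net missing from table 52"
  done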

Impact

  • Tailnet peers could not reach any 192.168.1.x host advertised as a subnet route through the Incus server
  • The Incus internal subnet (10.10.10.0/24) was unaffected because its throw route survived the partial recovery
  • Outbound traffic from the Incus server through the Sakura VPS exit node was unaffected
  • Public-facing self-hosted services were unaffected, since they do not depend on the 192.168.1.x subnet routes

What Went Well

  • Monitoring flagged the failure about 48 seconds after the networkd restart, so the incident window was bounded by response time rather than detection time
  • tcpdump on tailscale0 and enp1s0 immediately showed the routing loop: request in on tailscale0, looped copy right after, nothing on enp1s0
  • ip route show table 52 made the missing throw 192.168.1.0/24 entry obvious once the comparison with throw 10.10.10.0/24 was made
  • The tailscaled journal clearly logged the systemd-networkd flush event, pinpointing the trigger
  • systemctl restart tailscaled was enough to restore correct state; no manual route surgery was required

What Did Not Go Well

  • Response time was almost 6 hours: monitoring caught the outage at 15:53, but the alert was not acted on until ~21:30
  • Tailscale's recovery path logs that it is "restoring Tailscale's" rules, which reads like a full recovery, but it only restores ip rules and not table 52 routes. The log line gives false confidence and the failure mode is invisible from the journal alone
  • Nothing on the Incus server reloads Tailscale's routing state when systemd-networkd restarts, so any future networkd restart is likely to reproduce this

Lessons Learned

  • systemd-networkd will flush routing policy rules on restart unless explicitly told not to. Tailscale recovers ip rules but does not always re-sync table 52 routes
  • Tailscale's subnet router installs throw routes in table 52 for each advertised subnet. If one of those throw routes is missing, packets for that subnet hit the fallback default dev tailscale0 entry and loop back out the Tailscale interface
  • A routing loop in a subnet router shows up in tcpdump as a duplicated packet: the original from the tailnet peer, immediately followed by a copy sourced from the subnet router's own Tailscale IP (see the reconstruction after this list)
  • A full systemctl restart tailscaled forces a complete router reconfiguration, which is sufficient to recover from this partial recovery bug
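
What that signature looked like here, reconstructed rather than pasted from the actual capture (the id/seq values are placeholders):

  $ tcpdump -ni tailscale0 icmp
  IP 100.119.140.15 > 192.168.1.90: ICMP echo request, id 23, seq 1, length 64
  IP 100.98.98.126 > 192.168.1.90: ICMP echo request, id 23, seq 1, length 64
  $ tcpdump -ni enp1s0 icmp    # stays silent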

Action Items

  • Restart tailscaled to restore the missing throw route in table 52 (done at 21:42:34, during the incident)
  • Set ManageForeignRoutingPolicyRules=no under [Network] in /etc/systemd/networkd.conf so networkd stops flushing Tailscale's ip rules when it restarts. This breaks the chain at the networkd level, regardless of what triggered the restart (see the snippet after this list)
  • File a Tailscale issue describing the partial recovery path that restores ip rules without re-syncing table 52 routes
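
The networkd change from the second item, sketched as a drop-in (equivalent to editing /etc/systemd/networkd.conf directly; the file name under networkd.conf.d is arbitrary):

  # /etc/systemd/networkd.conf.d/tailscale.conf
  [Network]
  ManageForeignRoutingPolicyRules=no

  $ systemctl restart systemd-networkd
  $ ip rule show | grep 5270    # Tailscale's rule should survive the restart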

Considered Alternatives

  • PartOf=incus.service on tailscaled: would tie Tailscale's lifecycle to Incus, but would also stop Tailscale whenever Incus is stopped for planned maintenance, cutting off tailnet SSH access to the server while Incus is down. Rejected
  • ExecStartPost=systemctl restart tailscaled drop-in on incus.service: would force a clean Tailscale re-init after every Incus restart. Works, but only covers the Incus trigger. Anything else that causes a networkd restart would still reproduce the bug. Not selected
  • Exclude incus from unattended-upgrades: would trade automated security updates for manual control. Does not fix the underlying mechanism and only papers over this particular trigger. Rejected