Description
Setting the Default router option to Automatic in the IPv6 RA settings causes high CPU usage and lock contention when bird2 is running on the router and populating the routing table with the full global IPv6 routing table.
What essentially happens:
- The bird2 daemon starts and begins importing around 200k routes from BGP into the Linux routing table.
- Each imported route triggers a netlink event, caught by odhcpd, which invokes router_netevent_cb
- The callback invokes uloop_timeout_set(&iface->timer_rs, 1000);, which leads to trigger_router_advert getting invoked.
- If there is no default route set (which there isn't, as bird sets up individual routes for all global CIDRs instead of a single default route), and the default_router boolean flag is not true (because we set the Default router option to Automatic), parse_routes is invoked to try to find a default route within the routing table.
- parse_routes attempts to parse the entire routing table, which causes high CPU usage and lock contention. As a result, bird imports slow down considerably: while odhcpd reads the routing table, bird cannot acquire a write lock.
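To illustrate why the last step scales with the size of the table, here is a minimal sketch of a default-route check over /proc/net/ipv6_route style input. It is not odhcpd's actual parse_routes; the function names are hypothetical, and it assumes the usual line layout of that file (32 hex digits of destination, then a 2-digit hex prefix length, followed by further fields that are ignored here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of odhcpd): returns true if one line in
 * /proc/net/ipv6_route format describes ::/0, i.e. the destination is all
 * zeros and the prefix length is 0.
 */
bool line_is_default_route(const char *line)
{
	char dest[33];
	unsigned int prefix_len;

	assert(line != NULL);

	/* First field: 32 hex digits; second field: prefix length in hex. */
	if (sscanf(line, "%32s %2x", dest, &prefix_len) != 2)
		return false;

	return prefix_len == 0 && strspn(dest, "0") == 32;
}

/*
 * Hypothetical helper: scans every line of the stream and reports how many
 * lines were read. With ~200k routes, each call reads ~200k lines.
 */
bool stream_has_default_route(FILE *fp, size_t *lines_read)
{
	char line[512];
	bool found = false;

	assert(fp != NULL && lines_read != NULL);

	*lines_read = 0;
	while (fgets(line, sizeof(line), fp)) {
		(*lines_read)++;
		if (line_is_default_route(line))
			found = true;
	}
	return found;
}
```

With a full table and no ::/0 entry, every invocation has to read all lines before it can conclude that no default route exists, which matches the cost described above.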
Simply setting the Default router option to Forced solves the bird import lag. send_router_advert is still invoked, but the CPU usage of trigger_router_advert is minimal compared to that of parse_routes. In addition, parse_routes seems to acquire a lock on the routing table: the /proc/net/ipv6_route file is held open at all times, but the lag occurs only while it is actively being rewound and read, so I assume that kernel-level routing table locking only occurs when the file handle is not at EOF, or something similar.
To be clear, the flow described above is completely normal and the logic is entirely correct. However, when a large number of routes is being added, it considerably slows down the import of routes, and no RAs can be emitted while they are being imported, either.
A solution I'd like to propose (and possibly implement, after some feedback) is a rate limiting mechanism that avoids calling parse_routes too often when the routing table is large, reusing the memoized result of the previous call in between.
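A minimal sketch of what such a mechanism could look like, to make the proposal concrete. Everything here is hypothetical: the struct, the function names and the two thresholds (large_table, min_interval) do not exist in odhcpd and would need to be adapted to its actual data structures and clock source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical cache for the result of the last routing table scan. */
struct route_scan_cache {
	bool valid;              /* false until the first scan has been stored */
	bool has_default;        /* memoized result of the last scan */
	time_t last_scan;        /* time of the last scan, in seconds */
	size_t last_route_count; /* number of routes seen by the last scan */
};

/*
 * Returns true if the memoized result may be reused instead of rescanning.
 * Small tables are always rescanned; large tables are rescanned at most
 * once per min_interval seconds.
 */
bool route_cache_fresh(const struct route_scan_cache *c, time_t now,
		       size_t large_table, time_t min_interval)
{
	assert(c != NULL);

	if (!c->valid)
		return false;
	if (c->last_route_count < large_table)
		return false;
	return now - c->last_scan < min_interval;
}

/* Records the outcome of a full scan so later calls can reuse it. */
void route_cache_store(struct route_scan_cache *c, time_t now,
		       bool has_default, size_t route_count)
{
	assert(c != NULL);

	c->valid = true;
	c->has_default = has_default;
	c->last_scan = now;
	c->last_route_count = route_count;
}
```

The caller would check route_cache_fresh before scanning, use has_default from the cache if it returns true, and otherwise run the full scan and call route_cache_store. The trade-off is that a default route appearing or disappearing on a large table could go unnoticed for up to min_interval seconds.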
From what I understand, uloop_timeout_set(&iface->timer_rs, 1000); already limits the number of times the callback can be invoked (and from the code, it seems to keep postponing a pending callback of the same type?). It is therefore a bit strange that trigger_router_advert is invoked continuously instead of once, 1 second after all routes are imported, but I've confirmed via strace that it does get invoked continuously...