Description
Is it platform specific
generic
Importance or Severity
High
Previous Working Version
202405.33
Steps to Reproduce
Problem Description:
The workaround `no zebra nexthop kernel enable` was re-applied in 202405 (#23294) because the supposed "proper fixes" from FRR (#18953) did not actually resolve the original LAG member flap issue (#17345). This workaround causes massive kernel percpu memory consumption (>2GB reported in #24607) on T2 chassis downstream linecards.
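For quick verification on an affected device, here is a minimal sketch (not part of any fix) that checks whether the workaround is present in the running FRR configuration; it assumes vtysh is reachable (on SONiC, typically inside the bgp container) and that the disabled state is rendered verbatim as `no zebra nexthop kernel enable`:

```python
#!/usr/bin/env python3
"""Minimal sketch: check whether the workaround named above is present
in the running FRR configuration. Assumes vtysh is on PATH and that the
disabled state appears verbatim as `no zebra nexthop kernel enable`."""
import subprocess

def workaround_active():
    # Dump the running config via vtysh and look for the exact line.
    cfg = subprocess.run(
        ["vtysh", "-c", "show running-config"],
        capture_output=True, text=True, check=True,
    ).stdout
    return "no zebra nexthop kernel enable" in cfg

if __name__ == "__main__":
    print("workaround applied:", workaround_active())
```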
Issue Timeline - The Failed Fix Cycle
- Original issue #17345: LAG member flap caused BGP route installation failures
- Band-aid workaround #17344: added `no zebra nexthop kernel enable` → worked reliably
- Supposed "proper fix" #18953: integrated FRR PRs #15362/#15823 and reverted the workaround
- Fix validation failure: the original LAG flap issue returned in testing/production
- Forced reversion #23294: had to re-apply the band-aid to 202405 for reliability
- Memory regression #24607: >2GB percpu memory explosion in production
Finding: FRR Fixes Are Insufficient
The FRR "fixes" (PR #15362 and #15823) do not actually resolve the original nexthop group synchronization problem. Field testing showed:
- LAG member add/remove operations still cause route installation failures
- "Route install failed" errors still occur during interface events
- Routes still show "failed": true in FRR after interface flaps (a quick way to scan for these is sketched below)
- Only the workaround provides reliable operation
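For reference, a minimal sketch of how the failed routes can be scanned for after an interface flap; it assumes vtysh is on PATH and that `show ip route json` returns a JSON object keyed by prefix (the "failed" field follows the output quoted above and may differ across FRR versions):

```python
#!/usr/bin/env python3
"""Minimal sketch: list routes that zebra marks as failed to install.
Field names follow the output quoted in this issue and may differ
across FRR versions."""
import json
import subprocess

def failed_routes():
    out = subprocess.run(
        ["vtysh", "-c", "show ip route json"],
        capture_output=True, text=True, check=True,
    ).stdout
    rib = json.loads(out)
    failed = []
    for prefix, entries in rib.items():
        for entry in entries:
            # zebra sets "failed" on routes it could not install in the kernel
            if entry.get("failed"):
                failed.append((prefix, entry.get("protocol")))
    return failed

if __name__ == "__main__":
    for prefix, proto in failed_routes():
        print(f"route install failed: {prefix} ({proto})")
```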
We are now forced to choose between:
- Reliability: keep the workaround and accept the >2GB memory cost
- Memory efficiency: remove the workaround and accept route installation failures
Memory Explosion Analysis
Issue #24607 reports a >2GB percpu memory increase when `no zebra nexthop kernel enable` is in use.
The workaround disables the kernel's nexthop group optimization, forcing the kernel to:
- Create individual route entries instead of shared nexthop group objects
- Allocate percpu data structures for each individual nexthop
- Maintain separate kernel objects that would otherwise be shared (a way to observe this is sketched below)
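As one way to observe this on a device, a minimal sketch that snapshots route count, kernel nexthop-object count, and the kernel's percpu memory counter; it assumes iproute2 with JSON output support and a kernel that exposes a Percpu line in /proc/meminfo (exact percpu accounting varies by kernel version):

```python
#!/usr/bin/env python3
"""Minimal observation sketch: snapshot route count, kernel nexthop-object
count, and the Percpu line from /proc/meminfo, so growth can be compared
with and without the workaround."""
import json
import subprocess

def percpu_kib():
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("Percpu:"):
                return int(line.split()[1])  # reported in kB
    return None

def count(cmd):
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return len(json.loads(out or "[]"))  # guard against empty output

if __name__ == "__main__":
    print("ipv4 routes:           ", count(["ip", "-j", "route", "show"]))
    print("kernel nexthop objects:", count(["ip", "-j", "nexthop", "show"]))
    print("Percpu (kB):           ", percpu_kib())
```

Running this before and after toggling the workaround (or while route scale grows) gives a coarse picture of how percpu memory tracks per-route nexthop allocation.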
Impact Scale
In T2 chassis environments with high route counts, multipath BGP, and multi-core systems, the memory impact scales as: routes × nexthops_per_route × cpu_cores × percpu_overhead
Production impact: >2GB percpu memory increase reported in #24607
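For rough illustration only (every figure below is assumed, not measured), plugging plausible T2 linecard numbers into the scaling relation above lands in the same order of magnitude as the reported increase:

```python
# Illustrative back-of-the-envelope estimate; all inputs are assumed values.
routes = 100_000             # assumed route scale on a downstream linecard
nexthops_per_route = 8       # assumed ECMP width with multipath BGP
cpu_cores = 32               # assumed core count
percpu_overhead_bytes = 128  # assumed per-nexthop, per-CPU kernel overhead

total = routes * nexthops_per_route * cpu_cores * percpu_overhead_bytes
print(f"{total / 2**30:.1f} GiB")  # ~3.1 GiB, same order as the >2GB in #24607
```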
Impact of this regression
Kernel percpu memory explosion as route scale increases
Relevant log output
Output of show version, show techsupport
Attach files (if any)
No response