
Memory regression: "no zebra nexthop kernel enable" workaround causes massive memory increase on scaled-route systems; the increase is seen in kernel percpu memory #24742

@deepak-singhal0408

Description


Is it platform specific

generic

Importance or Severity

High

Previous Working Version

202405.33

Steps to Reproduce

Problem Description:
The workaround `no zebra nexthop kernel enable` was re-applied in 202405 (#23294) because the supposed "proper fixes" from FRR (#18953) did not actually resolve the original LAG member flap issue (#17345). This workaround causes massive kernel percpu memory consumption (>2GB, reported in #24607) on T2 chassis downstream linecards.
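For context, the workaround is a single zebra directive. A minimal sketch of how it would be applied on an FRR-based system (standard FRR syntax; the exact SONiC integration point may differ):

```
# Applied at runtime through the standard FRR CLI:
vtysh -c 'configure terminal' -c 'no zebra nexthop kernel enable'

# Or persisted as a line in the zebra configuration (e.g. /etc/frr/zebra.conf):
#   no zebra nexthop kernel enable
```

With this set, zebra stops programming kernel nexthop objects and instead installs each route with its own embedded nexthop information.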

Issue Timeline - The Failed Fix Cycle
  1. Original issue #17345: LAG member flap caused BGP route installation failures
  2. Band-aid workaround #17344: added `no zebra nexthop kernel enable` → worked reliably
  3. Supposed "proper fix" #18953: integrated FRR PR #15362/#15823 and reverted the workaround
  4. Fix validation failure: the original LAG flap issue returned in testing/production
  5. Forced reversion #23294: had to re-apply the band-aid to 202405 for reliability
  6. Memory regression #24607: >2GB percpu memory explosion in production

Finding: FRR Fixes Are Insufficient
The FRR "fixes" (PR #15362 and #15823) do not actually resolve the original nexthop group synchronization problem. Field testing showed:

  • LAG member add/remove operations still cause route installation failures
  • `Route install failed` errors still occur during interface events
  • Routes still show `"failed": true` in FRR after interface flaps (see the spot check sketched after this list)
  • Only the workaround provides reliable operation
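A quick way to spot such routes (a hedged sketch against stock FRR JSON output using jq; field names can vary across FRR versions):

```
# List prefixes that zebra marked as failed after an interface/LAG flap.
# "failed" is the per-route flag in FRR's `show ip route json` output.
vtysh -c 'show ip route json' \
  | jq -r 'to_entries[] | .value[] | select(.failed == true) | .prefix'
```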

Now we are forced to choose between:
  • Reliability: keep the workaround and accept the >2GB memory cost
  • Memory efficiency: remove the workaround and accept route failures

Memory Explosion Analysis
Issue #24607 reports a >2GB percpu memory increase when using `no zebra nexthop kernel enable`.

The workaround disables kernel nexthop group optimization, forcing the kernel to:

  • Create individual route entries instead of shared nexthop group objects
  • Allocate percpu data structures for each individual nexthop
  • Maintain separate kernel objects that would otherwise be shared
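The difference is visible directly with iproute2; the output below is illustrative, and exact formats vary by kernel and iproute2 version:

```
# Kernel nexthop objects in use (workaround absent): routes share nhid entries.
ip nexthop show                    # e.g. "id 23 group 11/12 proto zebra"
ip route show proto bgp | head -2  # e.g. "10.0.0.0/24 nhid 23 ..."

# Workaround active: no shared objects; every route embeds its own nexthops.
ip nexthop show                    # (empty)
ip route show proto bgp | head -2  # "10.0.0.0/24 ... nexthop via ... nexthop via ..."

# Kernel percpu allocator footprint, where this regression shows up:
grep Percpu /proc/meminfo
```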

Impact Scale
In T2 chassis environments with high route counts, multipath BGP, and multi-core systems, the memory impact scales as: routes × nexthops_per_route × cpu_cores × percpu_overhead
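To make that scaling concrete, here is a back-of-the-envelope instance of the formula; all four numbers are illustrative assumptions, not measurements from #24607:

```
# Hypothetical T2 linecard scale: 512k routes, 8-way ECMP, 32 cores,
# ~64 bytes of percpu state per nexthop per core (assumed overhead).
routes=524288; nh_per_route=8; cores=32; percpu_bytes=64
echo "$(( routes * nh_per_route * cores * percpu_bytes / 1024 / 1024 )) MiB"
# -> 8192 MiB; even with one factor quartered, the cost is ~2GB,
#    matching the scale reported in #24607.
```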

Production impact: >2GB percpu memory increase reported in #24607

Impact of this regression

Memory explosion as the route scale increases

Relevant log output

Output of show version, show techsupport

Attach files (if any)

No response
