
Memory regression: "no zebra nexthop kernel enable" workaround causes massive memory increase on scaled-route systems; the increase is seen in kernel percpu memory #24742

@deepak-singhal0408

Description


Is it platform specific

generic

Importance or Severity

High

Previous Working Version

202405.33

Steps to Reproduce

Problem Description:
The workaround `no zebra nexthop kernel enable` was re-applied in 202405 (#23294) because the supposed "proper fixes" from FRR (#18953) did not actually resolve the original LAG member flap issue (#17345). This workaround causes massive kernel percpu memory consumption (>2GB, reported in #24607) on T2 chassis downstream linecards.
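For context, the workaround is a single zebra directive. A minimal sketch of how it would be applied on an FRR-based system (standard FRR syntax; the exact SONiC integration point may differ):

```
# Applied at runtime through the standard FRR CLI:
vtysh -c 'configure terminal' -c 'no zebra nexthop kernel enable'

# Or persisted as a line in the zebra configuration (e.g. /etc/frr/zebra.conf):
#   no zebra nexthop kernel enable
```

With this set, zebra stops programming kernel nexthop objects and instead installs each route with its own embedded nexthop information.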

Issue Timeline - The Failed Fix Cycle
  1. Original issue #17345: LAG member flap caused BGP route installation failures
  2. Band-aid workaround #17344: added `no zebra nexthop kernel enable` → worked reliably
  3. Supposed "proper fix" #18953: integrated FRR PR #15362/#15823 and reverted the workaround
  4. Fix validation failure: the original LAG flap issue returned in testing/production
  5. Forced reversion #23294: had to re-apply the band-aid to 202405 for reliability
  6. Memory regression #24607: >2GB percpu memory explosion in production

Finding: FRR Fixes Are Insufficient
The FRR "fixes" (PR #15362 and #15823) do not actually resolve the original nexthop group synchronization problem. Field testing showed:

  • LAG member add/remove operations still cause route installation failures
  • `Route install failed` errors still occur during interface events
  • Routes still show `"failed": true` in FRR after interface flaps (see the spot check sketched after this list)
  • Only the workaround provides reliable operation
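A quick way to spot such routes (a hedged sketch against stock FRR JSON output using jq; field names can vary across FRR versions):

```
# List prefixes that zebra marked as failed after an interface/LAG flap.
# "failed" is the per-route flag in FRR's `show ip route json` output.
vtysh -c 'show ip route json' \
  | jq -r 'to_entries[] | .value[] | select(.failed == true) | .prefix'
```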

Now we are forced to choose between:
  • Reliability: keep the workaround and accept the >2GB memory cost
  • Memory efficiency: remove the workaround and accept route failures

Memory Explosion Analysis
Issue #24607 reports a >2GB percpu memory increase when using `no zebra nexthop kernel enable`.

The workaround disables kernel nexthop group optimization, forcing the kernel to:

  • Create individual route entries instead of shared nexthop group objects
  • Allocate percpu data structures for each individual nexthop
  • Maintain separate kernel objects that would otherwise be shared
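The difference is visible directly with iproute2; the output below is illustrative, and exact formats vary by kernel and iproute2 version:

```
# Kernel nexthop objects in use (workaround absent): routes share nhid entries.
ip nexthop show                    # e.g. "id 23 group 11/12 proto zebra"
ip route show proto bgp | head -2  # e.g. "10.0.0.0/24 nhid 23 ..."

# Workaround active: no shared objects; every route embeds its own nexthops.
ip nexthop show                    # (empty)
ip route show proto bgp | head -2  # "10.0.0.0/24 ... nexthop via ... nexthop via ..."

# Kernel percpu allocator footprint, where this regression shows up:
grep Percpu /proc/meminfo
```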

Impact Scale
In T2 chassis environments with high route counts, multipath BGP, and multi-core systems, the memory impact scales as: routes × nexthops_per_route × cpu_cores × percpu_overhead
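To make that scaling concrete, here is a back-of-the-envelope instance of the formula; all four numbers are illustrative assumptions, not measurements from #24607:

```
# Hypothetical T2 linecard scale: 512k routes, 8-way ECMP, 32 cores,
# ~64 bytes of percpu state per nexthop per core (assumed overhead).
routes=524288; nh_per_route=8; cores=32; percpu_bytes=64
echo "$(( routes * nh_per_route * cores * percpu_bytes / 1024 / 1024 )) MiB"
# -> 8192 MiB; even with one factor quartered, the cost is ~2GB,
#    matching the scale reported in #24607.
```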

Production impact: >2GB percpu memory increase reported in #24607

Impact of this regression

Memory explosion as the route scale increases

Relevant log output

Output of show version, show techsupport

Attach files (if any)

No response
