
zebra: V6 RA not sent anymore after interface up-down-up #18451

Open · soumyar-roy wants to merge 2 commits into master from soumya/fastra

Conversation

soumyar-roy
Contributor

@soumyar-roy soumyar-roy commented Mar 21, 2025

zebra: V6 RA not sent anymore after interface up-down-up

Issue:
Once an interface is shut down, it is removed from the wheel timer.
When the interface comes back up, the current code does not add it
back to the wheel timer, so no RAs are sent on that interface anymore.

Fix:
Moved the wheel_add for the interface inside rtadv_start_interface_events.
This is the common function that gets triggered for both the
RA-enable and interface-up events.

Also, on any kind of interface activation event, we try to send an
RA as soon as possible. This satisfies cases where a quick RA is
needed, especially for convergence that depends on RA.
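For illustration, a minimal sketch of this fix (not the actual zebra code): only wheel_add_item() is the real lib/wheel.h call, while the wheel field name and the immediate-send helper are hypothetical stand-ins.

```c
/* Sketch: zvrf->rtadv.ra_wheel and rtadv_send_ra_now() are illustrative
 * names, not the real zebra symbols. */
static void rtadv_start_interface_events(struct zebra_vrf *zvrf,
					 struct interface *ifp)
{
	/* ... existing guards: RA configured on ifp, interface operative ... */

	/* Re-add on every activation path (RA enable and interface up), so
	 * an up -> down -> up cycle cannot leave the interface off the
	 * wheel and silently stop RA transmission. */
	wheel_add_item(zvrf->rtadv.ra_wheel, ifp);

	/* Send one RA right away so convergence that depends on RA does
	 * not have to wait for the next wheel tick. */
	rtadv_send_ra_now(zvrf, ifp);
}
```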

Testing:
Brought the interface up, then down, then up again.
Added a debug log for RA and checked that it is advertised periodically
once the interface is back in the up state.

show bgp summary for 512 BGP unnumbered peers works fine.

Signed-off-by: Soumya Roy [email protected]

@soumyar-roy soumyar-roy marked this pull request as draft March 21, 2025 20:30
@mjstapp
Contributor

mjstapp commented Mar 21, 2025

so ... please add a meaningful title and description?

@soumyar-roy soumyar-roy changed the title from Soumya/fastra to zebra: send v6 fast RA at faster interval on Mar 21, 2025
@soumyar-roy soumyar-roy force-pushed the soumya/fastra branch 2 times, most recently from f4eedc5 to 7095a84 on March 21, 2025 21:51
@soumyar-roy
Contributor Author

so ... please add a meaningful title and description?

Added now

lib/wheel.c Outdated
list_isempty(wheel->wheel_slot_lists[curr_slot])) {
/* Came to back to same slot and that is empty
* so the wheel is empty, puase it
*/
Member

Can you fix the comment indentation?

Contributor Author

1) This comment is indented w.r.t. `if (!wheel->run_forever) {`. Before running git clang-format:

```c
	if ((((curr_slot + slots_to_skip) % wheel->slots) == curr_slot) &&
		list_isempty(wheel->wheel_slot_lists[curr_slot])) {   <<< this line is tab indented
		/* Came to back to same slot and that is empty
		 * so the wheel is empty, stop it
		*/
		if (!wheel->run_forever) {
			wheel_stop(wheel);
			if (debug_timer_wheel)
				zlog_debug("Stopped an empty  wheel %p", wheel);
			return;
		}
	}
```

2) After git clang-format:

```c
	if ((((curr_slot + slots_to_skip) % wheel->slots) == curr_slot) &&
	    list_isempty(wheel->wheel_slot_lists[curr_slot])) {   <<< this line gets space indented
		/* Came to back to same slot and that is empty
		 * so the wheel is empty, stop it
		*/
```

3) If I don't do step 2), I get the style suggestion error (curl https://gist.githubusercontent.com/polychaeta/13d7c1b3f9c07b87352be22b5f29ad01/raw/55bb8b7724008c107d333a05c8db9a785a2db0f7/style.diff | git apply -). So restoring back to 1).

Contributor

no: line 56 is not correct. it looks like it's missing a space to align the comment block

@ton31337
Member

Can we have a bit of the context when fast/regular wheels are used?

@soumyar-roy soumyar-roy force-pushed the soumya/fastra branch 2 times, most recently from c13b1c3 to 3a9feb0 on March 23, 2025 18:08
@soumyar-roy
Contributor Author

Can we have a bit of the context when fast/regular wheels are used?

Added more context

@soumyar-roy soumyar-roy marked this pull request as ready for review March 24, 2025 00:53
@ton31337
Member

Thanks, makes sense now, but please put it inside the commit (not in PR).

@soumyar-roy soumyar-roy force-pushed the soumya/fastra branch 2 times, most recently from 9c1eb0a to f48282b on March 27, 2025 18:43
@soumyar-roy
Contributor Author

Fixed space before */ in comment

@soumyar-roy soumyar-roy force-pushed the soumya/fastra branch 2 times, most recently from b82ae8a to 8fa9049 on March 28, 2025 00:21
@frrbot frrbot bot added the tests Topotests, make check, etc label Mar 28, 2025
@soumyar-roy soumyar-roy force-pushed the soumya/fastra branch 4 times, most recently from a024fa7 to 1279b0c on March 31, 2025 02:56
@soumyar-roy soumyar-roy reopened this Mar 31, 2025
Contributor

@mjstapp mjstapp left a comment

we've been talking about this for some time, and I think I'm increasingly uncertain that this is the right approach to the issue.
zebra is running as a normal user-space process; there's no real-time feature available. And in that context, I just don't see how it's realistic to expect to be able to travel through the event library framework, through the RA callback, and then back again, at 10 msec granularity in any consistent way. All it takes is one operation in zebra that runs for 100s of msecs to throw this kind of scheduling out the window - and there are plenty of those kinds of events.
If this kind of "one-time" operation is necessary, it might make more sense to avoid all this "two timers" kind of work and just do the resends in-line.
a) if zebra isn't busy, and there aren't many interfaces that need "fast" behavior, sending a couple of packets with a short delay won't be a problem
b) if zebra is busy, you could keep a list of the interfaces that need "fast" treatment, and do a batch at a time, with normal event reschedules in between batches. you'd be more likely to actually get something close to the behavior you want.
c) again, the 'not busy' situation could use the same mechanism; it'd just degenerate to one small batch.
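A rough sketch of this batched approach, to make it concrete (not code from this PR: the pending-list helpers, fast_ra_send(), and the use of zrouter.master are assumptions, with event_add_timer_msec() as the assumed FRR event-loop call):

```c
#define FAST_RA_BATCH 100	/* illustrative batch size */

/* Drain up to one batch of "fast RA" interfaces, then yield the thread. */
static void fast_ra_batch_run(struct event *ev)
{
	struct interface *ifp;
	int done = 0;

	while (done < FAST_RA_BATCH && (ifp = fast_ra_pop()) != NULL) {
		fast_ra_send(ifp);	/* hypothetical per-interface RA send */
		done++;
	}

	/* Anything left over is handled on the next pass, so zebra's main
	 * thread is never held for the whole interface set at once. */
	if (!fast_ra_pending_empty())
		event_add_timer_msec(zrouter.master, fast_ra_batch_run, NULL,
				     10, NULL);
}
```

The "normal" wheel stays untouched; only interfaces that need the quick initial RAs ever land on this pending list.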

@soumyar-roy
Contributor Author

soumyar-roy commented Mar 31, 2025

we've been talking about this for some time, and I think I'm increasing uncertain that this is the right approach to the issue. zebra is running as a normal user-space process; there's no real-time feature available. and in that context, I just don't see how it's realistic to expect to be able to travel through the event library framework, through the RA callback, and then back again, in 10 msec granularity in any consistent way. all it takes is one operation in zebra that runs for 100s of msecs to throw this kind of scheduling out the window - and there are plenty of those kinds of events. If this kind of "one-time" operation is necessary, it might make more sense to avoid all this "two timers" kind of work and just do the resends in-line. a) if zebra isn't busy, and there aren't many interfaces that need "fast" behavior, sending a couple of packets with a short delay won't be a problem b) if zebra is busy, you could keep a list of the interfaces that need "fast" treatment, and do a batch at a time, with normal event reschedules in between batches. you'd be more likely to actually get something close to the behavior you want. c) again, the 'not busy' situation could use the same mechanism; it'd just degenerate to one small batch.

Thanks for pointing that out. To give some context: 1) I was consistently hitting a socket error without batching, i.e. with no changes to the RA module, no wheel timers, just the base code, with 514 interfaces. 2) Initially I designed this with the list_* APIs, like listnode_add, listnode_delete, etc.; then I was advised not to use them, since the recent guideline is to avoid them, and to use wheel timers instead, so I had to revert those changes. 3) That's how the wheels came into the picture. 4) I need to decide something now, because without a few of the things addressed via this change, certain cases are broken and won't work.

@mjstapp
Contributor

mjstapp commented Mar 31, 2025

Sure- but I think there are several separate issues

  1. right, we should use the correct/current collections, the right/current lists. that's sort of independent of the core issue?
  2. I think we don't want to use a timer-per-interface (if that was part of the problem) - that's obviously going to have scaling issues. I'm suggesting no change to the current timer code (one timer for the "normal" wheel), and one new event to schedule a "fast" batch.
  3. I think the "batch" size should probably not be 500 - but 100 might be ok. if you do 100 interfaces, 3 packets each with a 10ms delay, I'd bet you could get through that in a reasonable time (100 ms? less?), and then reschedule for the next batch. that's 300 packets in 100ms, and that seems ... pretty doable from user-space? you've got the main zebra thread, so I wouldn't want to hang on to it for too long, but doing some kind of batch size seems reasonable.

I don't remember what the "socket error" was - but ... again, I'd expect to be able to get some packets out, even if not all the packets can be sent?

Thanks for pointing out. To give a context 1) I was consistently hitting socket error issue without batching, i.e without any changes to RA module, no wheel timers, just the base code, with 514 interfaces. 2) Initially I designed with list__ apis , like listnode_add, listnode_delete etc, then I was suggested not to use them, as recent guide line was not to use them, I was suggested to use wheel timers. so I had to revert back those changes 3) So the wheels came in picture, 4) I need to decide something now, as without few things addressed via this change, it is broken certain cases wont work fine.

@soumyar-roy
Contributor Author

soumyar-roy commented Mar 31, 2025

Sure- but I think there are several separate issues

  1. right, we should use the correct/current collections, the right/current lists. that's sort of independent of the core issue?
  2. I think we don't want to use a timer-per-interface (if that was part of the problem) - that's obviously going to have scaling issues. I'm suggesting no change to the current timer code (one timer for the "normal" wheel), and one new event to schedule a "fast" batch.
  3. I think the "batch" size should probably not be 500 - but 100 might be ok. if you do 100 interfaces, 3 packets each with a 10ms delay, I'd bet you could get through that in a reasonable time (100 ms? less?), and then reschedule for the next batch. that's 300 packets in 100ms, and that seems ... pretty doable from user-space? you've got the main zebra thread, so I wouldn't want to hang on to it for too long, but doing some kind of batch size seems reasonable.

I don't remember what the "socket error" was - but ... again, I'd expect to be able to get some packets out, even if not all the packets can be sent?

Thanks for pointing out. To give a context 1) I was consistently hitting socket error issue without batching, i.e without any changes to RA module, no wheel timers, just the base code, with 514 interfaces. 2) Initially I designed with list__ apis , like listnode_add, listnode_delete etc, then I was suggested not to use them, as recent guide line was not to use them, I was suggested to use wheel timers. so I had to revert back those changes 3) So the wheels came in picture, 4) I need to decide something now, as without few things addressed via this change, it is broken certain cases wont work fine.

So are you saying it is fine to have one wheel timer for regular RAs scheduled at a 1-second interval, but for fast RAs at 10 ms to use a regular event-based timer? I.e., maintain a list of interfaces needing fast RA and batch over it, say 100 items per list walk per event-timer expiry, doing the next 100 in the next iteration? If so, what's the list API to use? Is DECLARE_LIST good for this? Just to note, we never had any kind of per-interface timer, whether wheel or regular event timer; in all the designs I explored so far, all interfaces were under some kind of common timer. Thanks.

@mjstapp
Contributor

mjstapp commented Mar 31, 2025

if you're asking about lists, the lists in lib/typesafe.h are the current/supported list types; you can see examples in many places.

are you actually doing some test that tracks the RA message output? I'd be curious to know what that is showing. or are you just looking at a test - like a topotest - that shows some side-effect, something like link-local peering?

So are you telling , it is fine to have one wheel timer for regular RA which are scheduled 1 sec interval, but for faster RA which is at 10 ms , use regular event based timer ?, like maintain a list of interfaces for faster RA, and batch it over the list, ie say 100 items per list walk per event timer expiry , do next 100 in next iteration?. If so, what's list api to use, is STAILQ good for this?. Just to note, we never had per interface any kind of timer, whether wheel or regular event timer, for all designs I explored so far, all interfaces were under some kind of common timer. Thanks.
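For reference, the typesafe-list pattern from lib/typesafe.h looks roughly like this (a sketch from memory; the fast_ra names are invented for the pending-interface idea, and the exact macro and helper spellings should be checked against the header):

```c
#include "if.h"
#include "memory.h"
#include "typesafe.h"

PREDECL_LIST(fast_ra);

struct fast_ra_entry {
	struct fast_ra_item node;	/* embedded list linkage */
	struct interface *ifp;
};

DECLARE_LIST(fast_ra, struct fast_ra_entry, node);

static struct fast_ra_head fast_ra_pending;

static void fast_ra_queue_init(void)
{
	fast_ra_init(&fast_ra_pending);
}

static void fast_ra_enqueue(struct interface *ifp)
{
	struct fast_ra_entry *e = XCALLOC(MTYPE_TMP, sizeof(*e));

	e->ifp = ifp;
	fast_ra_add_tail(&fast_ra_pending, e);
}
```

A consumer would then typically drain the list with fast_ra_pop() or walk it with frr_each().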

@soumyar-roy
Contributor Author

soumyar-roy commented Mar 31, 2025

if you're asking about lists, the lists in lib/typesafe.h are the current/supported list types; you can see examples in many places.

are you actually doing some test that tracks the RA message output? I'd be curious to know what that is showing. or are you just looking at a test - like a topotest - that shows some side-effect, something like link-local peering?

So are you telling , it is fine to have one wheel timer for regular RA which are scheduled 1 sec interval, but for faster RA which is at 10 ms , use regular event based timer ?, like maintain a list of interfaces for faster RA, and batch it over the list, ie say 100 items per list walk per event timer expiry , do next 100 in next iteration?. If so, what's list api to use, is STAILQ good for this?. Just to note, we never had per interface any kind of timer, whether wheel or regular event timer, for all designs I explored so far, all interfaces were under some kind of common timer. Thanks.

I was using test_high_ecmp_unnumbered.py under tests/topotests/high_ecmp. I have some logs from before any changes. Interestingly, I am not able to reproduce it anymore after backing out many changes:
r1-eth354(356): Tx RA failed, socket 11 error 105 (No buffer space available)
2025-02-12 19:33:28.687 [ERR!] zebra: [QGWPP-XPTHX][EC 100663299] r1-eth356(358): Tx RA failed, socket 11 error 105 (No buffer space available)
2025-02-12 19:33:28.687 [ERR!] zebra: [QGWPP-XPTHX][EC 100663299] r1-eth359(361): Tx RA failed, socket 11 error 105 (No buffer space available)
2025-02-12 19:33:28.687 [ERR!] zebra: [QGWPP-XPTHX][EC 100663299] r1-eth361(363): Tx RA failed, socket 11 error 105 (No buffer space available)
2025-02-12 19:33:28.687 [ERR!] zebra: [QGWPP-XPTHX][EC 100663299] r1-eth363(365): Tx RA failed, socket 11 error 105 (No buffer space available)
2025-02-12 19:33:28.687 [ERR!] zebra: [QGWPP-XPTHX][EC 100663299] r1-eth366(368): Tx RA failed, socket 11 error 105 (No buffer space available)
2025-02-12 19:33:28.687 [ERR!] zebra: [QGWPP-XPTHX][EC 100663299] r1-eth368(370): Tx RA failed, socket 11 error 105 (No buffer space available)

@mjstapp
Contributor

mjstapp commented Apr 1, 2025

did you try increasing the send buffer size? there are plenty of examples of that.
maybe it would be good to have the code handle a send failure, instead of just ... not sending anything?

I was using test_high_ecmp_unnumbered.py under tests/topotests/high_ecmp. I have some logs from before any changes. Interestingly, I am not able to reproduce it anymore after backing out many changes. [Tx RA failed, socket 11 error 105 (No buffer space available) log lines quoted above.]
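On the buffer-size point: raising SO_SNDBUF on the RA socket is a plain setsockopt() call; a generic sketch (standard POSIX, not zebra code; FRR also carries its own sockopt helpers in lib/sockopt.c) might look like:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Ask the kernel for a larger send buffer; the kernel may clamp the value. */
static int raise_sndbuf(int sock, int bytes)
{
	if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0) {
		fprintf(stderr, "SO_SNDBUF(%d): %s\n", bytes, strerror(errno));
		return -1;
	}
	return 0;
}
```

Even with a larger buffer, a send that fails with ENOBUFS (error 105 in the logs above) could be retried on a short timer rather than silently dropped, which is the "handle a send failure" suggestion.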

@soumyar-roy
Contributor Author

soumyar-roy commented Apr 1, 2025

Yes, we tried increasing the buffer and other settings in the config files, but once the system is in the error state, nothing worked. We went for batching after all those efforts failed.

did you try increasing the send buffer size? there are plenty of examples of that. maybe it would be good to have the code handle a send failure, instead of just ... not sending anything?

I was using test_high_ecmp_unnumbered.py under tests/topotests/high_ecmp. I have some logs from before any changes. Interestingly, I am not able to reproduce it anymore after backing out many changes. [Tx RA failed, socket 11 error 105 (No buffer space available) log lines quoted above.]

@mwinter-osr
Member

CI:rerun Rerunning the CI after fix on "[CI] Verify Source" incorrectly reporting bad status

@mjstapp
Contributor

mjstapp commented Apr 4, 2025

No merge commits, please. Rebase your dev branch to newer upstream master as needed.

@soumyar-roy soumyar-roy force-pushed the soumya/fastra branch 2 times, most recently from 61cffc2 to 74796af on April 8, 2025 23:40
@github-actions github-actions bot added size/S and removed size/L labels Apr 8, 2025
Issue:
Once an interface is shut down, it is removed from the wheel timer.
When the interface comes back up, the current code does not add it
back to the wheel timer, so no RAs are sent on that interface anymore.

Fix:
Moved the wheel_add for the interface inside rtadv_start_interface_events.
This is the common function that gets triggered for both the
RA-enable and interface-up events.

Also, on any kind of interface activation event, we try to send an
RA as soon as possible. This satisfies cases where a quick RA is
needed, especially for convergence that depends on RA.

Testing:
Brought the interface up, then down, then up again.
Added a debug log for RA and checked that it is advertised periodically
once the interface is back in the up state.

show bgp summary for 512 BGP unnumbered peers works fine.

Signed-off-by: Soumya Roy <[email protected]>
Currently wheel_add_item allows adding the same element
multiple times; added a check to prevent that.

Signed-off-by: Soumya Roy <[email protected]>
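A sketch of what the duplicate-add guard in this second commit might look like (illustrative only; the timer_wheel field names are recalled from lib/wheel.c and may not match exactly, while listnode_lookup()/listnode_add() are the old lib/linklist.h calls the wheel slot lists use):

```c
int wheel_add_item(struct timer_wheel *wheel, void *item)
{
	long long slot = (*wheel->slot_key)(item);
	struct list *slot_list = wheel->wheel_slot_lists[slot % wheel->slots];

	/* Reject duplicates: re-adding the same interface on every up
	 * event must not place it on the wheel twice. */
	if (listnode_lookup(slot_list, item))
		return -1;

	listnode_add(slot_list, item);
	return 0;
}
```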
@soumyar-roy soumyar-roy changed the title from zebra: send v6 fast RA at faster interval to zebra: V6 RA not sent anymore after interface up-down-up on Apr 8, 2025
Labels
libfrr master rebase PR needs rebase size/S tests Topotests, make check, etc zebra