zebra: V6 RA not sent anymore after interface up-down-up #18451
so ... please add a meaningful title and description?
Added now.
lib/wheel.c
    list_isempty(wheel->wheel_slot_lists[curr_slot])) {
	/* Came back to the same slot and it is empty,
	 * so the wheel is empty, pause it
	 */
Can you fix the comment indentation?
1) Before running git clang-format, this comment is indented w.r.t. if (!wheel->run_forever) {:
if ((((curr_slot + slots_to_skip) % wheel->slots) == curr_slot) &&
	list_isempty(wheel->wheel_slot_lists[curr_slot])) {   <<< this line is tab indented
	/* Came back to the same slot and it is empty,
	 * so the wheel is empty, stop it
	 */
	if (!wheel->run_forever) {
		wheel_stop(wheel);
		if (debug_timer_wheel)
			zlog_debug("Stopped an empty wheel %p", wheel);
		return;
	}
}
2) After git clang-format:
if ((((curr_slot + slots_to_skip) % wheel->slots) == curr_slot) &&
    list_isempty(wheel->wheel_slot_lists[curr_slot])) {   <<< this line gets space indented
	/* Came back to the same slot and it is empty,
	 * so the wheel is empty, stop it
	 */
If I don't do step 2) I get a style suggestion error: curl https://gist.githubusercontent.com/polychaeta/13d7c1b3f9c07b87352be22b5f29ad01/raw/55bb8b7724008c107d333a05c8db9a785a2db0f7/style.diff | git apply -. So restoring back to 1).
no: line 56 is not correct. it looks like it's missing a space to align the comment block
Can we have a bit of context on when the fast and regular wheels are used?
Added more context.
Thanks, makes sense now, but please put it inside the commit (not in PR).
Fixed space before */ in comment.
We've been talking about this for some time, and I'm increasingly uncertain that this is the right approach to the issue.
zebra runs as a normal user-space process; there's no real-time feature available. In that context, I just don't see how it's realistic to expect to travel through the event library framework, through the RA callback, and back again at 10 msec granularity in any consistent way. All it takes is one operation in zebra that runs for 100s of msecs to throw this kind of scheduling out the window, and there are plenty of those kinds of events.
If this kind of "one-time" operation is necessary, it might make more sense to avoid all this "two timers" work and just do the resends in-line.
a) If zebra isn't busy, and there aren't many interfaces that need "fast" behavior, sending a couple of packets with a short delay won't be a problem.
b) If zebra is busy, you could keep a list of the interfaces that need "fast" treatment and do a batch at a time, with normal event reschedules in between batches. You'd be more likely to actually get something close to the behavior you want.
c) Again, the "not busy" situation could use the same mechanism; it'd just degenerate to one small batch.
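To make option b) concrete, here is a minimal sketch of the batch-and-reschedule idea in plain C. It is not zebra code: `fast_ra_queue`, `fast_ra_run_batch`, `send_ra_fn`, `count_send` and the batch size of 100 are all hypothetical names and values chosen for illustration, and the real transmit path and event rescheduling are left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAST_RA_BATCH 100 /* hypothetical batch size, not a zebra constant */

/* Hypothetical queue of interface indexes still waiting for a fast RA. */
struct fast_ra_queue {
	const int *ifindexes; /* interfaces needing "fast" treatment */
	size_t count;         /* total number of entries */
	size_t next;          /* first entry not yet handled */
};

/* Stand-in for the real transmit routine; returns true on success. */
typedef bool (*send_ra_fn)(int ifindex, void *arg);

/*
 * Handle at most FAST_RA_BATCH interfaces and return.
 * Returns true when entries remain, meaning the caller should
 * schedule another run through its normal event mechanism.
 */
static bool fast_ra_run_batch(struct fast_ra_queue *q, send_ra_fn send,
			      void *arg)
{
	size_t done = 0;

	assert(q != NULL && send != NULL);

	while (q->next < q->count && done < FAST_RA_BATCH) {
		/* A failed send is skipped here; retry policy is out of scope. */
		(void)send(q->ifindexes[q->next], arg);
		q->next++;
		done++;
	}

	return q->next < q->count;
}

/* Illustrative sender that only counts how many times it was called. */
static bool count_send(int ifindex, void *arg)
{
	(void)ifindex;
	(*(size_t *)arg)++;
	return true;
}
```

With 514 queued interfaces this takes six runs (five full batches and one of 14); with a handful of interfaces it degenerates to a single small batch, which is case c).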
Thanks for pointing that out. To give some context: 1) I was consistently hitting the socket error without batching, i.e. with no changes to the RA module and no wheel timers, just the base code, with 514 interfaces. 2) I initially designed this with the list_* APIs (listnode_add, listnode_delete, etc.), but I was told that the recent guideline is not to use them and that I should use wheel timers instead, so I had to revert those changes. 3) That is how the wheels came into the picture. 4) I need to decide on something now: without a few things addressed by this change, it is broken and certain cases won't work.
Sure- but I think there are several separate issues
I don't remember what the "socket error" was - but ... again, I'd expect to be able to get some packets out, even if not all the packets can be sent?
So are you saying it is fine to have one wheel timer for the regular RAs, which are scheduled at a 1 sec interval, but to use a regular event-based timer for the faster RAs at 10 ms? That is, maintain a list of interfaces for faster RA and batch over the list, say 100 items per list walk per event timer expiry, then the next 100 in the next iteration? If so, which list API should I use; is DECLARE_LIST good for this? Just to note, we never had any kind of per-interface timer, whether wheel or regular event timer; in all the designs I explored so far, all interfaces were under some kind of common timer. Thanks.
if you're asking about lists, the lists in lib/typesafe.h are the current/supported list types; you can see examples in many places. are you actually doing some test that tracks the RA message output? I'd be curious to know what that is showing. or are you just looking at a test - like a topotest - that shows some side-effect, something like link-local peering?
I was using test_high_ecmp_unnumbered.py under tests/topotests/high_ecmp. I have some logs from before any changes. Interestingly, I am not able to reproduce it anymore after backing out many changes. The log showed: r1-eth354(356): Tx RA failed, socket 11 error 105 (No buffer space available)
did you try increasing the send buffer size? there are plenty of examples of that.
Yes, we tried increasing the buffer and changing other settings in the config files, but once the system was in the error state, nothing worked. We went for batching after all other efforts failed.
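For readers following along, the send-buffer increase discussed above boils down to a `SO_SNDBUF` socket option. The sketch below uses only the standard sockets API; `grow_sndbuf` is a hypothetical helper name, not a function from this PR or from FRR, and the requested size is up to the caller. On Linux the kernel may clamp the request to `net.core.wmem_max` and reports a doubled value, so the helper reads back what was actually granted instead of assuming the request was honored.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Ask the kernel for a send buffer of 'wanted' bytes on 'sock' and
 * report the size that is actually in effect afterwards.
 * Returns the effective size in bytes, or -1 on error.
 * (Hypothetical helper for illustration only.)
 */
static int grow_sndbuf(int sock, int wanted)
{
	int actual = 0;
	socklen_t len = sizeof(actual);

	assert(wanted > 0);

	if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &wanted, sizeof(wanted)) < 0)
		return -1;

	/* Read the value back: the kernel may have clamped or adjusted it. */
	if (getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &actual, &len) < 0)
		return -1;

	return actual;
}
```

A larger buffer only absorbs bursts; it does not by itself explain why the socket stayed in the error state, which is consistent with the report above that batching was still needed.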
CI:rerun Rerunning the CI after fix on "[CI] Verify Source" incorrectly reporting bad status
No merge commits, please. Rebase your dev branch to newer upstream master as needed.
Currently wheel_add_item allows the same element to be added multiple times; added a check to prevent that. Signed-off-by: Soumya Roy <[email protected]>
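The duplicate check described in this commit can be pictured with a toy wheel. This is only a sketch under stated assumptions: `toy_wheel`, `toy_wheel_add`, the fixed slot count and the fixed slot capacity are hypothetical and do not reflect the actual data structures or signatures in lib/wheel.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 4    /* hypothetical number of wheel slots */
#define TOY_SLOT_CAP 8 /* hypothetical per-slot capacity */

/* Toy stand-in for a timer wheel: each slot holds item pointers. */
struct toy_wheel {
	void *items[TOY_SLOTS][TOY_SLOT_CAP];
	size_t used[TOY_SLOTS];
};

/*
 * Add 'item' to the slot selected by 'key'.
 * Returns true if the item was added, false if it was already in
 * that slot (the duplicate case) or the slot is full.
 */
static bool toy_wheel_add(struct toy_wheel *w, unsigned int key, void *item)
{
	size_t slot = key % TOY_SLOTS;

	assert(w != NULL && item != NULL);

	/* Refuse to insert the same element twice. */
	for (size_t i = 0; i < w->used[slot]; i++) {
		if (w->items[slot][i] == item)
			return false;
	}

	if (w->used[slot] == TOY_SLOT_CAP)
		return false;

	w->items[slot][w->used[slot]++] = item;
	return true;
}
```

Making the add idempotent means callers such as an "interface up" handler can add unconditionally without tracking whether the interface is already on the wheel.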
zebra: V6 RA not sent anymore after interface up-down-up
Issue:
Once an interface is shut down, the interface is removed from the
wheel timer. When the interface comes up again, the current code
won't add the interface to the wheel timer again, so it won't send RAs
anymore for that interface.
Fix:
Moved wheel_add for the interface inside rtadv_start_interface_events.
This is a more common function which gets triggered for both
RA enable and interface up events.
Also, on any kind of interface activation event, we try to send
an RA as soon as possible. This is to satisfy the requirement where
a quick RA is needed, especially for convergence that depends on
RA.
Testing:
Did interface up to down to up.
Added debug log for RA, checked it is getting advertised periodically
while the interface is in up state.
show bgp summary for 512 BGP peers with BGP unnumbered works fine.
Signed-off-by: Soumya Roy [email protected]
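As a rough illustration of the fix described in this commit message, the sketch below models the up-down-up sequence with a toy interface. Everything here is hypothetical (`toy_if`, `toy_start_events`, `toy_if_up`, and so on); it only mirrors the idea that one shared "start events" routine re-adds the interface to the timer and sends an RA promptly, whether it is reached from RA enable or from interface up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy interface state; the fields are illustrative, not zebra's. */
struct toy_if {
	bool up;               /* link state */
	bool ra_enabled;       /* RA configured on this interface */
	bool on_wheel;         /* currently registered for periodic RAs */
	unsigned int ras_sent; /* RAs sent so far */
};

/* Common entry point reached from both RA enable and interface up. */
static void toy_start_events(struct toy_if *ifp)
{
	assert(ifp != NULL);

	if (!ifp->up || !ifp->ra_enabled)
		return;

	ifp->on_wheel = true; /* re-register; safe if already registered */
	ifp->ras_sent++;      /* send one RA right away on activation */
}

static void toy_stop_events(struct toy_if *ifp)
{
	ifp->on_wheel = false;
}

static void toy_ra_enable(struct toy_if *ifp)
{
	ifp->ra_enabled = true;
	toy_start_events(ifp);
}

static void toy_if_up(struct toy_if *ifp)
{
	ifp->up = true;
	toy_start_events(ifp);
}

static void toy_if_down(struct toy_if *ifp)
{
	ifp->up = false;
	toy_stop_events(ifp);
}
```

In this model, enabling RA on an up interface, taking it down, and bringing it up again leaves the interface registered and with two RAs sent, which is the behavior the commit aims to restore.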