Handle racing iface updates for the same ifname #11631
Pull request overview
This pull request attempts to fix a race condition where interface updates for the same interface name but different indices arrive out of order, leading to missing routes for statefulset pods. The solution changes the ifaceNameToIdx map from map[string]int to map[string]map[int]struct{} to track multiple indices per interface name simultaneously, deferring notifications until a single index remains.
Key Changes
- Modified data structure to track sets of interface indices per name rather than a single index
- Added logic to defer state change notifications when multiple indices exist for the same interface name
- Updated cleanup logic in both the main notification path and resync function
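The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `tracker` type and the `addIndex`, `removeIndex`, and `shouldDefer` helpers are hypothetical names introduced here; only the `ifaceNameToIdx` field and its `map[string]map[int]struct{}` type come from the PR.

```go
package main

import "fmt"

// tracker is a hypothetical, stripped-down stand-in for the monitor's state.
type tracker struct {
	// ifaceNameToIdx maps an interface name to the set of indices
	// currently known to carry that name.
	ifaceNameToIdx map[string]map[int]struct{}
}

func newTracker() *tracker {
	return &tracker{ifaceNameToIdx: map[string]map[int]struct{}{}}
}

// addIndex records that idx currently carries name.
func (t *tracker) addIndex(name string, idx int) {
	if t.ifaceNameToIdx[name] == nil {
		t.ifaceNameToIdx[name] = map[int]struct{}{}
	}
	t.ifaceNameToIdx[name][idx] = struct{}{}
}

// removeIndex forgets idx and drops the name once no index remains.
func (t *tracker) removeIndex(name string, idx int) {
	delete(t.ifaceNameToIdx[name], idx)
	if len(t.ifaceNameToIdx[name]) == 0 {
		delete(t.ifaceNameToIdx, name)
	}
}

// shouldDefer reports whether more than one index currently claims name,
// in which case a notification would be held back.
func (t *tracker) shouldDefer(name string) bool {
	return len(t.ifaceNameToIdx[name]) > 1
}

func main() {
	t := newTracker()
	t.addIndex("eth0", 10)
	fmt.Println(t.shouldDefer("eth0")) // false: a single index
	t.addIndex("eth0", 20)
	fmt.Println(t.shouldDefer("eth0")) // true: old and new index overlap
	t.removeIndex("eth0", 10)
	fmt.Println(t.shouldDefer("eth0")) // false: the race has resolved
}
```

Keeping the inner map as a set means the common case (exactly one index per name) costs one extra allocation per name, which is what the review comment about an adaptive set further down is responding to.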
Comments suppressed due to low confidence (1)
felix/ifacemonitor/iface_monitor.go:382
When transitioning from multiple interfaces with the same name to a single interface, the notification is sent for the wrong interface index. Consider this sequence:
- New "eth0" index 20 arrives while old index 10 still tracked → len=2, notification deferred
- Delete old "eth0" index 10 → len=1
- Line 362 check passes, notification proceeds with ifIndex=10 and newState=StateNotPresent
The notification should be for the remaining interface (index 20) and its actual current state, not for the deleted interface (index 10). After resolving the race condition by removing the old index, we should notify about the state of the interface that still exists, not the one that was just deleted. This requires looking up the remaining index and notifying about its state instead.
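One way to picture the suggested fix is the sketch below. It is an assumption-laden illustration rather than the monitor's real code: `soleIndex`, `handleDelete`, the `stateByIdx` map, and the string values of the `State` constants are all hypothetical; only the `StateNotPresent` name and the callback argument order `(ifaceName, state, ifIndex)` are taken from the snippet under review.

```go
package main

import "fmt"

// State and its values are illustrative stand-ins for the monitor's own type.
type State string

const (
	StateUp         State = "up"
	StateNotPresent State = "not-present"
)

// soleIndex returns the only index in the set, or false if the set
// holds zero or several indices.
func soleIndex(indices map[int]struct{}) (int, bool) {
	if len(indices) != 1 {
		return 0, false
	}
	for idx := range indices {
		return idx, true
	}
	return 0, false
}

// handleDelete removes the deleted index and, if exactly one index is left
// for the name, notifies about that survivor and its recorded state rather
// than about the index that was just deleted.
func handleDelete(
	nameToIdx map[string]map[int]struct{},
	stateByIdx map[int]State, // hypothetical per-index state cache
	name string,
	deleted int,
	cb func(name string, s State, idx int),
) {
	delete(nameToIdx[name], deleted)
	delete(stateByIdx, deleted)
	if idx, ok := soleIndex(nameToIdx[name]); ok {
		cb(name, stateByIdx[idx], idx)
		return
	}
	if len(nameToIdx[name]) == 0 {
		// Nothing left with this name: the interface really is gone.
		delete(nameToIdx, name)
		cb(name, StateNotPresent, deleted)
	}
	// Otherwise several indices remain and the notification stays deferred.
}

func main() {
	nameToIdx := map[string]map[int]struct{}{"eth0": {10: {}, 20: {}}}
	stateByIdx := map[int]State{10: StateUp, 20: StateUp}
	handleDelete(nameToIdx, stateByIdx, "eth0", 10, func(n string, s State, i int) {
		fmt.Println(n, s, i) // prints "eth0 up 20"
	})
}
```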
	// In some cases, we can receive a notification for a new link of the same name before
	// receiving the deletion notification for the old link. In that case, we want to avoid
	// notifying of changes until the final state is known.
	if len(m.ifaceNameToIdx[ifaceName]) > 1 {
		log.WithFields(log.Fields{
			"ifaceName": ifaceName,
			"ifIndex":   ifIndex,
			"numIfaces": len(m.ifaceNameToIdx[ifaceName]),
		}).Debug("Multiple interfaces with same name exist; deferring notification.")
		return
	}
	logCxt := log.WithFields(log.Fields{
		"ifaceName": ifaceName,
		"ifIndex":   ifIndex,
		"oldState":  oldState,
		"newState":  newState,
	})
	if oldState != newState {
		logCxt.Debug("Interface changed state")
		m.StateCallback(ifaceName, newState, ifIndex)
	} else {
		logCxt.Debug("Interface state hasn't changed, nothing to notify.")
	}
fasaxc
left a comment
Finding it a bit hard to reason about! I think to make downstream happy, I'd want to track what we've sent downstream and ensure that, when a conflict is resolved we send a delete for the old iface index and name and then send an update for the new.
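A rough sketch of what that could look like, assuming a per-name record of the last update sent downstream. Everything here is hypothetical scaffolding for discussion (`notifier`, `sent`, `update`, `resolve`, and the `State` values); it is not code from this PR or from Felix.

```go
package main

import "fmt"

// State and its values are illustrative stand-ins for the monitor's own type.
type State string

const (
	StateUp         State = "up"
	StateNotPresent State = "not-present"
)

// update is what would be passed to the downstream callback.
type update struct {
	name  string
	state State
	idx   int
}

// sent remembers the last index and state reported downstream for a name.
type sent struct {
	idx   int
	state State
}

type notifier struct {
	lastSent map[string]sent
	out      []update // stands in for calls to the real callback
}

func newNotifier() *notifier {
	return &notifier{lastSent: map[string]sent{}}
}

// resolve is called once a name maps to a single index again (or to none,
// signalled by StateNotPresent). If downstream last heard about a different
// index, it first gets a delete for that old index, then the new update.
func (n *notifier) resolve(name string, idx int, state State) {
	prev, seen := n.lastSent[name]
	if !seen && state == StateNotPresent {
		return // downstream never knew about this name
	}
	if seen && prev.idx == idx && prev.state == state {
		return // downstream already has this exact view
	}
	if seen && prev.idx != idx {
		n.out = append(n.out, update{name: name, state: StateNotPresent, idx: prev.idx})
	}
	n.out = append(n.out, update{name: name, state: state, idx: idx})
	if state == StateNotPresent {
		delete(n.lastSent, name)
	} else {
		n.lastSent[name] = sent{idx: idx, state: state}
	}
}

func main() {
	n := newNotifier()
	n.resolve("eth0", 10, StateUp) // initial interface
	n.resolve("eth0", 20, StateUp) // conflict resolved in favour of index 20
	for _, u := range n.out {
		fmt.Printf("%+v\n", u)
	}
}
```

With this shape, downstream sees a delete for index 10 followed by an update for index 20, so it never has to infer that the index changed underneath an unchanged name.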
  resyncC <-chan time.Time

- ifaceNameToIdx map[string]int
+ ifaceNameToIdx map[string]map[int]struct{}
Suggested change
- ifaceNameToIdx map[string]map[int]struct{}
+ ifaceNameToIdx map[string]set.Adaptive[int]
this'd be a good fit for an adaptive set since the mainline is exactly one entry
// Defer notification if either:
// - there are now multiple interfaces with the same name.
// - there were multiple interfaces with the same name before this update.
Not sure I grok this second condition. If we have gone from 2 to 1, isn't that exactly when you want to notify?
Description
This is an alternative to #11449 in an attempt to fix #11016
This approach tracks both interface indices for the name and defers sending an update until the interface name has resolved to a single index. Both this PR and #11449 aim to cut out the intermediate "iface down" update that occurs as part of this race condition and wrongly deletes routes that are then never added back. The difference in this PR's approach is that it ensures the ifaceNameToIdx map accurately represents our knowledge of iface indices.
Related issues/PRs
Fixes #11016
Todos
Release Note
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one docs-* label.
- docs-pr-required: This change requires a change to the documentation that has not been completed yet.
- docs-completed: This change has all necessary documentation completed.
- docs-not-required: This change has no user-facing impact and requires no docs.
Every PR needs one release-note-* label.
- release-note-required: This PR has user-facing changes. Most PRs should have this label.
- release-note-not-required: This PR has no user-facing changes.
Other optional labels:
- cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
- needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.