fix: Clean up interfaceLockMap entries on endpoint deletion #1249
base: main
Conversation
```go
	p.l.Debug("Endpoint created", zap.String("name", iface.Name))
	p.createQdiscAndAttach(iface, Veth)
case endpoint.EndpointDeleted:
	// Get the mutex only if it exists.
	lockMapVal, exists := p.interfaceLockMap.Load(ifaceKey)
```
Bit of a nitpick / question since I'm still new to Go. Is it recommended to stick with the `ok` idiom, or is it fine to use other variable names like `exists`?
Hey Kamil! Yeah, I initially used `exists` to make the code more readable, but you're right - we should stick with `ok` to follow Go conventions in the main code.

For the test case though, I deliberately kept `tcMapExists` and `lockMapExists` because test code is a bit different - clarity is super important there since we're verifying specific behaviors. The more explicit naming makes it immediately obvious what we're testing for, especially when someone's debugging failed tests.

Let me know if you'd like me to update the main code to use `ok` instead of `exists`!
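For reference, here's a minimal, self-contained sketch of the conventional `ok` idiom with `sync.Map.Load`. The map contents and key names are made up for illustration and are not taken from the packetparser code:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var locks sync.Map
	locks.Store("veth-abc", &sync.Mutex{})

	// Conventional Go naming: the boolean returned by Load is called "ok".
	if val, ok := locks.Load("veth-abc"); ok {
		mu := val.(*sync.Mutex)
		mu.Lock()
		fmt.Println("found lock for veth-abc")
		mu.Unlock()
	}

	// A name like "exists" compiles just as well; "ok" is simply the
	// idiom most Go readers expect in the main code path.
	if _, exists := locks.Load("veth-xyz"); !exists {
		fmt.Println("no lock stored for veth-xyz")
	}
}
```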
Sure yeah, let's stick with `ok` for the main code then :)
We should also take this chance to examine whether we even need these:
```go
switch event.Type {
case endpoint.EndpointCreated:
	// Create mutex only when needed.
	lockMapVal, _ := p.interfaceLockMap.LoadOrStore(ifaceKey, &sync.Mutex{})
```
I'm curious about the need here. This seems a little complex; could a simpler approach be used?
Sure, thanks. I've removed the mutex mechanism since we're now using a sequential approach for adding and removing interfaces.
Yeah, you make a valid point! Since we're processing interfaces sequentially, we can safely remove the mutex mechanism. Let's keep it simple and avoid unnecessary complexity. I've updated the PR to remove the mutex-related code.
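For illustration, here is a rough sketch of what that mutex-free, sequential version could look like. The `parser` and `tcValue` types below are simplified stand-ins, not the real packetparser code:

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified stand-ins for the real packetparser types.
type tcValue struct{ qdisc string }

type parser struct {
	tcMap sync.Map // ifaceKey -> *tcValue
}

// When endpoint events are handled one at a time, create and delete can
// touch tcMap directly without any per-interface mutex.
func (p *parser) onCreated(ifaceKey string) {
	p.tcMap.Store(ifaceKey, &tcValue{qdisc: "clsact"})
	fmt.Println("attached qdisc on", ifaceKey)
}

func (p *parser) onDeleted(ifaceKey string) {
	if value, ok := p.tcMap.Load(ifaceKey); ok {
		v := value.(*tcValue)
		fmt.Println("cleaning qdisc", v.qdisc, "on", ifaceKey)
		p.tcMap.Delete(ifaceKey)
	}
}

func main() {
	p := &parser{}
	p.onCreated("veth-abc")
	p.onDeleted("veth-abc")
}
```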
```go
ifaceKey := ifaceToKey(iface)
lockMapVal, _ := p.interfaceLockMap.LoadOrStore(ifaceKey, &sync.Mutex{})
```
Why are you deleting this? The `interfaceLockMap` lets us store a per-interface lock so we can create/delete multiple qdiscs in parallel. This is necessary (in place of a single lock) because a large number of pods can come up at the same time, and we should start capturing packets as quickly as possible.
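For context, here is a minimal sketch of that per-interface locking pattern. The `parser` type and `attach` helper are simplified stand-ins; the real code attaches a tc qdisc inside the critical section:

```go
package main

import (
	"fmt"
	"sync"
)

type parser struct {
	interfaceLockMap sync.Map // ifaceKey -> *sync.Mutex
}

// attach stands in for createQdiscAndAttach: many interfaces can be handled
// in parallel, while work on any single interface is serialized by its own lock.
func (p *parser) attach(ifaceKey string) {
	lockMapVal, _ := p.interfaceLockMap.LoadOrStore(ifaceKey, &sync.Mutex{})
	mu := lockMapVal.(*sync.Mutex)
	mu.Lock()
	defer mu.Unlock()
	fmt.Println("attaching qdisc on", ifaceKey)
}

func main() {
	p := &parser{}
	var wg sync.WaitGroup
	for _, key := range []string{"veth-a", "veth-b", "veth-c"} {
		wg.Add(1)
		go func(k string) {
			defer wg.Done()
			p.attach(k)
		}(key)
	}
	wg.Wait()
}
```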
```go
if value, ok := p.tcMap.Load(ifaceKey); ok {
	v := value.(*tcValue)
	p.clean(v.tc, v.qdisc)
	// Delete from map.
	p.tcMap.Delete(ifaceKey)
}
```
We need to delete the `ifaceKey` from `interfaceLockMap` when the interface is deleted.
Reverted my latest changes: I brought back the `interfaceLockMap` and now remove the `ifaceKey` from the map when the endpoint is deleted.
```go
p.l.Debug("Endpoint created", zap.String("name", iface.Name))
p.createQdiscAndAttach(iface, Veth)
// Get or create mutex atomically.
lockMapVal, loaded := p.interfaceLockMap.LoadOrStore(ifaceKey, &sync.Mutex{})
```
Why bring this inside the switch case? We are duplicating code in L399-L404.
Hey Anubhab,

Yeah, I get the concern about code duplication, but in this case I think it makes sense to keep the mutex logic separate in each case. The create and delete paths need different mutex handling - creation needs to make a new mutex if it doesn't exist, while deletion only works with existing ones. So I added an extra check.

Handling it outside the switch would make the code more general but less clear, and we would also end up locking the map for any other cases that get added later.

What do you think? Happy to revert the code to the previous state if you think it's unnecessary.
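To make the difference concrete, here is a self-contained sketch of the two paths being discussed: the created path uses `LoadOrStore`, while the deleted path uses `Load` plus an explicit delete of both map entries. The types and handler names are simplified stand-ins, not the exact packetparser code:

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified stand-ins for the packetparser fields discussed in this thread.
type tcValue struct{ tc, qdisc string }

type parser struct {
	tcMap            sync.Map // ifaceKey -> *tcValue
	interfaceLockMap sync.Map // ifaceKey -> *sync.Mutex
}

func (p *parser) clean(tc, qdisc string) { fmt.Println("cleaning", tc, qdisc) }

// Creation: make the per-interface mutex if it doesn't exist yet.
func (p *parser) onEndpointCreated(ifaceKey string) {
	lockMapVal, _ := p.interfaceLockMap.LoadOrStore(ifaceKey, &sync.Mutex{})
	mu := lockMapVal.(*sync.Mutex)
	mu.Lock()
	defer mu.Unlock()
	p.tcMap.Store(ifaceKey, &tcValue{tc: "handle", qdisc: "clsact"})
}

// Deletion: only work with an existing mutex, then drop both entries so
// neither tcMap nor interfaceLockMap grows without bound.
func (p *parser) onEndpointDeleted(ifaceKey string) {
	lockMapVal, ok := p.interfaceLockMap.Load(ifaceKey)
	if !ok {
		return // nothing was ever attached for this interface
	}
	mu := lockMapVal.(*sync.Mutex)
	mu.Lock()
	if value, ok := p.tcMap.Load(ifaceKey); ok {
		v := value.(*tcValue)
		p.clean(v.tc, v.qdisc)
		p.tcMap.Delete(ifaceKey)
	}
	mu.Unlock()
	p.interfaceLockMap.Delete(ifaceKey)
}

func main() {
	p := &parser{}
	p.onEndpointCreated("veth-abc")
	p.onEndpointDeleted("veth-abc")
}
```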
Description
The packetParser was creating entries in interfaceLockMap for each new interface
but failing to remove them when interfaces were deleted. In environments with
high pod counts and frequent churn, this caused a memory leak as the map grew
indefinitely.
Related Issue
Potential memory leak in packetparser's interfaceLockMap #1236
Checklist
Commits are signed (git commit -S -s ...). See this documentation on signing commits.
Screenshots (if applicable) or Testing Completed
Please add any relevant screenshots or GIFs to showcase the changes made.
Additional Notes
Solution
Delete the corresponding interfaceLockMap entry when an endpoint is deleted, alongside the existing tcMap cleanup.
Testing
The tests check, via the tcMapExists and lockMapExists assertions discussed above, that both map entries are gone after an endpoint is deleted.
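This is not the actual test from this PR, but a minimal sketch of the behavior being verified, using the explicit `tcMapExists` / `lockMapExists` naming discussed above with hypothetical harness types:

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical harness mirroring the sketches earlier in this thread.
type parser struct {
	tcMap            sync.Map
	interfaceLockMap sync.Map
}

func (p *parser) onEndpointCreated(key string) {
	p.interfaceLockMap.LoadOrStore(key, &sync.Mutex{})
	p.tcMap.Store(key, "qdisc-handle")
}

func (p *parser) onEndpointDeleted(key string) {
	p.tcMap.Delete(key)
	p.interfaceLockMap.Delete(key)
}

func main() {
	p := &parser{}
	p.onEndpointCreated("veth-abc")
	p.onEndpointDeleted("veth-abc")

	// After deletion, neither map should still hold an entry for the interface.
	_, tcMapExists := p.tcMap.Load("veth-abc")
	_, lockMapExists := p.interfaceLockMap.Load("veth-abc")
	fmt.Println("tcMapExists:", tcMapExists, "lockMapExists:", lockMapExists) // want: false false
}
```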
Impact
This fix prevents memory leaks in environments with frequent pod creation/deletion,
improving the overall stability and resource usage of the system.
Please refer to the CONTRIBUTING.md file for more information on how to contribute to this project.