
fix: Clean up interfaceLockMap entries on endpoint deletion #1249

Open

wants to merge 11 commits into base: main
Conversation

byte-msft
Contributor

Description

The packetParser was creating entries in interfaceLockMap for each new interface
but failing to remove them when interfaces were deleted. In environments with
high pod counts and frequent churn, this caused a memory leak as the map grew
indefinitely.

Related Issue

Potential memory leak in packetparser's interfaceLockMap #1236

Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.

Screenshots (if applicable) or Testing Completed

Testing details are listed under Additional Notes below.

Additional Notes

Solution

  • Added cleanup of interfaceLockMap entries in the EndpointDeleted case (see the sketch below)
  • Improved mutex handling logic to prevent resource leaks
  • Updated test cases to verify proper cleanup of both tcMap and interfaceLockMap
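
For reference, here is a minimal, runnable sketch of the pattern this PR applies. The parser type and method names below are illustrative stand-ins, not the real packetparser API, but the sync.Map calls (LoadOrStore, Load, Delete) mirror the snippets discussed in the review:

```go
package main

import (
	"fmt"
	"sync"
)

// parser stands in for the real packetparser: one *sync.Mutex per
// interface key, created lazily on endpoint creation and removed on
// endpoint deletion so the map cannot grow without bound under churn.
type parser struct {
	interfaceLockMap sync.Map // map[string]*sync.Mutex
}

func (p *parser) endpointCreated(ifaceKey string) {
	// LoadOrStore is atomic: concurrent creators share one mutex.
	lockMapVal, _ := p.interfaceLockMap.LoadOrStore(ifaceKey, &sync.Mutex{})
	mu := lockMapVal.(*sync.Mutex)
	mu.Lock()
	defer mu.Unlock()
	// ... create the qdisc and attach filters here ...
}

func (p *parser) endpointDeleted(ifaceKey string) {
	// Get the mutex only if it exists; deletion never creates one.
	if lockMapVal, ok := p.interfaceLockMap.Load(ifaceKey); ok {
		mu := lockMapVal.(*sync.Mutex)
		mu.Lock()
		// ... clean up the qdisc state here ...
		mu.Unlock()
		// The fix: drop the map entry itself, not just the state it guards.
		p.interfaceLockMap.Delete(ifaceKey)
	}
}

func main() {
	p := &parser{}
	p.endpointCreated("veth0")
	p.endpointDeleted("veth0")
	n := 0
	p.interfaceLockMap.Range(func(_, _ any) bool { n++; return true })
	fmt.Println("lock entries remaining:", n) // prints 0
}
```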

Testing

  • Added comprehensive test coverage for the interface deletion scenario (see the test sketch below)
  • Verified cleanup of both maps in test cases
  • Tested with high pod churn scenarios
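
A hypothetical test in the same spirit as the ones added in this PR, written against the illustrative parser type from the sketch above rather than the real packetparser:

```go
package main

import (
	"sync"
	"testing"
)

// mapLen counts entries via Range, since sync.Map has no Len method.
func mapLen(m *sync.Map) int {
	n := 0
	m.Range(func(_, _ any) bool { n++; return true })
	return n
}

func TestEndpointDeletedRemovesLockEntry(t *testing.T) {
	p := &parser{}

	p.endpointCreated("veth0")
	if got := mapLen(&p.interfaceLockMap); got != 1 {
		t.Fatalf("want 1 lock entry after create, got %d", got)
	}

	p.endpointDeleted("veth0")
	if got := mapLen(&p.interfaceLockMap); got != 0 {
		t.Fatalf("want 0 lock entries after delete, got %d", got)
	}
}
```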

Impact

This fix prevents memory leaks in environments with frequent pod creation/deletion,
improving the overall stability and resource usage of the system.


@byte-msft byte-msft self-assigned this Jan 21, 2025
@byte-msft byte-msft requested a review from a team as a code owner January 21, 2025 10:52
@byte-msft byte-msft linked an issue Jan 21, 2025 that may be closed by this pull request
@byte-msft byte-msft requested a review from nddq January 21, 2025 11:03
Member

@SRodi SRodi left a comment


LGTM but I'll leave @nddq or @rbtr to review & approve

p.l.Debug("Endpoint created", zap.String("name", iface.Name))
p.createQdiscAndAttach(iface, Veth)
case endpoint.EndpointDeleted:
// Get the mutex only if it exists
lockMapVal, exists := p.interfaceLockMap.Load(ifaceKey)
Contributor

Bit of a nitpick / question since I'm still new to Go. Is it recommended to stick with the ok idiom? Or is it fine to use other variable names like exists?

Contributor Author

@byte-msft byte-msft Jan 21, 2025

Hey Kamil! Yeah, I initially used exists to make the code more readable, but you're right - we should stick with ok to follow Go conventions in the main code.

For the test case though, I deliberately kept tcMapExists and lockMapExists because test code is a bit different - clarity is super important there since we're verifying specific behaviors. The more explicit naming makes it immediately obvious what we're testing for, especially when someone's debugging failed tests.

Let me know if you'd like me to update the main code to use ok instead of exists!

Contributor

Sure yeah, let's stick with ok for the main code then :)

@nddq
Contributor

nddq commented Jan 21, 2025

we should take this chance to examine whether we even need these sync.Mutex locks for each interface, since we are processing them sequentially and not concurrently anyway


switch event.Type {
case endpoint.EndpointCreated:
// Create mutex only when needed
lockMapVal, _ := p.interfaceLockMap.LoadOrStore(ifaceKey, &sync.Mutex{})


I am curious about the need here. This seems a little complex; could a simpler approach be used?

Contributor Author

Sure, thanks!

I've removed the mutex mechanism since we're now adding and removing interfaces sequentially.

@byte-msft
Contributor Author

we should take this chance to examine whether we even need these sync.Mutex locks for each interface, since we are processing them sequentially and not concurrently anyway

Yeah, you make a valid point! Since we're processing interfaces sequentially, we can safely remove the mutex mechanism. Let's keep it simple and avoid unnecessary complexity. I've updated the PR to remove the mutex-related code.

ifaceKey := ifaceToKey(iface)
lockMapVal, _ := p.interfaceLockMap.LoadOrStore(ifaceKey, &sync.Mutex{})
Contributor

Why are you deleting this? The interfaceLockMap lets us store a per-interface lock so that we can create/delete multiple qdiscs in parallel. This is necessary (in place of a single lock) because a large number of pods can come up at the same time, and we should start capturing packets as quickly as possible.
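
To illustrate the parallelism point: with one lock per interface, events for different interfaces never contend, whereas a single global lock would serialize all qdisc setup. A minimal sketch of the per-key locking pattern (names are illustrative, not the actual packetparser API):

```go
package main

import (
	"fmt"
	"sync"
)

// withInterfaceLock runs fn while holding the lock for a single
// interface key. Goroutines working on different interfaces proceed
// in parallel; only events for the same interface serialize.
func withInterfaceLock(locks *sync.Map, ifaceKey string, fn func()) {
	v, _ := locks.LoadOrStore(ifaceKey, &sync.Mutex{})
	mu := v.(*sync.Mutex)
	mu.Lock()
	defer mu.Unlock()
	fn()
}

func main() {
	var locks sync.Map
	var wg sync.WaitGroup
	for _, key := range []string{"veth0", "veth1", "veth2"} {
		wg.Add(1)
		go func(k string) {
			defer wg.Done()
			// Setup for distinct interfaces runs concurrently.
			withInterfaceLock(&locks, k, func() { fmt.Println("qdisc setup:", k) })
		}(key)
	}
	wg.Wait()
}
```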

if value, ok := p.tcMap.Load(ifaceKey); ok {
v := value.(*tcValue)
p.clean(v.tc, v.qdisc)
// Delete from map.
p.tcMap.Delete(ifaceKey)
}
Contributor

We need to delete the ifaceKey from interfaceLockMap when the interface is deleted.

Contributor Author

Reverted my latest changes: I brought back the interfaceLockMap and now remove the ifaceKey from the map when the endpoint is deleted.

p.l.Debug("Endpoint created", zap.String("name", iface.Name))
p.createQdiscAndAttach(iface, Veth)
// Get or create mutex atomically
lockMapVal, loaded := p.interfaceLockMap.LoadOrStore(ifaceKey, &sync.Mutex{})
Contributor

Why bring this inside the switch case? We are duplicating code in L399-L404.

Contributor Author

Hey Anubhab,

Yeah, I get the concern about code duplication, but in this case I think it makes sense to keep the mutex logic separate in each case. The create and delete paths need different mutex handling - creation needs to make a new mutex if it doesn't exist, while deletion only works with existing ones. So, I added an extra check.

Handling it outside the switch would make the code more generalized but less clear, and it would also mean taking the lock for any other event cases added later, even if they don't need it.

What do you think? Happy to revert the code to the previous state if you think it's unnecessary.
