Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move over to controller-runtime over doing it ourselves #348

Merged
merged 6 commits into from
Mar 31, 2025

Conversation

davidcollom
Copy link
Collaborator

@davidcollom davidcollom commented Mar 27, 2025

During testing we identified that there was significant memory utilisation when running version-checker for 2-3 days we saw that our test cluster was running at ~400+MB when it started off at around 40-60MB.

This PR changes over from manually implemented shared informers and managing the workqueue and its goroutines, to use controller-runtime, this means we don't have to manage as much code/logic and workers ourselves and is managed for us, allowing us to focus more on the purpose of version-checker.

We do believe that there is still a memory leak, however from testing (over 24 hour period) on the same cluster, that memory usage has increased at a much slower rate, additionally we've seen that goroutines are staying at a reasonable number, where as previously it was growing exponentially at, what was an alarming rate under some situations.

@davidcollom
Copy link
Collaborator Author

Some Screenshots of our findings.... Here's our last 2 days where version-checker was running for several days:
Screenshot 2025-03-27 at 3 40 37 pm

Here's a closer look at the memory utilisation:
Screenshot 2025-03-27 at 3 40 20 pm

And the GoRoutines:
Screenshot 2025-03-27 at 3 42 06 pm

After:
Screenshot 2025-03-27 at 3 42 42 pm

With a lot of this in mind, we've ensured that CPU usage/profile has not changed (%age is calculated towards 1 full CPU - I.E: 200m=0.2%) :

Screenshot 2025-03-27 at 3 43 46 pm

@@ -23,7 +23,6 @@ const (
)

type Client struct {
*http.Client
Copy link
Collaborator Author

@davidcollom davidcollom Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's never a http.Client used within ACR - it creates its own - so removing.

@davidcollom
Copy link
Collaborator Author

I discovered a secondary leak where some metrics weren't cleaned up on pod deletion.
Screenshot 2025-03-27 at 6 19 26 pm

@davidcollom davidcollom merged commit 3dcfc81 into main Mar 31, 2025
8 checks passed
@davidcollom davidcollom deleted the controller-runtime branch March 31, 2025 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants