Coroot-node-agent 1.23.7 memory utilization spikes leading to OOM #175

Open
gwyn-bl opened this issue Feb 4, 2025 · 6 comments

@gwyn-bl

gwyn-bl commented Feb 4, 2025

Hi!
After upgrading to coroot-node-agent version 1.23.7, we are seeing high memory usage spikes that lead to OOM.

resources:
    requests:
      cpu: "2"
      memory: "4Gi"
    limits:
      cpu: "2"
      memory: "4Gi"

Image
Image

@apetruhin
Member

Thank you for the report.

Could you please pick an affected pod (or a few) and dump the memory profile(s):

kubectl -n coroot exec -t coroot-node-agent-bck98 -- curl -o - 127.0.0.1:10300/debug/pprof/heap > mem_profile.tgz

Depending on the setup, the port could be 80 instead of 10300.
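
A possible variant of the dump command, sketched here as a small shell helper. It assumes that the `-t` flag is not needed (a TTY may rewrite line endings and mix curl's progress meter into the redirected output, which can damage a binary profile) and that `curl` is available in the agent container, as in the command above. The function name `dump_heap` and its argument order are made up for illustration.

```shell
#!/bin/sh
# dump_heap NAMESPACE POD PORT OUTFILE  (hypothetical helper, not part of coroot)
# Fetches the Go heap profile from the agent's pprof endpoint and writes it to OUTFILE.
dump_heap() {
    # No -t: without a TTY, stdout is passed through byte-for-byte.
    # -s hides curl's progress meter, -S still prints errors, -f fails on HTTP errors.
    kubectl -n "$1" exec "$2" -- \
        curl -sSf "127.0.0.1:$3/debug/pprof/heap" > "$4"
}
```

It could be called as `dump_heap coroot coroot-node-agent-bck98 10300 mem_profile.pb.gz`. The heap endpoint returns a gzip-compressed protobuf rather than a tar archive, so a `.pb.gz` suffix may describe the file better than `.tgz`; `go tool pprof` reads it either way.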

@gwyn-bl
Author

gwyn-bl commented Feb 4, 2025

Here are dumps from 2 pods.

Can't upload archives here for some reason
Failed to upload "mem_profile.tgz"

Is a link to Google Drive OK?
https://drive.google.com/drive/folders/1bKLulMbPteLlD7RE1fbtbngctWFbFFZL?usp=drive_link

@apetruhin
Member

Profiles are broken for some reason:

go tool pprof Downloads/mem_profile.tgz       
Downloads/mem_profile.tgz: decompressing profile: flate: corrupt input before offset 358
failed to fetch any source profiles

Could you please show me the exact command you used to dump them?
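
One way to catch a damaged download before sharing it, as a minimal sketch: since `/debug/pprof/heap` returns gzip-compressed data, `gzip -t` can check the file's integrity locally. The helper name `check_profile` is hypothetical.

```shell
#!/bin/sh
# check_profile FILE  (hypothetical helper)
# Succeeds only if FILE is a complete, valid gzip stream.
check_profile() {
    if gzip -t "$1" 2>/dev/null; then
        echo "ok: $1"
    else
        echo "corrupt or not gzip: $1" >&2
        return 1
    fi
}
```

For example, `check_profile mem_profile.tgz && go tool pprof -top mem_profile.tgz` would print the top allocations only when the file passes the check.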

@gwyn-bl
Author

gwyn-bl commented Feb 4, 2025

I will be able to check and retry tomorrow. I also found out that 1.22.10 is the latest version without this issue; the problem starts with 1.23.1.

@gwyn-bl
Author

gwyn-bl commented Feb 6, 2025

Profile: https://drive.google.com/file/d/1ruxDdR8OYb6MxbNQxrRV2_ck8g7oPmoJ/view?usp=drive_link

Command:

# curl -o - 127.0.0.1:80/debug/pprof/heap > mem_profile_coroot_node_agent_ldnxl_1.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 77389    0 77389    0     0  3285k      0 --:--:-- --:--:-- --:--:-- 3285k

@apetruhin
Member

I don't see anything suspicious in this profile.
Could you please dump a few more from the pods that are currently consuming a large amount of memory?
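
To find which pods to profile, something like the following could help. It is a sketch that assumes metrics-server is installed (required by `kubectl top`) and that the agent runs in the `coroot` namespace, as in the earlier command. The helper name `top_memory_pods` is hypothetical.

```shell
#!/bin/sh
# top_memory_pods NAMESPACE COUNT  (hypothetical helper)
# Prints the names of the COUNT pods using the most memory right now.
top_memory_pods() {
    kubectl -n "$1" top pods --no-headers --sort-by=memory \
        | head -n "$2" \
        | awk '{print $1}'
}
```

Each name it prints can then be used as the pod name in the `kubectl exec ... curl` command shown earlier in the thread.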
