[BUG] Memory leak for nvitop instances inside Docker container #128
Hi @kenvix, thanks for raising this. I tested it locally, but I cannot reproduce it. I use a script to create and terminate 10k processes on the GPU:

    import time

    import ray
    import torch


    @ray.remote(num_cpus=1, num_gpus=0.1)
    def request_gpu():
        torch.zeros(1000, device='cuda')
        time.sleep(10)


    ray.init()
    _ = ray.get([request_gpu.remote() for _ in range(10000)])

The memory consumption of …
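For anyone who wants to watch nvitop's own footprint while a workload like the one above runs, here is a minimal sketch (not part of nvitop; it assumes `psutil` is installed and that nvitop is already running in another terminal) that logs the resident set size of the nvitop process once a minute:

```python
# Hypothetical monitoring helper (not part of nvitop): log the resident
# memory of a running nvitop process once a minute so slow growth is visible.
import time

import psutil


def find_nvitop():
    """Return the first process whose command line mentions nvitop, if any."""
    for proc in psutil.process_iter(['pid', 'cmdline']):
        cmdline = proc.info['cmdline'] or []
        if any('nvitop' in part for part in cmdline):
            return proc
    return None


def main():
    proc = find_nvitop()
    if proc is None:
        raise SystemExit('no running nvitop process found')
    while True:
        rss_mib = proc.memory_info().rss / (1024 ** 2)
        print(f'{time.strftime("%H:%M:%S")}  nvitop RSS: {rss_mib:.1f} MiB')
        time.sleep(60)


if __name__ == '__main__':
    main()
```

Logging RSS from outside the process avoids perturbing nvitop itself, so a steady upward trend in the log points at genuine growth rather than measurement overhead.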
Hi @XuehaiPan, the test code you provided does not seem relevant to this issue. In my case, keeping nvitop running inside tmux or screen, the memory usage (RAM, not GPU VRAM) of nvitop itself continues to increase slowly over time. In the example below, after running for 12 hours it used about 4.5 GB of RAM.
@kenvix could you test …? Running after 2 days:
Similar problem here: latest version 1.3.2 with NVIDIA driver 560.35.03. The approximate RAM usage for nvitop is a whopping 30 GB. This started after I installed Ubuntu 24.10; before that everything was fine. Strangely, the startup time for nvitop went up to 30 s (!), and during startup it allocates those 30 GB of RAM. I suspect some faulty library...
Thanks for replying! Here is the output:
By the way, this did not result in excessive memory use.
Hey @alexanderfrey, thanks for the report. Could you change the value of …
@alexanderfrey I have the same trouble running nvitop and nvidia-smi. They eat 52 GB of RAM and take 30 s to start up. I'm running Ubuntu 24.10 and an RTX 3060 with the 560.35.05 driver. I've discovered that this is an issue with the nvidia-persistenced service. If you … I thought I'd drop a note here so anyone seeing this will know the issue isn't with nvitop.
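For readers who want to confirm this on their own machine, below is a minimal sketch (assuming `nvidia-ml-py` is installed, as listed in the environment above) that queries the persistence-mode state of each GPU through NVML. Enabling persistence mode requires root, for example via `sudo nvidia-smi -pm 1` or the nvidia-persistenced daemon.

```python
# Minimal sketch: query persistence mode via NVML (nvidia-ml-py) to see
# whether slow startup / large allocations correlate with it being disabled.
import pynvml

pynvml.nvmlInit()
try:
    for index in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older bindings return bytes
            name = name.decode()
        mode = pynvml.nvmlDeviceGetPersistenceMode(handle)
        state = 'enabled' if mode == pynvml.NVML_FEATURE_ENABLED else 'disabled'
        print(f'GPU {index} ({name}): persistence mode {state}')
finally:
    pynvml.nvmlShutdown()
```

If persistence mode shows as disabled on a machine with slow startup, that supports the explanation above that the problem sits at the driver level rather than in nvitop.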
Required prerequisites
What version of nvitop are you using?
1.3.2
Operating system and version
Ubuntu 22.04.4 LTS
NVIDIA driver version
535.104.12
NVIDIA-SMI
Python environment
3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] linux
nvidia-ml-py==12.535.133
nvitop==1.3.2
Problem description
nvitop has a memory leak when running inside a Docker container (RAM, not VRAM). The operating system even takes about ten seconds to reclaim the memory after the process is SIGKILLed.
Steps to Reproduce
Just keep nvitop running for a few months. You'll see that nvitop has consumed a lot of system memory, about 300 GB in 77 days for my instance. Is this caused by nvitop recording too much VRAM and GPU utilization history without releasing it?
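One rough way to probe the history-retention question is to drive the same kind of polling through nvitop's Python API under tracemalloc and compare snapshots. This is only a sketch under that assumption: it uses the documented `Device` API and mimics the polling loop, not the full curses monitor.

```python
# Rough diagnostic sketch: poll GPU metrics in a loop and compare tracemalloc
# snapshots to see which call sites, if any, keep accumulating memory.
import time
import tracemalloc

from nvitop import Device

tracemalloc.start()
devices = Device.all()

baseline = None
for iteration in range(1, 601):  # ~10 minutes at roughly 1 Hz
    for device in devices:
        device.memory_used()
        device.gpu_utilization()
        device.processes()
    if iteration % 60 == 0:
        snapshot = tracemalloc.take_snapshot()
        if baseline is not None:
            # Print the call sites whose allocations grew the most since
            # the previous snapshot.
            for stat in snapshot.compare_to(baseline, 'lineno')[:5]:
                print(stat)
        baseline = snapshot
    time.sleep(1.0)
```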
Traceback
No response
Logs
No response
Expected behavior
No response
Additional context
No response