-
Hi @yeeahnick, can you copy/paste the result of a curl on http://ip:61208/api/4/full ? Thanks.
-
Hi @nicolargo, thanks for the quick response. Unfortunately the curl on /full no longer shows the NVIDIA GPU (same thing under the file system pane in Glances). There was a TrueNAS Scale update (24.10.2) yesterday that included NVIDIA fixes, which I guess made it worse for Glances. To be clear, my GPU is working in other Docker containers running on the same system. Some more information:
When I run "ls /dev | grep nvidia" in the shell of Glances, I only see: nvidia-caps
When I run nvidia-smi, nothing is found (this works in other containers on the same system).
When I run "env" in the shell of Glances, I see that the NVIDIA capabilities and devices environment variables are set.
When I run "glances | grep -i runtime" in the shell of Glances, it just hangs.
I will fiddle with it again tonight to see if I can repopulate the curl on /full. Let me know if I need to provide anything else. Cheers!
-
In the shell of Glances, can you run the following command:
It will display the path to the glances.log file. Then run:
And copy/paste:
Thanks!
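(The exact commands were stripped from this export. A plausible reconstruction, assuming the default Glances behaviour of writing its log to the system temp directory as glances-<user>.log, i.e. /tmp/glances-root.log when running as root in the container:)

```sh
# Hypothetical reconstruction of the requested commands (paths are assumptions)
python3 -c "import tempfile, os; print(os.path.join(tempfile.gettempdir(), 'glances-root.log'))"
cat /tmp/glances-root.log
```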
-
Having the same issue here. Hope the info in the screenshot can help.
-
You can run "cat /tmp/glances-root.log" in the Glances shell to view the log file.
-
Same exact results here. I have also noticed that inside the container
from the root log:
However, I've got other containers on the system that are using the GPU with no problem. Please let me know if you want to see any other parts of the log.
-
Same with nvidia-smi.
-
Glances binds directly to the libnvidia-ml.so.1 file. Check that this file is available on your system.
The folder where this file is located should be added to LD_LIBRARY_PATH. So, long story short, it looks more like a TrueNAS integration issue than a Glances bug.
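A minimal sketch of that check from inside the Glances container shell (the path shown is only an example; use whatever find returns on your system):

```sh
# Locate the NVIDIA management library inside the container
find / -name "libnvidia-ml.so.1" 2>/dev/null

# Add its folder to the dynamic loader search path (example path)
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
```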
-
Hello, and thank you for getting involved with this problem. Is there something that can be done with the Glances container to fix this? I doubt TrueNAS will take a look at this, since all my other Docker containers and community apps have a working GPU without doing anything special (Immich, Plex, MKVToolNix, dashdot, etc.). I set LD_LIBRARY_PATH to /usr/lib/x86_64-linux-gnu (also tried /usr/lib64) as an env variable on the container, but it didn't change anything. I also did the same for LD_PRELOAD. I also tried the alpine-dev tags but got the same results (I do see more info, like the IP). I also tried the official TrueNAS community app, but that one doesn't support GPUs at all.
Here is the result of that command in the Glances shell:
Here is the result of that command in the TrueNAS shell:
-
I'm not on TrueNAS, actually. I'm on Proxmox, which is based on Debian. libnvidia-ml.so is found both on my host and in the container. From the container: /usr/lib64
I assume you meant that the env var must be set to the path in the container, otherwise it would also be necessary to mount the file from the host. Setting it to /usr/lib64 doesn't make a difference.
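A quick way to double-check both the variable and the library as the running container sees them (the container name glances is an assumption; adjust to yours):

```sh
# Inspect LD_LIBRARY_PATH and the library from outside the container
docker exec -it glances sh -c 'echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"; ls -l /usr/lib64/libnvidia-ml.so*'
```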
-
Hi, can we change the label "needs more info" to "needs investigation"? A few of us have provided a lot of info, and we all have the same results and issue. Both TrueNAS and Proxmox are affected. Thanks.
-
@nicolargo it looks like you are swallowing the original exception (https://github.com/nicolargo/glances/blob/develop/glances/plugins/gpu/cards/nvidia.py#L32), but if I shell into the container and run the code out of nvidia.py by hand, I get this error:
The double slash is strange to me. The file exists here:
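For reference, a minimal sketch of the load that nvidia.py performs indirectly (pynvml resolves the library through ctypes; running it by hand surfaces the OSError that the plugin otherwise swallows):

```python
# Reproduce the library load inside the container's Python shell
from ctypes import CDLL

CDLL("libnvidia-ml.so.1")  # raises OSError if the lib cannot be resolved/loaded
```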
-
I do not have any computer with an NVIDIA card, so I need your help to investigate. Can you open a Python shell and enter the following command:
Please copy/paste the result. Thanks.
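The command itself was lost in this export; it is presumably the direct ctypes check, something along these lines:

```python
>>> from ctypes import CDLL
>>> CDLL("libnvidia-ml.so.1")
```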
-
@nicolargo I can confirm what @kbirger said is accurate. The ubuntu-latest-full image detects my P4000 with no problem. Looks like the issue is within latest-full (Alpine).
-
I also switched to the Ubuntu-based container instead of the Alpine one, and things started working for me. Note: I'm on Ubuntu Server 24.04, not Proxmox or TrueNAS, but the Alpine image does not work for monitoring my NVIDIA GPU. The Ubuntu-based container works fine.
-
I cannot reproduce the issue on my side, so I need you to investigate. First of all, identify the current NVIDIA lib with the following command:
It should return one file with a full path. Then, with the output of the previous command (do not forget the * at the end):
It should return a minimum of 2 files (one is a symbolic link and the other is the target file). For each line from the second command:
Please copy/paste all the results.
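The commands were not captured in this export; a plausible reconstruction of the first two steps (the path in the second command is only an example, reuse the one returned by find):

```sh
# Step 1: locate the NVIDIA management library
find / -name "libnvidia-ml.so.1" 2>/dev/null

# Step 2: list the symbolic link and its target (note the trailing *)
ls -l /usr/lib64/libnvidia-ml.so*
```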
-
Like this?
-
Nope. You need to add the * to also see the file targeted by the symbolic link.
And apply the CDLL command on each file, something like that:
Thanks!
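A sketch of that per-file check (the glob below is an example; substitute the files listed by your ls command):

```sh
# Try to load each candidate file through ctypes
for f in /usr/lib64/libnvidia-ml.so*; do
    echo "== $f =="
    python3 -c "from ctypes import CDLL; CDLL('$f')" && echo "load OK"
done
```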
-
Sorry if I am doing it wrong again. Here is what I got this time.
-
It's strange. The following line:
should return 2 files:
and
Can you also copy/paste:
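For context, the expected shape of that listing is roughly the following (illustrative only; the version number is taken from later in the thread, not from actual output):

```sh
lrwxrwxrwx ... libnvidia-ml.so.1 -> libnvidia-ml.so.550.127.05
-rwxr-xr-x ... libnvidia-ml.so.550.127.05
```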
-
Here you go, and thanks for looking into this. FYI, the ubuntu-latest-full image that is working also only shows 1 file.
-
So the file exists but cannot be loaded as a proper lib... What's the libnvidia-ml version on your working Ubuntu image? The same as the Alpine one (550.127.05)?
-
Yes, same version, but the location is '/usr/lib/x86_64-linux-gnu'.
-
Sorry if this is noise in the thread, but I'll add that I was having the same issues on a Debian server, using Glances in Docker. Glances only saw the AMD graphics integrated in the CPU. I switched from the latest-full (Alpine) image to ubuntu-latest-full and that fixed it for me.
-
I can confirm that this worked for me as well, on Pop!_OS. Nothing else enabled me to see the NVIDIA GPUs.
-
I was also having this problem on TrueNAS Scale 24.10 with a Quadro P620 installed, but after switching my docker-compose file to use the ubuntu-latest-full image, the GPU is now detected.
-
Adding my config to the mix. I'm running Glances through Docker on an Ubuntu 22.04.4 machine.
Host machine: Ubuntu 22.04.4
Working docker-compose file:
Not working (lifted directly from the docs):
I noticed some interesting things while getting my GPU to finally work; some of them might be Docker/Portainer-specific issues, or just something weird I did on my server. Including the deploy block prevented me from assigning my GPU to the container, so I removed it in favor of assigning it in Portainer. That issue happened with both the Ubuntu and Alpine full images; it's almost certainly something on my system. Some notes that might help debug this further:
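The compose files themselves were not captured in this export. For reference, a docs-style GPU reservation in compose usually looks like the sketch below (illustrative values; as noted above, this poster had to drop the deploy block and assign the GPU through Portainer instead):

```yaml
# Hypothetical docker-compose sketch; adjust image tag, devices and options to your setup
services:
  glances:
    image: nicolargo/glances:ubuntu-latest-full
    restart: unless-stopped
    pid: host
    network_mode: host
    environment:
      - GLANCES_OPT=-w
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```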
-
ChatGPT tells me that Alpine does not support the NVIDIA drivers out of the box. With that in mind, the missing NVIDIA lib is probably expected, hence why the GPU only works with the Ubuntu image.
-
Hello,
I'm encountering an issue where my NVIDIA Quadro P4000 is not being detected by Glances. I'm using the docker-compose (latest-full) configuration and have enabled NVIDIA GPU support in the application settings while building the app in TrueNAS. This configuration sets the NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES variables.
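(For reference, a sketch of what that toggle amounts to in the container environment; the values are assumptions, since TrueNAS may pin NVIDIA_VISIBLE_DEVICES to a GPU UUID rather than all:)

```yaml
# Hypothetical excerpt of the generated environment
environment:
  - NVIDIA_VISIBLE_DEVICES=all
  - NVIDIA_DRIVER_CAPABILITIES=all
```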
With these settings, I can see the NVIDIA driver listed under the file system pane in Glances, but the GPU does not appear when I access the endpoint:
http://IP:61208/api/4/gpu.
Interestingly, when I navigate to http://IP:61208/api/4/full, I can see several NVIDIA-related entries.
To ensure the GPU is properly assigned in the Docker Compose configuration, I ran the following command in the TrueNAS shell:
midclt call -job app.update glances-custom '{"values": {"resources": {"gpus": {"use_all_gpus": false, "nvidia_gpu_selection": {"PCI_SLOT": {"use_gpu": true, "uuid": "GPU-95943d54-8d67-b91e-00cb-ca3662cfd863"}}}}}}'
Despite this, the GPU still doesn’t show up in the /gpu endpoint.
Does anyone have suggestions or insights on what might be missing or misconfigured? Any help would be greatly appreciated!
Thank you!