Failed to initialize NVML: could not load NVML library. #36

@zbjjyy

Environment:

K8s: v1.23.10
Runtime: docker 20.10.8
NVIDIA System Management Interface -- v535.161.07
Image: 4pdosc/k8s-device-plugin:v0.10.0.4-ubuntu20.04

Issue:

After deploying the plugin DaemonSet, the logs show:

2024/03/27 15:41:13 Loading PciInfo

 0 = 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)

 1 = 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]

 2 = 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]

 3 = 00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)

 4 = 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)

 5 = 00:02.0 VGA compatible controller: Cirrus Logic GD 5446

 6 = 00:03.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge

 7 = 00:04.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge

 8 = 00:05.0 Ethernet controller: Red Hat, Inc. Virtio network device

 9 = 00:06.0 Audio device: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) High Definition Audio Controller (rev 01)

 10 = 00:07.0 SCSI storage controller: Red Hat, Inc. Virtio block device

 11 = 00:08.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)

 found 00:08.0

 12 = 00:09.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon

 13 = 

 pcibusstr= 00:08.0


 2024/03/27 15:41:13 Loading NVML

 2024/03/27 15:41:13 Failed to initialize NVML: could not load NVML library.

 2024/03/27 15:41:13 If this is a GPU node, did you set the docker default runtime to `nvidia`?

 2024/03/27 15:41:13 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites

 2024/03/27 15:41:13 You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start

 2024/03/27 15:41:13 If this is not a GPU node, you should set up a toleration or nodeSelector to only deploy this plugin on GPU nodes
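As far as I understand, "could not load NVML library" means the plugin process could not open the shared library `libnvidia-ml.so.1` inside its container. Below is a minimal sketch to check that, assuming Python 3 is available wherever it is run. The helper name `can_load_nvml` is hypothetical and not part of the plugin.

```python
import ctypes


def can_load_nvml(lib_name: str = "libnvidia-ml.so.1") -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        # CDLL asks the dynamic loader to open the library by name,
        # searching the usual library paths.
        ctypes.CDLL(lib_name)
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    print("NVML loadable:", can_load_nvml())
```

Running it both on the host and inside the plugin container should show whether the library is missing only in the container.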

  1. I have checked the environment, and nvidia-smi works on the VM:
root@master:/usr/local/vgpu# nvidia-smi 
Wed Mar 27 15:46:02 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-32GB           Off | 00000000:00:08.0 Off |                    0 |
| N/A   31C    P0              23W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
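Since nvidia-smi works on the host, the remaining suggestion from the log is the Docker default runtime. Below is a minimal sketch to check it, assuming the Docker daemon config is at `/etc/docker/daemon.json` (the usual default location, which may differ on other setups). The function name `default_runtime` is hypothetical.

```python
import json
from typing import Optional


def default_runtime(daemon_json_text: str) -> Optional[str]:
    """Return the "default-runtime" value from daemon.json content, or None if unset."""
    text = daemon_json_text.strip()
    config = json.loads(text) if text else {}
    return config.get("default-runtime")


if __name__ == "__main__":
    path = "/etc/docker/daemon.json"  # assumed default location
    try:
        with open(path) as f:
            runtime = default_runtime(f.read())
    except FileNotFoundError:
        runtime = None
    print("default-runtime:", runtime)
    if runtime != "nvidia":
        print("Docker default runtime is not set to nvidia")
```

If the value is not `nvidia`, the log's hint about the default runtime probably applies here. Docker has to be restarted after daemon.json is changed.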
