Environment:
K8s: v1.23.10
Runtime: docker 20.10.8
NVIDIA System Management Interface -- v535.161.07
Image: 4pdosc/k8s-device-plugin:v0.10.0.4-ubuntu20.04
Issue:
After deploying the plugin DaemonSet, the logs show:
2024/03/27 15:41:13 Loading PciInfo
0 = 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
1 = 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
2 = 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
3 = 00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
4 = 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
5 = 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
6 = 00:03.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
7 = 00:04.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
8 = 00:05.0 Ethernet controller: Red Hat, Inc. Virtio network device
9 = 00:06.0 Audio device: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) High Definition Audio Controller (rev 01)
10 = 00:07.0 SCSI storage controller: Red Hat, Inc. Virtio block device
11 = 00:08.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
found 00:08.0
12 = 00:09.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
13 =
pcibusstr= 00:08.0
2024/03/27 15:41:13 Loading NVML
2024/03/27 15:41:13 Failed to initialize NVML: could not load NVML library.
2024/03/27 15:41:13 If this is a GPU node, did you set the docker default runtime to `nvidia`?
2024/03/27 15:41:13 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
2024/03/27 15:41:13 You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
2024/03/27 15:41:13 If this is not a GPU node, you should set up a toleration or nodeSelector to only deploy this plugin on GPU nodes
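The log asks whether the Docker default runtime is set to `nvidia`. As a rough way to check that, here is a minimal sketch that inspects the text of the Docker daemon config. It assumes the config is the usual `/etc/docker/daemon.json` and that the runtime is selected with the `default-runtime` key; the function name `default_runtime_is_nvidia` is made up for illustration and is not part of the plugin.

```python
import json


def default_runtime_is_nvidia(daemon_json_text: str) -> bool:
    """Return True if the given daemon.json text sets "default-runtime" to "nvidia".

    Hypothetical helper for illustration; invalid JSON is treated as "not set".
    """
    try:
        config = json.loads(daemon_json_text)
    except json.JSONDecodeError:
        return False
    # daemon.json is expected to be a JSON object at the top level.
    return isinstance(config, dict) and config.get("default-runtime") == "nvidia"
```

On the node, this could be called with the contents of `/etc/docker/daemon.json` (assumed path). Docker has to be restarted after changing that file for the setting to take effect.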
- I have checked the environment, and nvidia-smi works on the VM:
root@master:/usr/local/vgpu# nvidia-smi
Wed Mar 27 15:46:02 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-32GB           Off | 00000000:00:08.0 Off |                    0 |
| N/A   31C    P0              23W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
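Since nvidia-smi works on the host, the failure is probably specific to the plugin container: "could not load NVML library" suggests the driver's shared library is not visible there. The sketch below tries to load that library by its standard soname, `libnvidia-ml.so.1`. It is not how the plugin itself loads NVML, and `can_load_library` is a hypothetical name used only for this example.

```python
import ctypes


def can_load_library(name: str = "libnvidia-ml.so.1"):
    """Try to load a shared library by name.

    Returns (True, "") on success, or (False, error_message) if the
    dynamic loader cannot find or open the library.
    """
    try:
        ctypes.CDLL(name)
    except OSError as err:
        return False, str(err)
    return True, ""


if __name__ == "__main__":
    ok, message = can_load_library()
    print("NVML library loadable:", ok)
    if not ok:
        print("loader error:", message)
```

If this succeeds on the host but fails inside the plugin pod, the container is most likely not being started through the NVIDIA runtime, which matches the hint in the log.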