
parameter devicePlugin.deviceSplitCount does not work #35

@2232729885

Description


I used Helm to install k8s-vgpu-scheduler and set devicePlugin.deviceSplitCount = 5. After it deployed successfully, I ran 'kubectl describe node' and saw 40 allocatable 'nvidia.com/gpu' resources (the machine has 8 A40 cards). I then created 6 pods, each requesting 1 'nvidia.com/gpu'. But when I created a pod that needs 3 'nvidia.com/gpu', Kubernetes reported that the pod cannot be scheduled.
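For reference, a minimal sketch of what I deployed. Only devicePlugin.deviceSplitCount = 5 and the nvidia.com/gpu limits are the values I actually set; the pod names and images below are illustrative placeholders, not copied from my cluster:

```yaml
# Chart values override (sketch) -- the only value changed from the chart defaults:
#
#   devicePlugin:
#     deviceSplitCount: 5
#
# One of the six single-vGPU pods (name and image are placeholders).
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-single
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:11.8.0-base-ubuntu22.04
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1   # one vGPU slice
---
# The multi-vGPU pod that fails to schedule (again, name and image are placeholders).
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-multi
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:11.8.0-base-ubuntu22.04
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 3   # three vGPU slices
```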

The vgpu-scheduler logs are shown below. They seem to say that only 2 GPU cards are usable? (Of the 8 devices in the dump, 6 report Usedmem equal to Totalmem, so only the devices at index 0 and 1 still have free memory.)
I0313 00:58:35.594437 1 score.go:65] "devices status"
I0313 00:58:35.594467 1 score.go:67] "device status" device id="GPU-0707087e-8264-4ba4-bc45-30c70272ec4a" device detail={"Id":"GPU-0707087e-8264-4ba4-bc45-30c70272ec4a","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":46068,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-NVIDIA A40","Health":true}
I0313 00:58:35.594519 1 score.go:67] "device status" device id="GPU-b3e35ad4-81ee-0aee-9865-4787748b93ce" device detail={"Id":"GPU-b3e35ad4-81ee-0aee-9865-4787748b93ce","Index":1,"Used":0,"Count":10,"Usedmem":0,"Totalmem":46068,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-NVIDIA A40","Health":true}
I0313 00:58:35.594542 1 score.go:67] "device status" device id="GPU-d38a391c-9f2f-395e-2f91-1785a648f6c4" device detail={"Id":"GPU-d38a391c-9f2f-395e-2f91-1785a648f6c4","Index":2,"Used":1,"Count":10,"Usedmem":46068,"Totalmem":46068,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-NVIDIA A40","Health":true}
I0313 00:58:35.594568 1 score.go:67] "device status" device id="GPU-7099a282-5a75-55f8-0cd0-a4b48098ae1e" device detail={"Id":"GPU-7099a282-5a75-55f8-0cd0-a4b48098ae1e","Index":3,"Used":1,"Count":10,"Usedmem":46068,"Totalmem":46068,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-NVIDIA A40","Health":true}
I0313 00:58:35.594600 1 score.go:67] "device status" device id="GPU-56967eb2-30b7-c808-367a-225b8bd8a12e" device detail={"Id":"GPU-56967eb2-30b7-c808-367a-225b8bd8a12e","Index":4,"Used":1,"Count":10,"Usedmem":46068,"Totalmem":46068,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-NVIDIA A40","Health":true}
I0313 00:58:35.594639 1 score.go:67] "device status" device id="GPU-54191405-e5a9-2f7b-8ac4-f4e86c6669cb" device detail={"Id":"GPU-54191405-e5a9-2f7b-8ac4-f4e86c6669cb","Index":5,"Used":1,"Count":10,"Usedmem":46068,"Totalmem":46068,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-NVIDIA A40","Health":true}
I0313 00:58:35.594671 1 score.go:67] "device status" device id="GPU-e731cd15-879f-6d00-485d-d1b468589de9" device detail={"Id":"GPU-e731cd15-879f-6d00-485d-d1b468589de9","Index":6,"Used":1,"Count":10,"Usedmem":46068,"Totalmem":46068,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-NVIDIA A40","Health":true}
I0313 00:58:35.594693 1 score.go:67] "device status" device id="GPU-865edbf8-5d63-8e57-5e14-36682179eaf6" device detail={"Id":"GPU-865edbf8-5d63-8e57-5e14-36682179eaf6","Index":7,"Used":1,"Count":10,"Usedmem":46068,"Totalmem":46068,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-NVIDIA A40","Health":true}
I0313 00:58:35.594725 1 score.go:90] "Allocating device for container request" pod="default/gpu-pod-2" card request={"Nums":5,"Type":"NVIDIA","Memreq":0,"MemPercentagereq":100,"Coresreq":0}
I0313 00:58:35.594757 1 score.go:93] "scoring pod" pod="default/gpu-pod-2" Memreq=0 MemPercentagereq=100 Coresreq=0 Nums=5 device index=7 device="GPU-b3e35ad4-81ee-0aee-9865-4787748b93ce"
I0313 00:58:35.594800 1 score.go:140] "first fitted" pod="default/gpu-pod-2" device="GPU-b3e35ad4-81ee-0aee-9865-4787748b93ce"
I0313 00:58:35.594829 1 score.go:93] "scoring pod" pod="default/gpu-pod-2" Memreq=0 MemPercentagereq=100 Coresreq=0 Nums=4 device index=6 device="GPU-0707087e-8264-4ba4-bc45-30c70272ec4a"
I0313 00:58:35.594850 1 score.go:140] "first fitted" pod="default/gpu-pod-2" device="GPU-0707087e-8264-4ba4-bc45-30c70272ec4a"
I0313 00:58:35.594869 1 score.go:93] "scoring pod" pod="default/gpu-pod-2" Memreq=0 MemPercentagereq=100 Coresreq=0 Nums=3 device index=5 device="GPU-865edbf8-5d63-8e57-5e14-36682179eaf6"
I0313 00:58:35.594889 1 score.go:93] "scoring pod" pod="default/gpu-pod-2" Memreq=0 MemPercentagereq=100 Coresreq=0 Nums=3 device index=4 device="GPU-e731cd15-879f-6d00-485d-d1b468589de9"
I0313 00:58:35.594911 1 score.go:93] "scoring pod" pod="default/gpu-pod-2" Memreq=0 MemPercentagereq=100 Coresreq=0 Nums=3 device index=3 device="GPU-54191405-e5a9-2f7b-8ac4-f4e86c6669cb"
I0313 00:58:35.594929 1 score.go:93] "scoring pod" pod="default/gpu-pod-2" Memreq=0 MemPercentagereq=100 Coresreq=0 Nums=3 device index=2 device="GPU-56967eb2-30b7-c808-367a-225b8bd8a12e"
I0313 00:58:35.594948 1 score.go:93] "scoring pod" pod="default/gpu-pod-2" Memreq=0 MemPercentagereq=100 Coresreq=0 Nums=3 device index=1 device="GPU-7099a282-5a75-55f8-0cd0-a4b48098ae1e"
I0313 00:58:35.594966 1 score.go:93] "scoring pod" pod="default/gpu-pod-2" Memreq=0 MemPercentagereq=100 Coresreq=0 Nums=3 device index=0 device="GPU-d38a391c-9f2f-395e-2f91-1785a648f6c4"
I0313 00:58:35.594989 1 score.go:211] "calcScore:node not fit pod" pod="default/gpu-pod-2" node="gpu-230"

The output of 'kubectl describe node gpu-230' shows:
[screenshot: kubectl describe node gpu-230 output]

nvidia-smi shows:
[screenshot: nvidia-smi output]

Can somebody help with this issue? Thanks.
