CDI file is out of date after device resources are allocated #633

@cyclinder

What happened?

The CDI file is out of date after device resources are allocated.

I0306 11:20:42.972829       1 server.go:127] Allocate() called with &AllocateRequest{ContainerRequests:[]*ContainerAllocateRequest{&ContainerAllocateRequest{DevicesIDs:[0000:0b:00.2],},},}
I0306 11:20:42.972935       1 pool_stub.go:108] GetEnvs(): for devices: [0000:0b:00.2]
I0306 11:20:42.978145       1 server.go:159] AllocateResponse send: &AllocateResponse{ContainerResponses:[]*ContainerAllocateResponse{&ContainerAllocateResponse{Envs:map[string]string{PCIDEVICE_SPIDERNET_IO_SRIVO_ETH_P4: 0000:0b:00.2,PCIDEVICE_SPIDERNET_IO_SRIVO_ETH_P4_INFO: {"0000:0b:00.2":{"generic":{"deviceID":"0000:0b:00.2"},"rdma":{"rdma_cm":"/dev/infiniband/rdma_cm","umad":"/dev/infiniband/umad10","uverbs":"/dev/infiniband/uverbs10"}}},},Mounts:[]*Mount{},Devices:[]*DeviceSpec{},Annotations:map[string]string{cdi.k8s.io/spidernet.io_net-pci: spidernet.io/net-pci=0000:0b:00.2,},CDIDevices:[]*CDIDevice{},},},}

root@10-20-1-50:/var/run/cdi# cat sriov-dp-spidernet.io-net-pci-srivo_eth_p4.yaml | grep 'name: '
  name: 0000:0b:00.6
  name: 0000:0b:00.7
  name: 0000:0b:01.0
  name: 0000:0b:01.1
  name: 0000:0b:00.2
  name: 0000:0b:00.3
  name: 0000:0b:00.4
  name: 0000:0b:00.5


root@10-20-1-50:/var/run/cdi# kubectl get sriovnetworknodepolicies.sriovnetwork.openshift.io -n spiderpool cx5-p4 -o yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"sriovnetwork.openshift.io/v1","kind":"SriovNetworkNodePolicy","metadata":{"annotations":{},"creationTimestamp":"2024-10-14T08:50:06Z","generation":1,"name":"cx5-p4","namespace":"spiderpool","resourceVersion":"29963475","uid":"a7351b59-2edd-459a-9e11-dac37c46f9c7"},"spec":{"deviceType":"netdevice","isRdma":true,"nicSelector":{"rootDevices":["0000:0b:00.0"],"vendor":"15b3"},"nodeSelector":{"kubernetes.io/os":"linux"},"numVfs":8,"priority":99,"resourceName":"srivo_eth_p4"}}
  creationTimestamp: "2024-11-25T12:27:48Z"
  generation: 1
  name: cx5-p4
  namespace: spiderpool
  resourceVersion: "69518810"
  uid: aece0fd0-8f1f-4516-b7b5-91fa9da39ae1
spec:
  deviceType: netdevice
  isRdma: true
  nicSelector:
    rootDevices:
    - 0000:0b:00.0
    vendor: 15b3
  nodeSelector:
    kubernetes.io/os: linux
  numVfs: 8
  priority: 99
  resourceName: srivo_eth_p4

What did you expect to happen?

I have 8 VFs for the policy cx5-p4 (resource srivo_eth_p4); see the CDI file above (sriov-dp-spidernet.io-net-pci-srivo_eth_p4.yaml).

When one of the resource devices (0000:0b:00.2) is allocated to a pod, the CDI file should show only 7 deviceNodes.
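
To make the mismatch easy to check, here is a minimal Python sketch that scans CDI file text for `name:` entries (the same lines the grep above prints) and reports any allocated device ID that is still listed. The helper names `cdi_device_names` and `stale_entries` are hypothetical and not part of the device plugin; the sample text is just the grep output from this report, not the full CDI file.

```python
def cdi_device_names(cdi_text):
    """Collect device names from lines containing 'name: ', like the grep above."""
    names = []
    for line in cdi_text.splitlines():
        # Accept both '  name: X' and '- name: X' layouts.
        stripped = line.strip().lstrip("- ").strip()
        if stripped.startswith("name: "):
            names.append(stripped[len("name: "):].strip().strip("\"'"))
    return names


def stale_entries(cdi_text, allocated_ids):
    """Return allocated device IDs that are still listed in the CDI text."""
    allocated = set(allocated_ids)
    return [name for name in cdi_device_names(cdi_text) if name in allocated]


# Device names copied from the grep output in this report.
sample = """\
  name: 0000:0b:00.6
  name: 0000:0b:00.7
  name: 0000:0b:01.0
  name: 0000:0b:01.1
  name: 0000:0b:00.2
  name: 0000:0b:00.3
  name: 0000:0b:00.4
  name: 0000:0b:00.5
"""

if __name__ == "__main__":
    # 0000:0b:00.2 is the device ID from the Allocate() log above.
    print(len(cdi_device_names(sample)), "devices listed")
    print("still listed after allocation:", stale_entries(sample, ["0000:0b:00.2"]))
```

On a node, the same check could be run against the real file by reading /var/run/cdi/sriov-dp-spidernet.io-net-pci-srivo_eth_p4.yaml and passing its contents in place of `sample`.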

What are the minimal steps needed to reproduce the bug?

Anything else we need to know?

It seems we ignore #576 (comment)

Component Versions

Please fill in the below table with the version numbers of components used.

Component Version
SR-IOV Network Device Plugin
SR-IOV CNI Plugin
Multus
Kubernetes
OS

Config Files

Config file locations may be config dependent.

Device pool config file location (Try '/etc/pcidp/config.json')
Multus config (Try '/etc/cni/multus/net.d')
CNI config (Try '/etc/cni/net.d/')
Kubernetes deployment type ( Bare Metal, Kubeadm etc.)
Kubeconfig file
SR-IOV Network Custom Resource Definition

Logs

SR-IOV Network Device Plugin Logs (use kubectl logs $PODNAME)
Multus logs (If enabled. Try '/var/log/multus.log' )
Kubelet logs (journalctl -u kubelet)
