sriov-network-metrics-exporter fails to deploy #766

Closed
@ianb-mp

Description

I've enabled the new featureGate for metricsExporter (#655), but I see an error in the operator log: DaemonSet in version "v1" cannot be handled as a DaemonSet: json: cannot unmarshal bool into Go struct field PodSpec.spec.template.spec.nodeSelector of type string. The full error is:

2024-08-28T00:52:03.971414176Z  ERROR   syncMetricsExporter     controllers/sriovoperatorconfig_controller.go:131       Couldn't sync metrics exporter objects  {"error": "failed to apply object &{map[apiVersion:apps/v1 kind:DaemonSet metadata:map[labels:map[app:sriov-network-metrics-exporter] name:sriov-network-metrics-exporter namespace:sriov-network-operator ownerReferences:[map[apiVersion:sriovnetwork.openshift.io/v1 blockOwnerDeletion:true controller:true kind:SriovOperatorConfig name:default uid:a131844f-34c3-4a41-a739-b1c51ff145d3]]] spec:map[selector:map[matchLabels:map[app:sriov-network-metrics-exporter]] template:map[metadata:map[labels:map[app:sriov-network-metrics-exporter]] spec:map[containers:[map[args:[--web.listen-address=127.0.0.1:9110 --path.kubecgroup=/sys/fs/cgroup --path.sysbuspci=/host/sys/bus/pci/devices/ --path.sysclassnet=/host/sys/class/net/ --path.cpucheckpoint=/host/cpu_manager_state --path.kubeletsocket=/host/kubelet.sock --collector.kubepoddevice=true --collector.vfstatspriority=netlink,sysfs] image:ghcr.io/k8snetworkplumbingwg/sriov-network-metrics-exporter:v1.1.0 imagePullPolicy:IfNotPresent name:metrics-exporter resources:map[requests:map[cpu:100m memory:100Mi]] securityContext:map[allowPrivilegeEscalation:false capabilities:map[drop:[ALL]] readOnlyRootFilesystem:true] volumeMounts:[map[mountPath:/host/kubelet.sock name:kubeletsocket] map[mountPath:/host/sys/bus/pci/devices name:sysbuspcidevices readOnly:true] map[mountPath:/host/sys/devices name:sysdevices readOnly:true] map[mountPath:/host/sys/class/net name:sysclassnet readOnly:true] map[mountPath:/host/cpu_manager_state name:cpucheckpoint readOnly:true]]] map[args:[--logtostderr --secure-listen-address=[$(HOST_IP)]:9110 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 --upstream=http://127.0.0.1:9110/ 
--tls-private-key-file=/etc/metrics/tls.key --tls-cert-file=/etc/metrics/tls.crt] env:[map[name:HOST_IP valueFrom:map[fieldRef:map[fieldPath:status.hostIP]]]] image:gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0 imagePullPolicy:IfNotPresent name:kube-rbac-proxy ports:[map[containerPort:9110 name:https-metrics]] resources:map[requests:map[cpu:10m memory:20Mi]] volumeMounts:[map[mountPath:/etc/metrics name:metrics-certs readOnly:true]]]] hostNetwork:true nodeSelector:map[feature.node.kubernetes.io/network-sriov.capable:true] restartPolicy:Always serviceAccountName:metrics-exporter-sa volumes:[map[hostPath:map[path:/var/lib/kubelet/pod-resources/kubelet.sock type:Socket] name:kubeletsocket] map[hostPath:map[path:/var/lib/kubelet/cpu_manager_state type:File] name:cpucheckpoint] map[hostPath:map[path:/sys/class/net type:Directory] name:sysclassnet] map[hostPath:map[path:/sys/bus/pci/devices type:Directory] name:sysbuspcidevices] map[hostPath:map[path:/sys/devices type:Directory] name:sysdevices] map[name:metrics-certs secret:map[defaultMode:420 secretName:metrics-exporter-cert]]]]]]]} with err: could not create (apps/v1, Kind=DaemonSet) sriov-network-operator/sriov-network-metrics-exporter: DaemonSet in version \"v1\" cannot be handled as a DaemonSet: json: cannot unmarshal bool into Go struct field PodSpec.spec.template.spec.nodeSelector of type string"}

The SriovOperatorConfig is:

apiVersion: v1
items:
- apiVersion: sriovnetwork.openshift.io/v1
  kind: SriovOperatorConfig
  metadata:
    annotations:
      meta.helm.sh/release-name: sriov-network-operator
      meta.helm.sh/release-namespace: sriov-network-operator
    creationTimestamp: "2024-08-28T00:48:53Z"
    generation: 2
    labels:
      app.kubernetes.io/managed-by: Helm
    name: default
    namespace: sriov-network-operator
    resourceVersion: "20124753"
    uid: a131844f-34c3-4a41-a739-b1c51ff145d3
  spec:
    configDaemonNodeSelector:
      feature.node.kubernetes.io/network-sriov.capable: "true"
    configurationMode: daemon
    disableDrain: true
    enableInjector: false
    enableOperatorWebhook: false
    featureGates:
      metricsExporter: true
    logLevel: 1
kind: List
metadata:
  resourceVersion: ""

If I change the node selector so that its value is something other than "true", the error goes away:

configDaemonNodeSelector:
  kubernetes.io/hostname: host1
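
A plausible reason why host1 works while "true" does not: if the value is written into the manifest without quotes, YAML reads host1 as a string but true as a boolean. The sketch below illustrates that idea. resolvesToNonString and renderSelectorLine are hypothetical helpers, the check is only a rough approximation of YAML scalar resolution, and it is not how the operator renders its templates.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolvesToNonString reports whether a plain (unquoted) YAML scalar would
// likely be read as a bool, null, or number instead of a string.
// This is a rough approximation, not a full YAML implementation.
func resolvesToNonString(value string) bool {
	switch strings.ToLower(value) {
	case "true", "false", "null", "~", "":
		return true
	}
	// Anything Go can parse as a number is treated as numeric-looking.
	if _, err := strconv.ParseFloat(value, 64); err == nil {
		return true
	}
	return false
}

// renderSelectorLine is a hypothetical helper that emits one "key: value"
// line and double-quotes the value when it would otherwise lose its string type.
func renderSelectorLine(key, value string) string {
	if resolvesToNonString(value) {
		return fmt.Sprintf("%s: %q", key, value)
	}
	return fmt.Sprintf("%s: %s", key, value)
}

func main() {
	fmt.Println(renderSelectorLine("feature.node.kubernetes.io/network-sriov.capable", "true"))
	fmt.Println(renderSelectorLine("kubernetes.io/hostname", "host1"))
}
```

In practice, always quoting the value is simpler and safer than trying to detect which values need it.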

However, I then notice that the sriov-network-metrics-exporter pod fails to start with this error:

Warning  FailedMount  52s (x11 over 7m3s)  kubelet            MountVolume.SetUp failed for volume "metrics-certs" : secret "metrics-exporter-cert" not found   

FYI @zeeke
