sriov-network-metrics-exporter fails to deploy #766

Closed
ianb-mp opened this issue Aug 28, 2024 · 3 comments · Fixed by #770

Comments

@ianb-mp
Contributor

ianb-mp commented Aug 28, 2024

I've enabled the new metricsExporter feature gate (#655), but the operator log shows this error: DaemonSet in version "v1" cannot be handled as a DaemonSet: json: cannot unmarshal bool into Go struct field PodSpec.spec.template.spec.nodeSelector of type string. The full error:

2024-08-28T00:52:03.971414176Z  ERROR   syncMetricsExporter     controllers/sriovoperatorconfig_controller.go:131       Couldn't sync metrics exporter objects  {"error": "failed to apply object &{map[apiVersion:apps/v1 kind:DaemonSet metadata:map[labels:map[app:sriov-network-metrics-exporter] name:sriov-network-metrics-exporter namespace:sriov-network-operator ownerReferences:[map[apiVersion:sriovnetwork.openshift.io/v1 blockOwnerDeletion:true controller:true kind:SriovOperatorConfig name:default uid:a131844f-34c3-4a41-a739-b1c51ff145d3]]] spec:map[selector:map[matchLabels:map[app:sriov-network-metrics-exporter]] template:map[metadata:map[labels:map[app:sriov-network-metrics-exporter]] spec:map[containers:[map[args:[--web.listen-address=127.0.0.1:9110 --path.kubecgroup=/sys/fs/cgroup --path.sysbuspci=/host/sys/bus/pci/devices/ --path.sysclassnet=/host/sys/class/net/ --path.cpucheckpoint=/host/cpu_manager_state --path.kubeletsocket=/host/kubelet.sock --collector.kubepoddevice=true --collector.vfstatspriority=netlink,sysfs] image:ghcr.io/k8snetworkplumbingwg/sriov-network-metrics-exporter:v1.1.0 imagePullPolicy:IfNotPresent name:metrics-exporter resources:map[requests:map[cpu:100m memory:100Mi]] securityContext:map[allowPrivilegeEscalation:false capabilities:map[drop:[ALL]] readOnlyRootFilesystem:true] volumeMounts:[map[mountPath:/host/kubelet.sock name:kubeletsocket] map[mountPath:/host/sys/bus/pci/devices name:sysbuspcidevices readOnly:true] map[mountPath:/host/sys/devices name:sysdevices readOnly:true] map[mountPath:/host/sys/class/net name:sysclassnet readOnly:true] map[mountPath:/host/cpu_manager_state name:cpucheckpoint readOnly:true]]] map[args:[--logtostderr --secure-listen-address=[$(HOST_IP)]:9110 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 --upstream=http://127.0.0.1:9110/ 
--tls-private-key-file=/etc/metrics/tls.key --tls-cert-file=/etc/metrics/tls.crt] env:[map[name:HOST_IP valueFrom:map[fieldRef:map[fieldPath:status.hostIP]]]] image:gcr.io/kubebuilder/kube-rbac-proxy:v0.15.0 imagePullPolicy:IfNotPresent name:kube-rbac-proxy ports:[map[containerPort:9110 name:https-metrics]] resources:map[requests:map[cpu:10m memory:20Mi]] volumeMounts:[map[mountPath:/etc/metrics name:metrics-certs readOnly:true]]]] hostNetwork:true nodeSelector:map[feature.node.kubernetes.io/network-sriov.capable:true] restartPolicy:Always serviceAccountName:metrics-exporter-sa volumes:[map[hostPath:map[path:/var/lib/kubelet/pod-resources/kubelet.sock type:Socket] name:kubeletsocket] map[hostPath:map[path:/var/lib/kubelet/cpu_manager_state type:File] name:cpucheckpoint] map[hostPath:map[path:/sys/class/net type:Directory] name:sysclassnet] map[hostPath:map[path:/sys/bus/pci/devices type:Directory] name:sysbuspcidevices] map[hostPath:map[path:/sys/devices type:Directory] name:sysdevices] map[name:metrics-certs secret:map[defaultMode:420 secretName:metrics-exporter-cert]]]]]]]} with err: could not create (apps/v1, Kind=DaemonSet) sriov-network-operator/sriov-network-metrics-exporter: DaemonSet in version \"v1\" cannot be handled as a DaemonSet: json: cannot unmarshal bool into Go struct field PodSpec.spec.template.spec.nodeSelector of type string"}
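
The message comes from decoding the DaemonSet into typed Go structs, where Kubernetes declares `nodeSelector` as a map of string keys to string values. A rough Python sketch of that strictness, assuming a cut-down manifest and a hypothetical `check_node_selector` helper (neither is operator code), shows a JSON boolean being rejected while the quoted form passes:

```python
import json


def check_node_selector(manifest_json: str) -> dict:
    """Decode a manifest and require every nodeSelector value to be a string.

    Hypothetical helper: it only mimics the strict string-to-string typing
    of PodSpec.nodeSelector; it is not how the API server is implemented.
    """
    manifest = json.loads(manifest_json)
    selector = manifest["spec"]["template"]["spec"].get("nodeSelector", {})
    for key, value in selector.items():
        if not isinstance(value, str):
            raise TypeError(
                f"nodeSelector[{key!r}] is {type(value).__name__}, expected str"
            )
    return selector


LABEL = "feature.node.kubernetes.io/network-sriov.capable"

# Unquoted true is a JSON boolean, as in the failing request.
bad = json.dumps({"spec": {"template": {"spec": {"nodeSelector": {LABEL: True}}}}})
# The quoted form is a JSON string and satisfies the check.
good = json.dumps({"spec": {"template": {"spec": {"nodeSelector": {LABEL: "true"}}}}})

if __name__ == "__main__":
    print(check_node_selector(good))
    try:
        check_node_selector(bad)
    except TypeError as err:
        print("rejected:", err)
```

The Go `%v` formatting used in the log prints a boolean `true` and a string `"true"` identically, which is why the dumped object alone does not reveal the type mismatch.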

SriovOperatorConfig is:

apiVersion: v1
items:
- apiVersion: sriovnetwork.openshift.io/v1
  kind: SriovOperatorConfig
  metadata:
    annotations:
      meta.helm.sh/release-name: sriov-network-operator
      meta.helm.sh/release-namespace: sriov-network-operator
    creationTimestamp: "2024-08-28T00:48:53Z"
    generation: 2
    labels:
      app.kubernetes.io/managed-by: Helm
    name: default
    namespace: sriov-network-operator
    resourceVersion: "20124753"
    uid: a131844f-34c3-4a41-a739-b1c51ff145d3
  spec:
    configDaemonNodeSelector:
      feature.node.kubernetes.io/network-sriov.capable: "true"
    configurationMode: daemon
    disableDrain: true
    enableInjector: false
    enableOperatorWebhook: false
    featureGates:
      metricsExporter: true
    logLevel: 1
kind: List
metadata:
  resourceVersion: ""

If I modify the node selector so the value is something other than "true" then the error goes away:

configDaemonNodeSelector:
  kubernetes.io/hostname: host1
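
A likely explanation for why `host1` works, assuming the rendered manifest passes through a YAML parser before it reaches the API server: unquoted (plain) scalars are typed by their spelling, so `true` resolves to a boolean while `host1` stays a string. The resolver below is a deliberately simplified, hypothetical model of that rule, not a real YAML parser:

```python
def resolve_scalar(text: str):
    """Type a YAML scalar the way a simplified core-schema resolver would."""
    text = text.strip()
    # Quoted scalars are always strings, whatever they spell.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        return text[1:-1]
    # Plain scalars spelling a boolean become booleans.
    if text in ("true", "True", "TRUE"):
        return True
    if text in ("false", "False", "FALSE"):
        return False
    # Plain scalars made only of ASCII digits become integers.
    if text.isascii() and text.isdigit():
        return int(text)
    # Everything else stays a string.
    return text


if __name__ == "__main__":
    for raw in ("true", '"true"', "host1"):
        value = resolve_scalar(raw)
        print(f"{raw!r} -> {type(value).__name__}")
```

Under this model only the plain `true` ends up as a non-string, which matches the behaviour reported above.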

However, the sriov-network-metrics-exporter pod then fails to start with this error:

Warning  FailedMount  52s (x11 over 7m3s)  kubelet            MountVolume.SetUp failed for volume "metrics-certs" : secret "metrics-exporter-cert" not found   

fyi @zeeke

@ianb-mp
Contributor Author

ianb-mp commented Aug 28, 2024

I realised the error about the missing secret metrics-exporter-cert is due to the operator referencing it by default here.
Is it mandatory to supply a certificate in this way? (I don't recall needing to do anything with certs when deploying the metrics exporter directly from the upstream repo's manifest.)

@zeeke
Member

zeeke commented Aug 28, 2024

Hi @ianb-mp, at the moment metrics are exported only over HTTPS, through kube-rbac-proxy. If you are interested in making this optional and having the metrics available over plain HTTP, I can bring this topic to the next community meeting.

Regarding the error:

    json: cannot unmarshal bool into Go struct field PodSpec.spec.template.spec.nodeSelector of type string

I confirm it is a bug and will look for a fix.

zeeke added a commit to zeeke/sriov-network-operator-1 that referenced this issue Aug 29, 2024
When using a node selector with boolean values, e.g.:
```
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
spec:
  configDaemonNodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
```

the value needs to be quoted before forwarding it to the metrics-exporter
node selector field.

Fixes k8snetworkplumbingwg#766

Signed-off-by: Andrea Panattoni <[email protected]>
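
The change in #770 itself is not shown in this thread. As a sketch of the idea only, assuming the manifest is produced by substituting selector values into text, a hypothetical `render_node_selector` helper could emit every value as a double-quoted scalar so that `true` cannot be read back as a boolean:

```python
import json


def render_node_selector(selector: dict, indent: int = 6) -> str:
    """Render nodeSelector entries as YAML lines with quoted values.

    json.dumps yields a double-quoted, escaped string; YAML parsers read
    such a scalar as a string regardless of its spelling.
    """
    pad = " " * indent
    lines = [pad + "nodeSelector:"]
    for key, value in sorted(selector.items()):
        lines.append(f"{pad}  {key}: {json.dumps(str(value))}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_node_selector(
        {"feature.node.kubernetes.io/network-sriov.capable": "true"}
    ))
```

Quoting unconditionally is the simpler design here: every `nodeSelector` value must be a string anyway, so there is no case in which leaving a value unquoted is preferable.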
@zeeke zeeke closed this as completed in #770 Sep 6, 2024
@porjo

porjo commented Jan 13, 2025

According to this comment:

When no cert is specified but a secure listen address is passed as a flag then kube rbac proxy generates a self signed cert at startup.

Rather than failing to deploy, generating a self-signed cert seems like a better default.
