
PodDefaults manifests incorrectly use vars for Certificate CR #157

@kimwnasptd

Kubeflow Version

Dashboard V2, after the renaming, in master branch

Kubeflow Platform

any

Kubernetes Distribution

any

Kubernetes Version

any

Description

We ran into this while working on the migration script in #154 (comment).

Our earlier PR #94 aimed to replace vars with replacements, since vars are being deprecated in kustomize. However, we missed updating the Certificate CR in overlays/cert-manager. That CR is responsible for the PodDefault webhook's certificate (used by K8s to talk to the webhook over HTTPS), and it still references the old vars:

```yaml
commonName: $(podDefaultsServiceName).$(podDefaultsNamespace).svc
dnsNames:
- $(podDefaultsServiceName).$(podDefaultsNamespace).svc
- $(podDefaultsServiceName).$(podDefaultsNamespace).svc.cluster.local
```

As a result, the created certificate has a Common Name of $(podDefaultsServiceName).$(podDefaultsNamespace).svc, and its DNS names contain the same literal strings, because the vars are never substituted. K8s then fails to talk to the webhook over HTTPS, since none of the names in the certificate match the webhook's DNS name poddefaults-webhook-service.kubeflow.svc.
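A minimal sketch of how the Certificate could be moved to kustomize replacements, assuming the webhook Service is named poddefaults-webhook-service and the kustomization sets its namespace to kubeflow. The Certificate name poddefaults-webhook-cert and the placeholder strings SERVICE_NAME and SERVICE_NAMESPACE are hypothetical, chosen only for illustration. Unlike vars, replacements cannot interpolate tokens inside a string; with options.delimiter and options.index they overwrite one dot-separated segment of an existing value, so the Certificate needs literal placeholder segments:

```yaml
# certificate.yaml (hypothetical): placeholder segments that the
# replacements below overwrite; issuerRef, secretName, etc. stay as they are.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: poddefaults-webhook-cert  # hypothetical name
spec:
  commonName: SERVICE_NAME.SERVICE_NAMESPACE.svc
  dnsNames:
  - SERVICE_NAME.SERVICE_NAMESPACE.svc
  - SERVICE_NAME.SERVICE_NAMESPACE.svc.cluster.local
---
# kustomization.yaml (excerpt): write the Service name into segment 0 and
# the Service namespace into segment 1 of each dot-separated field.
replacements:
- source:
    kind: Service
    name: poddefaults-webhook-service  # assumed Service name
    fieldPath: metadata.name
  targets:
  - select:
      group: cert-manager.io
      kind: Certificate
    fieldPaths:
    - spec.commonName
    - spec.dnsNames.0
    - spec.dnsNames.1
    options:
      delimiter: "."
      index: 0
- source:
    kind: Service
    name: poddefaults-webhook-service  # assumed Service name
    fieldPath: metadata.namespace      # assumes the namespace is set by the kustomization
  targets:
  - select:
      group: cert-manager.io
      kind: Certificate
    fieldPaths:
    - spec.commonName
    - spec.dnsNames.0
    - spec.dnsNames.1
    options:
      delimiter: "."
      index: 1
```

If the names above match the real manifests, kustomize build on the cert-manager overlay should render poddefaults-webhook-service.kubeflow.svc (and the .svc.cluster.local variant) instead of the literal $(...) strings. The two files are shown in one block only for brevity; they would live separately in the overlay.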

Relevant Logs

statefulset/test: create Pod test-0 in StatefulSet test failed error: Internal error occurred: failed calling webhook "deployment.kubeflow.org": failed to call webhook: Post "https://poddefaults-webhook-service.kubeflow.svc:443/apply-poddefault?timeout=10s": tls: failed to verify certificate: x509: certificate is valid for $(podDefaultsServiceName).$(podDefaultsNamespace).svc, $(podDefaultsServiceName).$(podDefaultsNamespace).svc.cluster.local, not poddefaults-webhook-service.kubeflow.svc

Metadata

Labels

kind/bug (things not working properly), priority/needs-triage (needs to be triaged)
