standalone installation not usable via Python SDK (unable to load root certificates) #2451

Closed
garymm opened this issue Nov 7, 2024 · 3 comments


garymm commented Nov 7, 2024

What happened?

Installed as per the instructions from the docs:

kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=v0.17.0"

Then used the Katib Python SDK as per the example in the docs.
Creating an experiment fails with:

HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Internal error occurred: failed calling webhook \"defaulter.experiment.katib.kubeflow.org\": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block","reason":"InternalError","details":{"causes":[{"message":"failed calling webhook \"defaulter.experiment.katib.kubeflow.org\": could not get REST client: unable to load root certificates: unable to parse bytes as PEM block"}]},"code":500}

From a related thread on Slack, I gather that the effectively empty caBundle in the MutatingWebhookConfiguration may be related (Cg== is base64 for a single newline character):

kubectl get MutatingWebhookConfiguration katib.kubeflow.org -o yaml

Outputs:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  annotations:
    cert-manager.io/inject-ca-from: kubeflow/katib-webhook-cert
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"admissionregistration.k8s.io/v1","kind":"MutatingWebhookConfiguration","metadata":{"annotations":{"cert-manager.io/inject-ca-from":"kubeflow/katib-webhook-cert"},"name":"katib.kubeflow.org"},"webhooks":[{"admissionReviewVersions":["v1"],"clientConfig":{"caBundle":"Cg==","service":{"name":"katib-controller","namespace":"kubeflow","path":"/mutate-experiment"}},"name":"defaulter.experiment.katib.kubeflow.org","rules":[{"apiGroups":["kubeflow.org"],"apiVersions":["v1beta1"],"operations":["CREATE","UPDATE"],"resources":["experiments"]}],"sideEffects":"None"},{"admissionReviewVersions":["v1"],"clientConfig":{"caBundle":"Cg==","service":{"name":"katib-controller","namespace":"kubeflow","path":"/mutate-pod"}},"name":"mutator.pod.katib.kubeflow.org","namespaceSelector":{"matchLabels":{"katib.kubeflow.org/metrics-collector-injection":"enabled"}},"objectSelector":{"matchExpressions":[{"key":"katib.kubeflow.org/metrics-collector-injection","operator":"NotIn","values":["disabled"]}]},"rules":[{"apiGroups":[""],"apiVersions":["v1"],"operations":["CREATE"],"resources":["pods"]}],"sideEffects":"None"}]}
  creationTimestamp: "2024-11-07T00:00:59Z"
  generation: 1
  name: katib.kubeflow.org
  resourceVersion: "4380064"
  uid: 2a1ab32a-a96e-4154-a58d-3271ad4bd21d
webhooks:
- admissionReviewVersions:
  - v1
  clientConfig:
    caBundle: Cg==
    service:
      name: katib-controller
      namespace: kubeflow
      path: /mutate-experiment
      port: 443
  failurePolicy: Fail
  matchPolicy: Equivalent
  name: defaulter.experiment.katib.kubeflow.org
  namespaceSelector: {}
  objectSelector: {}
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - kubeflow.org
    apiVersions:
    - v1beta1
    operations:
    - CREATE
    - UPDATE
    resources:
    - experiments
    scope: '*'
  sideEffects: None
  timeoutSeconds: 10
- admissionReviewVersions:
  - v1
  clientConfig:
    caBundle: Cg==
    service:
      name: katib-controller
      namespace: kubeflow
      path: /mutate-pod
      port: 443
  failurePolicy: Fail
  matchPolicy: Equivalent
  name: mutator.pod.katib.kubeflow.org
  namespaceSelector:
    matchLabels:
      katib.kubeflow.org/metrics-collector-injection: enabled
  objectSelector:
    matchExpressions:
    - key: katib.kubeflow.org/metrics-collector-injection
      operator: NotIn
      values:
      - disabled
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    resources:
    - pods
    scope: '*'
  sideEffects: None
  timeoutSeconds: 10
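
For anyone hitting the same error, here is a minimal sketch for checking whether a caBundle value actually contains a certificate. It assumes a valid bundle is base64-encoded PEM with the standard "-----BEGIN CERTIFICATE-----" header. The helper name ca_bundle_looks_valid is made up for illustration and is not part of Katib or the Kubernetes client.

```python
import base64

# Standard header that opens a PEM-encoded certificate.
PEM_HEADER = b"-----BEGIN CERTIFICATE-----"


def ca_bundle_looks_valid(ca_bundle_b64: str) -> bool:
    """Return True if the base64 value decodes to bytes containing a PEM certificate header.

    Hypothetical helper: this only checks for the header, it does not
    verify that the certificate itself is well-formed or trusted.
    """
    decoded = base64.b64decode(ca_bundle_b64)
    return PEM_HEADER in decoded


# The value from the webhook configuration above decodes to a lone newline.
print(repr(base64.b64decode("Cg==")))  # b'\n'
print(ca_bundle_looks_valid("Cg=="))   # False
```

A placeholder value like this is consistent with the "unable to parse bytes as PEM block" message: there is no certificate in the bundle for the API server to parse.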

What did you expect to happen?

I expect to be able to use the Python SDK after installing Katib standalone.

Environment

Kubernetes version:

$ kubectl version
Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.6

Katib controller version: 0.17.0

Katib Python SDK version: 0.17.0

Impacted by this bug?

Give it a 👍. We prioritize the issues with the most 👍.

Member

tenzen-y commented Nov 7, 2024

It seems that the certificate was not set in the webhook configurations properly.
Could you check the controller state with kubectl get pods -n kubeflow?
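
If it helps to script this check, here is a rough sketch that lists pods which are not Ready, given the JSON from kubectl get pods -n kubeflow -o json. It relies on the standard Pod fields metadata.name and status.conditions. The function name not_ready_pods and the sample pod names are hypothetical and only there for illustration.

```python
def not_ready_pods(pod_list: dict) -> list[str]:
    """Return the names of pods whose Ready condition is not "True".

    Expects a dict shaped like the output of `kubectl get pods -o json`.
    Pods with no conditions at all are treated as not ready.
    """
    names = []
    for pod in pod_list.get("items", []):
        conditions = pod.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            names.append(pod["metadata"]["name"])
    return names


# Hypothetical sample data, not taken from a real cluster.
sample = {
    "items": [
        {
            "metadata": {"name": "katib-controller-example"},
            "status": {"conditions": [{"type": "Ready", "status": "False"}]},
        },
        {
            "metadata": {"name": "katib-ui-example"},
            "status": {"conditions": [{"type": "Ready", "status": "True"}]},
        },
    ]
}

print(not_ready_pods(sample))  # ['katib-controller-example']
```

To use it against a real cluster, the kubectl output could be piped in and loaded with json.load(sys.stdin) before calling the function.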

Author

garymm commented Nov 7, 2024

Ah yeah the controller pod can't run because:

Events:
  Type     Reason       Age                    From     Message
  ----     ------       ----                   ----     -------
  Warning  FailedMount  3m11s (x527 over 17h)  kubelet  MountVolume.SetUp failed for volume "cert" : secret "katib-webhook-cert" not found

So it seems a secret needs to be created. Is it possible for the katib-standalone kube configs to handle this? If not, then I guess instructions need to be added on how users can create it themselves before applying the kube configs.

Author

garymm commented Nov 7, 2024

Hmm, I re-applied and it seems to work now. Not sure what happened the first time. I will close this and re-open if I can reproduce it.

garymm closed this as not planned on Nov 7, 2024
