
KAAP-784: Add exclude-node-drain annotation on the Machine if last node while deauth/decom #904


Closed

Conversation

vaibhavd21

Fixes KAAP-784:

When the node being deauthorised is the last node in the k8s cluster, node draining behaves differently since we upgraded CAPI from the June release to v1.9.0, which introduced machine drain rules.

Ref:
https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240930-machine-drain-rules.md
https://github.com/kubernetes-sigs/cluster-api/releases/tag/v1.9.0

Fix: if the node is the last node in the cluster, apply the exclude-node-drain annotation on the Machine before scaling down the MachineDeployment during deauth.
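The last-node check and annotation step can be sketched as below. This is a minimal sketch, not the actual byohctl code: `annotateIfLastNode` is a hypothetical helper operating on a plain annotation map, whereas the real flow patches the Machine object through the management-cluster client. The annotation key is CAPI's `ExcludeNodeDrainingAnnotation`; CAPI checks only for its presence, so the value is arbitrary.

```go
package main

import "fmt"

// excludeNodeDrainingAnnotation is the CAPI annotation that tells the
// Machine controller to skip draining the corresponding node on deletion.
const excludeNodeDrainingAnnotation = "machine.cluster.x-k8s.io/exclude-node-draining"

// annotateIfLastNode adds the exclude-drain annotation to a Machine's
// annotation map when the MachineDeployment is down to its last replica.
// With more than one replica, a normal drain is fine because the evicted
// pods have somewhere else to go.
func annotateIfLastNode(annotations map[string]string, replicas int32) map[string]string {
	if replicas > 1 {
		return annotations
	}
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[excludeNodeDrainingAnnotation] = "true"
	return annotations
}

func main() {
	anns := annotateIfLastNode(map[string]string{}, 1)
	fmt.Println(anns[excludeNodeDrainingAnnotation]) // prints "true"
}
```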

Testing -

$ kubectl get nodes
NAME                STATUS   ROLES    AGE     VERSION
byoh-june-test      Ready    <none>   6h34m   v1.32.3
byoh-kaapi-test-1   Ready    <none>   4h26m   v1.32.3
$ kubectl get pods -A
NAMESPACE          NAME                                       READY   STATUS    RESTARTS        AGE
calico-apiserver   calico-apiserver-76f777c8cc-j7zq5          1/1     Running   1 (6h33m ago)   8h
calico-apiserver   calico-apiserver-76f777c8cc-pfnct          1/1     Running   1 (6h33m ago)   8h
calico-system      calico-kube-controllers-f9596bb99-nglp9    1/1     Running   0               8h
calico-system      calico-node-c76nk                          1/1     Running   0               4h26m
calico-system      calico-node-cc7hc                          1/1     Running   0               6h34m
calico-system      calico-typha-746f8f954c-wsnns              1/1     Running   0               6h44m
cert-manager       cert-manager-5bfdd59f99-d2fmp              1/1     Running   0               8h
cert-manager       cert-manager-cainjector-7c68db8899-5kjjt   1/1     Running   1 (6h33m ago)   8h
cert-manager       cert-manager-webhook-574866cddf-m9d5q      1/1     Running   0               8h
kube-system        coredns-796d84c46b-cm77q                   1/1     Running   0               8h
kube-system        coredns-796d84c46b-v9mwt                   1/1     Running   0               8h
kube-system        konnectivity-agent-96vfz                   1/1     Running   0               6h34m
kube-system        konnectivity-agent-9g8d5                   1/1     Running   0               4h26m
kube-system        metrics-server-54cf798c86-9qtl4            2/2     Running   0               8h
kube-system        pf9-kube-proxy-p2twv                       1/1     Running   0               6h34m
kube-system        pf9-kube-proxy-twgrn                       1/1     Running   0               4h26m
kube-system        vcp-proxy-2k98q                            1/1     Running   0               6h34m
kube-system        vcp-proxy-khrbw                            1/1     Running   0               4h26m
tigera-operator    tigera-operator-7b9dcd4cd7-pns48           1/1     Running   0               6h44m

Deauth of first node -

root@byoh-kaapi-test-1:~# ./byohctl deauthorise
[2025-06-16 16:57:59] [SUCCESS] Successfully retrieved Kubernetes client
[2025-06-16 16:57:59] [SUCCESS] Successfully retrieved ByoHosts object from the management plane
[2025-06-16 16:57:59] [SUCCESS] Successfully annotated machine object that needs to be removed from the cluster
[2025-06-16 16:57:59] [SUCCESS] Successfully scaled down machine deployment by 1
[2025-06-16 16:58:09] [SUCCESS] MachineRef successfully unset
[2025-06-16 16:58:09] [SUCCESS] machineRef successfully unset for the host
[2025-06-16 16:58:09] [SUCCESS] Successfully deauthorised host from the byo cluster

Workload cluster status -

$ kubectl get nodes
NAME             STATUS   ROLES    AGE     VERSION
byoh-june-test   Ready    <none>   6h39m   v1.32.3

Pods got shifted to the only remaining node -

$ kubectl get pods -A -o wide
NAMESPACE          NAME                                       READY   STATUS    RESTARTS        AGE     IP               NODE             NOMINATED NODE   READINESS GATES
calico-apiserver   calico-apiserver-76f777c8cc-j7zq5          1/1     Running   1 (6h38m ago)   8h      10.244.14.17     byoh-june-test   <none>           <none>
calico-apiserver   calico-apiserver-76f777c8cc-pfnct          1/1     Running   1 (6h38m ago)   8h      10.244.14.20     byoh-june-test   <none>           <none>
calico-system      calico-kube-controllers-f9596bb99-nglp9    1/1     Running   0               8h      10.244.14.13     byoh-june-test   <none>           <none>
calico-system      calico-node-cc7hc                          1/1     Running   0               6h39m   10.149.101.162   byoh-june-test   <none>           <none>
calico-system      calico-typha-746f8f954c-wsnns              1/1     Running   0               6h48m   10.149.101.162   byoh-june-test   <none>           <none>
cert-manager       cert-manager-5bfdd59f99-d2fmp              1/1     Running   0               8h      10.244.14.21     byoh-june-test   <none>           <none>
cert-manager       cert-manager-cainjector-7c68db8899-5kjjt   1/1     Running   1 (6h38m ago)   8h      10.244.14.19     byoh-june-test   <none>           <none>
cert-manager       cert-manager-webhook-574866cddf-m9d5q      1/1     Running   0               8h      10.244.14.18     byoh-june-test   <none>           <none>
kube-system        coredns-796d84c46b-cm77q                   1/1     Running   0               8h      10.244.14.15     byoh-june-test   <none>           <none>
kube-system        coredns-796d84c46b-v9mwt                   1/1     Running   0               8h      10.244.14.16     byoh-june-test   <none>           <none>
kube-system        konnectivity-agent-96vfz                   1/1     Running   0               6h39m   10.244.14.11     byoh-june-test   <none>           <none>
kube-system        metrics-server-54cf798c86-9qtl4            2/2     Running   0               8h      10.244.14.14     byoh-june-test   <none>           <none>
kube-system        pf9-kube-proxy-p2twv                       1/1     Running   0               6h39m   10.149.101.162   byoh-june-test   <none>           <none>
kube-system        vcp-proxy-2k98q                            1/1     Running   0               6h39m   10.149.101.162   byoh-june-test   <none>           <none>
tigera-operator    tigera-operator-7b9dcd4cd7-pns48           1/1     Running   0               6h48m   10.149.101.162   byoh-june-test   <none>           <none>

Deauth of second node -

root@byoh-june-test:~# ./byohctl deauthorise
[2025-06-16 17:16:49] [SUCCESS] Successfully retrieved Kubernetes client
[2025-06-16 17:16:50] [SUCCESS] Successfully retrieved ByoHosts object from the management plane
Info: Machine deployment replica count is 1. This is the last node in the cluster.
Do you want to continue with de-auth? (y/n): y
[2025-06-16 17:16:58] [SUCCESS] Successfully annotated machine object that needs to be removed from the cluster
[2025-06-16 17:16:59] [SUCCESS] Successfully scaled down machine deployment by 1
[2025-06-16 17:17:34] [SUCCESS] MachineRef unset
[2025-06-16 17:17:34] [SUCCESS] MachineRef successfully unset for the host
[2025-06-16 17:17:34] [SUCCESS] Successfully deauthorised host from the byo cluster

Workload cluster status, now with 0 nodes -

$ kubectl get nodes
No resources found
$ kubectl get pods -A -o wide
NAMESPACE          NAME                                       READY   STATUS    RESTARTS   AGE     IP       NODE     NOMINATED NODE   READINESS GATES
calico-apiserver   calico-apiserver-76f777c8cc-5fss5          0/1     Pending   0          6m47s   <none>   <none>   <none>           <none>
calico-apiserver   calico-apiserver-76f777c8cc-sfthg          0/1     Pending   0          6m48s   <none>   <none>   <none>           <none>
calico-system      calico-kube-controllers-f9596bb99-g5c4n    0/1     Pending   0          6m48s   <none>   <none>   <none>           <none>
calico-system      calico-typha-746f8f954c-5s6bn              0/1     Pending   0          6m49s   <none>   <none>   <none>           <none>
cert-manager       cert-manager-5bfdd59f99-b4g6t              0/1     Pending   0          6m48s   <none>   <none>   <none>           <none>
cert-manager       cert-manager-cainjector-7c68db8899-gx286   0/1     Pending   0          6m48s   <none>   <none>   <none>           <none>
cert-manager       cert-manager-webhook-574866cddf-pc92m      0/1     Pending   0          6m48s   <none>   <none>   <none>           <none>
kube-system        coredns-796d84c46b-8k2mj                   0/1     Pending   0          6m49s   <none>   <none>   <none>           <none>
kube-system        coredns-796d84c46b-pgkh9                   0/1     Pending   0          6m49s   <none>   <none>   <none>           <none>
kube-system        metrics-server-54cf798c86-xpbqr            0/2     Pending   0          6m49s   <none>   <none>   <none>           <none>
tigera-operator    tigera-operator-7b9dcd4cd7-zdhcb           0/1     Pending   0          6m48s   <none>   <none>   <none>           <none>

Jayanth Reddy and others added 30 commits February 20, 2025 02:01

- Bundled repo for 1.26 kube
- Sample yamls for byoh config with service account
- Webhook fix to allow service account user
- Fix to avoid same secret name for install & bootstrap
- This reverts commit 8db3384.
- Added chart-generator
- Added Byoh-chart
- Increased cpu and memory limit
- Removed unnecessary fields from kustomization.yaml; changed comparison of userName and managerServiceAccount in byohost_webhook controller; changed cpu memory range; changed path in kustomizeconfig.yaml
- Reverted kustomization.yaml
- Changes added to PR
- Reverted kustomization.yaml
- Added byoh and tested
- Added validation for string length check in byohost_webhook.go
- Reverted changes of webhook
- Changes in chart-generator.sh
- Removed allowance of req v1.update
- Added substring length check (Co-authored-by: Snehal Shelke <[email protected]>)
- …deb (#5) (Co-authored-by: snslk <[email protected]>, Snehal Shelke <[email protected]>)
- k8s bundle (Co-authored-by: Snehal Shelke <[email protected]>)
- …-image-build-script: add build script for controller manager
snslk and others added 28 commits April 1, 2025 15:10

- Changed default kube-version to latest supported version (1.32.2)
- KAAP-485: Support for de-auth and decommission host from byoh cluster (#23) (Co-authored-by: Vaibhav Dubewar <[email protected]>)
- …lector: Listing only hosts from same NS to attach
- KAAP-553: Proceed with host cleanup in case of decommission even if byohost doesn't exist in mgmt plane; consolidated func to perform cleanup when machineRef is unset (#32) (Co-authored-by: Vaibhav Dubewar <[email protected]>)
- KAAP-537: BYOH enhancements (updated flags to be consistent with cloudctl, interactive password input)
- …etion of byohost object when byoMachine does not exist in the cluster (#34) (Co-authored-by: Vaibhav Dubewar <[email protected]>)
- Bump golang.org/x/crypto from 0.17.0 to 0.35.0 (golang/crypto@v0.17.0...v0.35.0) (Signed-off-by: dependabot[bot] <[email protected]>)
- Allowing users of type <name>@<domain>.<tld> to patch byohost objects; added unit test for the same; deny request from unauthorized user (#37) (Co-authored-by: Vaibhav Dubewar <[email protected]>)
@vmwclabot

@vaibhavd21, you must sign our contributor license agreement before your changes are merged. Click here to sign the agreement. If you are a VMware employee, read this for further instruction.

@vaibhavd21 vaibhavd21 closed this Jun 16, 2025