operator: upgrade all control plane nodes first #3444
Conversation
```yaml
maxNodeBudget:
  description: MaxNodeBudget is the maximum number of nodes that can
    be created simultaneously.
  format: int32
  type: integer
```
Is this problematic for upgrades, since we never upgrade the installed CRDs through helm?
No, I think Helm has some upgrade magic built in regarding CRDs. But this time we only add one field that is optional, so we are fine. Proof: https://github.com/edgelesssys/constellation/actions/runs/11455280858/job/31871905456.
Or do you have a concrete error in mind that I'm missing?
As far as I know, Helm simply doesn't do anything if a CRD already exists: https://helm.sh/docs/chart_best_practices/custom_resource_definitions/#method-1-let-helm-do-it-for-you
So if we change the CRDs, running `helm upgrade` won't actually apply these changes.
From what I can tell, this effectively makes the `maxNodeBudget` option in the NodeVersion CRD non-functional.
I'm assuming everything still works fine because we default to 1 if the value is not set in the CR.
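A minimal sketch of that defaulting, assuming the zero value of the optional field falls back to the previous behavior of replacing one node at a time (the helper name is illustrative, not the operator's actual code):

```go
package main

import "fmt"

// nodeBudget returns the effective node budget: an unset (zero)
// maxNodeBudget falls back to 1, matching the old behavior.
// Illustrative helper; the operator's actual code may differ.
func nodeBudget(maxNodeBudget uint32) uint32 {
	if maxNodeBudget == 0 {
		return 1
	}
	return maxNodeBudget
}

func main() {
	fmt.Println(nodeBudget(0)) // field unset in the CR
	fmt.Println(nodeBudget(5)) // field set explicitly
}
```

This is why a stale CRD is harmless here: a CR that cannot carry the new field behaves exactly like one where the field is unset.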
Oh, thanks for the hint. Yes, the behavior is (hopefully) exactly the same, so I think we are good for now (also since the e2e test passed). I didn't want to advertise this feature anyway, but even if we do, we should document this constraint.
In the future(tm), we likely want to do what others do (e.g., istio: https://istio.io/latest/docs/setup/upgrade/helm/#canary-upgrade-recommended).
operators/constellation-node-operator/controllers/nodeversion_controller.go
operators/constellation-node-operator/controllers/scalinggroup_controller.go
```go
PlaceholderControlPlaneScalingGroupName = "control-planes-id"
// PlaceholderWorkerScalingGroupName name of the worker scaling group used if upgrades are not yet supported.
PlaceholderWorkerScalingGroupName = "workers-id"
// ControlPlaneRoleLabel label used to identify control plane nodes.
```
```suggestion
// ControlPlaneRoleLabel label used to identify control plane nodes.
// https://kubernetes.io/docs/reference/labels-annotations-taints/#node-role-kubernetes-io-control-plane
```
Nit, just to document that this is canonical.
```go
r.RLock()
defer r.RUnlock()
```
Does this need a write lock now?
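The distinction behind the question, as a generic sketch (the `registry` type and its fields are illustrative, not the operator's actual code): a read lock is only sufficient while the method reads shared state; any mutation needs the exclusive write lock.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is an illustrative type guarding a map with sync.RWMutex.
type registry struct {
	mu     sync.RWMutex
	groups map[string]string
}

// get only reads shared state, so the shared read lock suffices.
func (r *registry) get(id string) string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.groups[id]
}

// set mutates the map and therefore needs the exclusive write lock;
// an RLock here would race with concurrent writers.
func (r *registry) set(id, name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.groups[id] = name
}

func main() {
	r := &registry{groups: make(map[string]string)}
	r.set("workers-id", "workers")
	fmt.Println(r.get("workers-id"))
}
```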
```go
// KubernetesClusterVersion is the advertised Kubernetes version of the cluster.
KubernetesClusterVersion string `json:"kubernetesClusterVersion,omitempty"`
// MaxNodeBudget is the maximum number of nodes that can be created simultaneously.
MaxNodeBudget uint32 `json:"maxNodeBudget,omitempty"`
```
I'm against making this a user-facing feature for now, because we did not discuss its semantics sufficiently. What if the user sets it to 1000 - do we replace all control-plane nodes at once? Should we have different budgets for control planes and for workers? Could this be relative? Should we rather implement a different upgrade algorithm (3533 etc)? I'm also reminded of the evolution of https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.30/#rollingupdatedeployment-v1-apps.
Afaict this PR would be much smaller if we removed this feature and tried to find a different way to test it.
> do we replace all control-plane nodes at once?

Yes, I also tested it by setting the budget to 5 when I had 3:2 nodes. But "at once" only applies to the world as the operator sees it. The join service still forbids multiple control-plane nodes joining at the same time. Note that this is the scenario right after initializing a Constellation with >=3 control planes. Also, "replace" means adding the new nodes first, so in theory you could go from 3 control planes to 6, since the operator only removes a node once the handover is finished.
> Should we have different budgets for control planes and for workers?

We might do that in the future, if a customer requires it or we think we need it.
> Could this be relative?

I assume you mean a percentage value. Sure, but this is more difficult/complex than setting the number of nodes.
> Should we rather implement a different upgrade algorithm

I think the bug is orthogonal to this proposal, since I'm not reworking the operator's replacement algorithm, which would be a large undertaking in my opinion.
> I'm also reminded of the evolution of https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.30/#rollingupdatedeployment-v1-apps

I don't know the evolution; is there a summary somewhere of what happened?
Note that also the operator API is on version v1alpha1, so in my opinion, we don't have to provide any API stability guarantees between Constellation versions and we can completely change the whole upgrade process and APIs between Constellation versions.
Some of the current fields are not really user-facing in the sense that the user should use them directly. Changing the image reference requires (1) the image reference to be one of our image references and (2) the measurements in the join config to match the image. Changing the Kubernetes version requires a config map under that name, and for the Constellation to upgrade correctly it has to contain the right set of components and patches.
> Afaict this PR would be much smaller if we removed this feature and tried to find a different way to test it.

Then I'll have another try at the test for this, but it might take a bit of time before I get back to it.
Co-Authored-By: Leonard Cohnen <[email protected]>
Before we call out to the cloud provider, we check whether there are still control-plane nodes that are outdated (or donors). If there are, we don't create any worker nodes, even if we have the budget to do so.
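The gating described above can be sketched as follows (an illustrative helper under assumed inputs, not the operator's actual code):

```go
package main

import "fmt"

// workerBudget returns how much of the node budget may be spent on
// worker nodes: zero while any control-plane node is still outdated
// or acting as a donor, the full budget otherwise.
// Illustrative sketch; the operator's real reconciliation differs.
func workerBudget(budget, outdatedControlPlanes, donorControlPlanes int) int {
	if outdatedControlPlanes > 0 || donorControlPlanes > 0 {
		return 0
	}
	return budget
}

func main() {
	fmt.Println(workerBudget(2, 1, 0)) // control plane still upgrading: no workers created
	fmt.Println(workerBudget(2, 0, 0)) // control plane done: full budget available
}
```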
Do you remember what the issue was there? We might want to make a ticket for it.
Fwiw, this part was already merged as #3653.
This was merged as #3663.
Context
Allows #3396. Since kubelets must not communicate with a kube-apiserver that is older than the kubelet itself, we need to upgrade all control-plane nodes before upgrading the worker nodes. Control-plane nodes are configured so that they only talk to the local kube-apiserver, which matches their kubelet version.
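The ordering constraint above can be sketched as a simple version check (an illustrative helper with assumed inputs, not the operator's actual code): worker kubelets may only move to the target version once every control-plane node already runs it.

```go
package main

import "fmt"

// canUpgradeWorkers reports whether the worker nodes may be upgraded:
// only once every control-plane node runs the target version, so no
// kubelet ever ends up newer than the kube-apiserver it talks to.
// Illustrative sketch; real version handling would compare semver.
func canUpgradeWorkers(controlPlaneVersions []string, target string) bool {
	for _, v := range controlPlaneVersions {
		if v != target {
			return false
		}
	}
	return true
}

func main() {
	// Mid-upgrade: one control plane still on the old version.
	fmt.Println(canUpgradeWorkers([]string{"v1.31.1", "v1.30.4"}, "v1.31.1")) // false
	// Control plane fully upgraded: workers may follow.
	fmt.Println(canUpgradeWorkers([]string{"v1.31.1", "v1.31.1"}, "v1.31.1")) // true
}
```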
Proposed change(s)
How to test:
Related issue
kubernetes/kubernetes#127316
Checklist
gcp-snp, 3:2, v1.30.4 -> v1.31.1 K8s, v1.18.0 -> head of this PR: https://github.com/edgelesssys/constellation/actions/runs/11431369883