-
Notifications
You must be signed in to change notification settings - Fork 290
⚠️ Enable removal of ExternallyProvisioned attribute from BareMetalHosts #2566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -408,6 +408,9 @@ func (hsm *hostStateMachine) handleRegistering(_ *reconcileInfo) actionResult { | |
| // if the credentials change and the Host must be re-registered. | ||
| if hsm.Host.Spec.ExternallyProvisioned { | ||
| hsm.NextState = metal3api.StateExternallyProvisioned | ||
| } else if hsm.Host.Status.Provisioning.State == metal3api.StateExternallyProvisioned { | ||
| // Removing externallyManaged moves hosts to provisioned state | ||
| hsm.NextState = metal3api.StateProvisioned | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See Zane's comment: this is a wrong place. You need to fix |
||
| } else if inspectionDisabled(hsm.Host) { | ||
| hsm.NextState = metal3api.StatePreparing | ||
| } else { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -109,10 +109,6 @@ func (webhook *BareMetalHost) validateChanges(oldObj *metal3api.BareMetalHost, n | |
| errs = append(errs, errors.New("bootMACAddress can not be changed once it is set")) | ||
| } | ||
|
|
||
| if oldObj.Spec.ExternallyProvisioned != newObj.Spec.ExternallyProvisioned { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a valid bug fix.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The linked issue gives a bit more background, the webhook seems to be a bandaid fix to prevent the host from getting stuck in the provisioning state.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To echo what I said in the bug: if removing externallyProvisioned unconditionally deprovisions the host, we need to make sure that Or we need to permit them to be set and keep the host
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like I may have misunderstood the issue and this will need a bit of additional work. Just to confirm (since I was not at the working group meetings) the desired behavior is:
Overall I'm wondering if the approach originally for the externallyProvisioned would have been to follow the cert-manager pattern and had a
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Overall I'm wondering if the approach originally for the externallyProvisioned would have been to follow the cert-manager pattern and had a ProvisionRequest CR where some external system could fulfill the provision instead of hacking around the BareMetalHost CR. Just to reflect on this point, this would be a nice approach but in practice the system that has done the provisioning might not be cloud native or not even automated at all so a ProvisionRequest CR would provide some optional extra functionality, but it would still not cover all the use cases.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Well, we should have had a separate CRD for provisioning for sure. The HostClaim work metal3-io/metal3-docs#408 is a step in that direction but it does not address externally provisioned.
I think your summary is correct, except that: "If the |
||
| errs = append(errs, errors.New("externallyProvisioned can not be changed")) | ||
| } | ||
|
|
||
| return errs | ||
| } | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -18,6 +18,7 @@ import ( | |
| "k8s.io/apimachinery/pkg/types" | ||
| "sigs.k8s.io/cluster-api/test/framework" | ||
| "sigs.k8s.io/cluster-api/util" | ||
| kclient "sigs.k8s.io/controller-runtime/pkg/client" | ||
| ) | ||
|
|
||
| var _ = Describe("Create as externally provisioned, deprovision", Label("required", "provision", "deprovision"), | ||
|
|
@@ -79,14 +80,28 @@ var _ = Describe("Create as externally provisioned, deprovision", Label("require | |
| }, &bmh) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
|
|
||
| By("checking that the BMH was not inspected or deployed") | ||
| By("Checking that the BMH was not inspected or deployed") | ||
| Expect(bmh.Status.OperationHistory.Inspect.Start.IsZero()).To(BeTrue()) | ||
| Expect(bmh.Status.OperationHistory.Provision.Start.IsZero()).To(BeTrue()) | ||
|
Comment on lines
84
to
85
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not sure these two conditions mean that the BMH is still in registering as they would never be true in case of an externally provisioned BMH at least that would be my expectation. I assume you have run this test locally but I wonder how long does the BMH stay in "Registering" state that your implementation relies on. How do you ensure that the BMH stays in registering state to allow the test code to do the bmh.Spec.ExternallyProvisioned field update? I am not a user of the externally provisioned feature so I might be missing something but I would have suspected the BMH to very quickly transition out of "Registering" and If that is true this test would be very prone to timing issues.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did a manual test however I had some issues getting the e2e tests running locally with I'll try again today as it seems that the tests did not align with what I saw manually testing. If you have any guidance that I might have missed getting the E2E working locally (on Linux / MacOS) that would be greatly appreciated. |
||
|
|
||
| By("Removing the ExternallyProvisioned field") | ||
| bmh.Spec.ExternallyProvisioned = false | ||
|
|
||
| err = clusterProxy.GetClient().Update(ctx, &bmh) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
|
|
||
| Eventually(func(g Gomega) { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Better use WaitForBmhInProvisioningState |
||
| err = clusterProxy.GetClient().Get(ctx, kclient.ObjectKeyFromObject(&bmh), &bmh) | ||
| g.Expect(bmh.Status.Provisioning.State).To(Equal(metal3api.StateProvisioned)) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The way the test is written, the expected state is not Provisioned. Or even: it make become Provisioned briefly, then BMO detects that there is no |
||
| }, 2*time.Second, 100*time.Millisecond).Should(Succeed()) | ||
|
|
||
| By("Deleting the BMH") | ||
| // Wait for 2 seconds to allow time to confirm annotation is set | ||
| // TODO: fix this so we do not need the sleep | ||
| time.Sleep(2 * time.Second) | ||
| // Wait for the finalizer to be there, otherwise cleanup logic can't hold the resource | ||
| Eventually(func(g Gomega) { | ||
| err = clusterProxy.GetClient().Get(ctx, kclient.ObjectKeyFromObject(&bmh), &bmh) | ||
| g.Expect(err).NotTo(HaveOccurred()) | ||
| g.Expect(bmh.Finalizers).To(ContainElement(metal3api.BareMetalHostFinalizer)) | ||
| }, 2*time.Second, 100*time.Millisecond).Should(Succeed()) | ||
|
|
||
| err = clusterProxy.GetClient().Delete(ctx, &bmh) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this implementation move all externally provisioned BMH to "regular provisioned" state even if the user would add an unrelated annotation or the reconciliation would be triggered for whatever other reason?
It might be just a theoretical scenario but I would like to have an an additionl condition that checks that the "ExtarnallyProvisioned" filed is really set to false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this lands in an else block after the check for externally provisioned I don't believe that would be possible here.
Happy to reformat it as well into a switch statement or similar as the conditional branches here are getting a bit wild already before the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this does anything at all, because in
handleRegistering()thehsm.Host.Status.Provisioning.Stateis, by definition,StateRegistering.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true , silly me :D .
But then there is the other thing zaneb pointed out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original bug is about the incorrect state in Ironic: once we get to Inspecting, BMO does not expect to find the Node in the
activestate in Ironic, but that's what actually happens. BMO needs to deprovision the host first. Then make sure that inspection still happen.