Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block requests based on provisioning state #759

Merged
merged 3 commits into from
Oct 29, 2024
Merged

Conversation

mbarnes
Copy link
Collaborator

@mbarnes mbarnes commented Oct 23, 2024

What this PR does

This pull request builds upon #749 Implement operations status polling in backend pod, so its commits are duplicated here until that pull request is merged. (The last two commits are unique to this PR.)

Update: Instead of reproducing all the commits from #749 here I just cherry-picked the one this PR actually needs.

This should complete the ARM contract for asynchronous operations by blocking requests on a resource based on its provisioning state and – for child resources – its parent resource's provisioning state. The semantics for this are not covered in the Resource Provider Contract on GitHub (though they should be), so it has been derived from a number of other sources: RPaaS behavior, ARO Classic behavior, and the Resource Provider Contract Guidelines on eng.ms.

Note: Until the new /api/aro_hcp/v1 OCM endpoint becomes available, asynchronous operation statuses for node pools will not be accurate. But we can still fulfill the high-level ARM requirements.

Jira: I don't think we have a JIRA Story specifically for this, but it falls under XCMSTRAT-627 - ARO HCP - Async Operations
Link to demo recording:

Special notes for your reviewer

backend/main.go Outdated Show resolved Hide resolved
backend/main.go Outdated Show resolved Hide resolved
backend/main.go Outdated Show resolved Hide resolved
const opStatus arm.ProvisioningState = arm.ProvisioningStateSucceeded
updated, err := s.dbClient.UpdateOperationDoc(ctx, doc.ID, func(updateDoc *database.OperationDocument) bool {
if opStatus != updateDoc.Status {
updateDoc.LastTransitionTime = time.Now()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You likely want to store now func() time.Time as a member on OperationsScanner to allow any of this to be unit-tested while injecting a fake clock.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(global comment for the whole PR)

SudoBrendan
SudoBrendan previously approved these changes Oct 23, 2024
Copy link
Collaborator

@SudoBrendan SudoBrendan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that ultimately this looks good to merge as long as the other prereq stuff does too. Thanks for always organizing commits so neatly!

Adding some unit tests for the helpers would really make this a standout and save some future maintenance I'm sure.

@@ -42,7 +42,7 @@ func (f *Frontend) CreateOrUpdateNodePool(writer http.ResponseWriter, request *h
return
}

nodePoolResourceID, err := ResourceIDFromContext(ctx)
resourceID, err := ResourceIDFromContext(ctx)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: feel free to dismiss - but I think I personally prefer the old names (nodePoolResourceID and nodePoolDoc) because there are so many resources at play here (clusters, CSnodePools, HCPNodePools, etc)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I take your point. For months I've been trying to inch toward unifying the cluster and node pool request handlers but so far I've only managed to do so for GET requests. I tend to make the logic (and variable names) as uniform as possible between the two but maybe that's at the expense of clarity.

Generally in this code base -- and this is just a vocabulary I've come up with -- "resource IDs" refer to Azure resources and "internal IDs" refer to Cluster Service resources. The InternalID type is just the OCM API path (/api/clusters_mgmt/v1/cluster/...) with some convenience methods around it.

frontend/pkg/frontend/helpers.go Show resolved Hide resolved
frontend/pkg/frontend/helpers.go Outdated Show resolved Hide resolved
Matthew Barnes added 2 commits October 28, 2024 08:47
This will be done by checking cluster's provisioning state instead
of the ClusterState returned by Cluster Service.

The check removal led to other simplifications in the logic, so
there's some refactoring here too.
Returns a "409 Conflict" error response if the provisioning state
of the resource or any parent resources (in the same namespace)
should block a request from proceeding.

Furthermore, if a DELETE request is accepted, mark any active
operation on the resource as canceled. (Cluster Service will
handle the actual cancellation of operations.)
Copy link
Collaborator

@SudoBrendan SudoBrendan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the tests!

@SudoBrendan SudoBrendan merged commit a004e97 into main Oct 29, 2024
25 checks passed
@SudoBrendan SudoBrendan deleted the check-for-conflict branch October 29, 2024 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants