Block requests based on provisioning state #759

mbarnes · 2024-10-23T12:36:09Z

What this PR does

This pull request builds upon #749 Implement operations status polling in backend pod, so its commits are duplicated here until that pull request is merged. (The last two commits are unique to this PR.)

Update: Instead of reproducing all the commits from #749 here I just cherry-picked the one this PR actually needs.

This should complete the ARM contract for asynchronous operations by blocking requests on a resource based on its provisioning state and – for child resources – its parent resource's provisioning state. The semantics for this are not covered in the Resource Provider Contract on GitHub (though they should be), so they have been derived from a number of other sources: RPaaS behavior, ARO Classic behavior, and the Resource Provider Contract Guidelines on eng.ms.

Note: Until the new /api/aro_hcp/v1 OCM endpoint becomes available, asynchronous operation statuses for node pools will not be accurate. But we can still fulfill the high-level ARM requirements.

Jira: I don't think we have a JIRA Story specifically for this, but it falls under XCMSTRAT-627 - ARO HCP - Async Operations
Link to demo recording:

Special notes for your reviewer

backend/main.go

backend/operations_scanner.go


stevekuznetsov · 2024-10-23T13:20:15Z

backend/operations_scanner.go

+	const opStatus arm.ProvisioningState = arm.ProvisioningStateSucceeded
+	updated, err := s.dbClient.UpdateOperationDoc(ctx, doc.ID, func(updateDoc *database.OperationDocument) bool {
+		if opStatus != updateDoc.Status {
+			updateDoc.LastTransitionTime = time.Now()


You likely want to store now func() time.Time as a member on OperationsScanner to allow any of this to be unit-tested while injecting a fake clock.

(global comment for the whole PR)

SudoBrendan

I think that ultimately this looks good to merge as long as the other prereq stuff does too. Thanks for always organizing commits so neatly!

Adding some unit tests for the helpers would really make this a standout and save some future maintenance I'm sure.

SudoBrendan · 2024-10-23T17:23:10Z

frontend/pkg/frontend/node_pool.go

@@ -42,7 +42,7 @@ func (f *Frontend) CreateOrUpdateNodePool(writer http.ResponseWriter, request *h
 		return
 	}

-	nodePoolResourceID, err := ResourceIDFromContext(ctx)
+	resourceID, err := ResourceIDFromContext(ctx)


nit: feel free to dismiss - but I think I personally prefer the old names (nodePoolResourceID and nodePoolDoc) because there are so many resources at play here (clusters, CSnodePools, HCPNodePools, etc)

Yeah I take your point. For months I've been trying to inch toward unifying the cluster and node pool request handlers but so far I've only managed to do so for GET requests. I tend to make the logic (and variable names) as uniform as possible between the two but maybe that's at the expense of clarity.

Generally in this code base -- and this is just a vocabulary I've come up with -- "resource IDs" refer to Azure resources and "internal IDs" refer to Cluster Service resources. The InternalID type is just the OCM API path (/api/clusters_mgmt/v1/cluster/...) with some convenience methods around it.

frontend/pkg/frontend/helpers.go

This will be done by checking the cluster's provisioning state instead of the ClusterState returned by Cluster Service. Removing the check led to other simplifications in the logic, so there's some refactoring here too.

Returns a "409 Conflict" error response if the provisioning state of the resource or any parent resources (in the same namespace) should block a request from proceeding. Furthermore, if a DELETE request is accepted, mark any active operation on the resource as canceled. (Cluster Service will handle the actual cancellation of operations.)

SudoBrendan

thanks for the tests!

mbarnes requested review from bennerv, SudoBrendan, jharrington22, UlrichSchlueter and zgalor as code owners October 23, 2024 12:36