Retry obtaining KUBECONFIG file lock #3782

stmcginnis · 2024-11-07T19:39:42Z

When there are multiple concurrent create or delete operations, there can be different processing trying to lock the KUBECONFIG file at the same time. This leads to small windows where getting the lock files.

We don't fail the operation if something like "kind delete cluster" isn't able to remove its settings from KUBECONFIG. This is ideal and what most users would expect, but it also means there may be some leftover crud left behind that they are not aware of.

This adds some basic retries to obtaining the file lock. It potentially adds a very small delay (in a much longer process), but makes it more likely that the process is able to get the lock and complete the KUBECONFIG update.

Related: #3781

When there are multiple concurrent create or delete operations, there can be different processing trying to lock the KUBECONFIG file at the same time. This leads to small windows where getting the lock files. We don't fail the operation if something like "kind delete cluster" isn't able to remove its settings from KUBECONFIG. This is ideal and what most users would expect, but it also means there may be some leftover crud left behind that they are not aware of. This adds some basic retries to obtaining the file lock. It potentially adds a very small delay (in a much longer process), but makes it more likely that the process is able to get the lock and complete the KUBECONFIG update. Signed-off-by: Sean McGinnis <[email protected]>

k8s-ci-robot · 2024-11-07T19:39:48Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: stmcginnis
Once this PR has been reviewed and has the lgtm label, please assign bentheelder for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

BenTheElder · 2024-11-07T20:09:27Z

pkg/cluster/internal/kubeconfig/internal/kubeconfig/lock.go

-	if err != nil {
-		return err
+
+	// Retry obtaining the file lock a few times to accommodate concurrent operations.


I'm not sure this is safe, if there is a lock we're racing with another process anyhow? (which may logically conflict) and we're deviating in behavior from client-go

I think we probably need to retry the entire kubeconfig edit call instead?

I believe this should be the safe route. Once we obtain the lock, there shouldn't be any concerns about other client-go (or other libraries) that also use file locks to update the KUBECONFIG. All of the update operations happen between that window of obtaining the file lock and removing it.

When we update the contents, we do both read and write within this locked section.

Same with removing an entry. The file lock is obtained first, and only then are updates made.

The only difference from client-go is it will immediately fail if another process already has that lock. We will be a little more robust and just retry, where typical operations should be complete within milliseconds.

So I believe the only risk if there is another process that is not using the same file locking mechanism to control access to KUBECONFIG. As long is it operates under the same principle of getting the lock before reading and updating, then the only difference is we might be able to wait until the other process is done before completely failing the operation.

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 7, 2024

k8s-ci-robot requested a review from aojea November 7, 2024 19:39

k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Nov 7, 2024

BenTheElder reviewed Nov 7, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Retry obtaining KUBECONFIG file lock #3782

Retry obtaining KUBECONFIG file lock #3782

stmcginnis commented Nov 7, 2024

k8s-ci-robot commented Nov 7, 2024

BenTheElder Nov 7, 2024

stmcginnis Nov 7, 2024

Retry obtaining KUBECONFIG file lock #3782

Are you sure you want to change the base?

Retry obtaining KUBECONFIG file lock #3782

Conversation

stmcginnis commented Nov 7, 2024

k8s-ci-robot commented Nov 7, 2024

BenTheElder Nov 7, 2024

Choose a reason for hiding this comment

stmcginnis Nov 7, 2024

Choose a reason for hiding this comment