CreateVolume times out before task can complete, starts another in an infinite loop #3024

Open
braunsonm opened this issue Sep 3, 2024 · 1 comment
Labels
kind/bug Categorizes issue or PR as related to a bug.

braunsonm commented Sep 3, 2024

/kind bug

What happened:
When creating a volume from a snapshot that is ~2TB in size, I see the CSI driver time out: it "gives up" on the current task in vSphere and starts another task to create the volume again.

This happens about 30-35 minutes after the PVC is created, while it is still in a pending state. My task in vSphere does complete after 40 minutes, but by then the CSI has already created a new task, and the loop repeats until almost all disk space in the datastore is used.

What you expected to happen:

The CSI should not create multiple tasks while the original task is still in progress, or it should at least have a configurable timeout.
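
A minimal sketch of the expected behaviour, written with hypothetical types (`Backend`, `Provisioner`, `fakeBackend` are illustrations only, not the driver's or the CNS client's actual API): the provisioner records the task it started for a volume name and, on a retried request, keeps waiting on that task instead of starting a new one. The wait timeout is a plain field, so it could be made configurable.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// TaskState is a hypothetical stand-in for the state of a vSphere task.
type TaskState int

const (
	TaskRunning TaskState = iota
	TaskSucceeded
)

// Backend is a hypothetical interface, not the real CNS client API.
type Backend interface {
	StartCreateVolume(name string) (taskID string)
	TaskState(taskID string) TaskState
}

// ErrStillRunning tells the caller to retry later; the task is not abandoned.
var ErrStillRunning = errors.New("create task still in progress")

// Provisioner remembers the task started for each volume name so that
// a retried request reattaches to it instead of creating a duplicate.
type Provisioner struct {
	backend      Backend
	tasks        map[string]string // volume name -> task ID
	WaitTimeout  time.Duration     // how long one call waits (configurable)
	PollInterval time.Duration
}

func (p *Provisioner) CreateVolume(name string) error {
	taskID, ok := p.tasks[name]
	if !ok {
		// Only start a task if none was recorded for this volume.
		taskID = p.backend.StartCreateVolume(name)
		p.tasks[name] = taskID
	}
	deadline := time.Now().Add(p.WaitTimeout)
	for {
		if p.backend.TaskState(taskID) == TaskSucceeded {
			return nil
		}
		if !time.Now().Before(deadline) {
			return ErrStillRunning // keep the recorded task for the next call
		}
		time.Sleep(p.PollInterval)
	}
}

// fakeBackend reports "running" for a fixed number of polls, then success.
type fakeBackend struct {
	started   int
	pollsLeft int
}

func (f *fakeBackend) StartCreateVolume(name string) string {
	f.started++
	return fmt.Sprintf("task-%d", f.started)
}

func (f *fakeBackend) TaskState(taskID string) TaskState {
	if f.pollsLeft > 0 {
		f.pollsLeft--
		return TaskRunning
	}
	return TaskSucceeded
}

func main() {
	b := &fakeBackend{pollsLeft: 3}
	p := &Provisioner{backend: b, tasks: map[string]string{}}
	calls := 0
	for {
		calls++
		if err := p.CreateVolume("pvc-demo"); err == nil {
			break
		}
	}
	fmt.Printf("calls: %d, backend tasks started: %d\n", calls, b.started)
}
```

With a zero `WaitTimeout`, each call polls once and returns, so the demo needs several calls to finish but still starts only one backend task.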

How to reproduce it (as minimally and precisely as possible):

  1. Create a PVC large enough that restoring it from a snapshot takes >30 minutes (2TB in my case), and take a snapshot of it
  2. Create a new PVC from that snapshot
  3. Notice that while vSphere is still working on the task to create the container volume, the CnsVolumeOperationRequest gives up waiting and creates a new task after about 30 minutes.

Anything else we need to know?:

Is there any way to configure this timeout value? I don't see an option for it in the code right now.

Environment:

  • csi-vsphere version: v3.1.2
  • vsphere-cloud-controller-manager version: 1.28.0
  • Kubernetes version: 1.28.10
  • vSphere version: 7.0.3.01700
  • OS (e.g. from /etc/os-release): Ubuntu 22.04
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Sep 3, 2024

braunsonm commented Sep 3, 2024

Digging a bit in the logs, I can see this message coming from the MonitorCreateVolumeTask function: taskResult is empty for CreateVolume task: "task-xxxxxx", opID: "xxxxx"

This repeats a few times for the same task ID over 30 minutes and then never appears again, because a new task is created in the CnsVolumeOperationRequest. As mentioned, the task does eventually complete, but it takes slightly longer than whatever timeout makes the CSI give up waiting. The result is an infinite loop and orphaned FCDs in vSphere.
