What happened:
When creating a volume from a snapshot that is ~2TB in size, I am seeing timeouts from the CSI driver: it "gives up" on the current task in vSphere and starts another task to create the volume again.
This seems to happen about 30-35 minutes after the PVC is created and left in a Pending state. My task in vSphere does complete after 40 minutes, but by then the CSI driver has already created a new task, and the loop starts over again until almost all disk space in the datastore is used.
What you expected to happen:
The CSI driver should not create a new task while the original task is still in progress, or it should at least have a configurable timeout.
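The expected behavior can be sketched as an idempotent create: record the task ID the first time, and on a retry resume polling that same task instead of starting a new one. This is a minimal Python simulation under stated assumptions: `OperationRecord`, `FakeVCenter` and `create_volume` are hypothetical names, not the vSphere CSI driver's API (the driver itself is written in Go), and the in-memory dict only stands in for whatever persisted state the driver keeps.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OperationRecord:
    """Hypothetical persisted record, loosely analogous to a CnsVolumeOperationRequest."""
    task_id: Optional[str] = None
    volume_id: Optional[str] = None


@dataclass
class FakeVCenter:
    """Hypothetical stand-in for vCenter: a task finishes after a fixed number of polls."""
    polls_to_finish: int
    polls: Dict[str, int] = field(default_factory=dict)  # task_id -> polls so far
    tasks_started: int = 0

    def start_create_task(self, name: str) -> str:
        self.tasks_started += 1
        task_id = f"task-{self.tasks_started}"
        self.polls[task_id] = 0
        return task_id

    def poll(self, task_id: str) -> Optional[str]:
        """Return a volume ID once the task has completed, else None."""
        self.polls[task_id] += 1
        if self.polls[task_id] >= self.polls_to_finish:
            return f"fcd-for-{task_id}"
        return None


def create_volume(name: str, records: Dict[str, OperationRecord],
                  vc: FakeVCenter, max_polls: int) -> Optional[str]:
    record = records.setdefault(name, OperationRecord())
    if record.volume_id is not None:
        return record.volume_id  # volume already created on an earlier attempt
    if record.task_id is None:
        # Start a task only if none has been recorded for this volume yet.
        record.task_id = vc.start_create_task(name)
    for _ in range(max_polls):
        volume_id = vc.poll(record.task_id)
        if volume_id is not None:
            record.volume_id = volume_id
            return volume_id
    # Timed out: the caller may retry, but the recorded task ID is kept,
    # so the retry resumes the same task instead of creating a new one.
    return None
```

In this sketch, a first call that times out leaves `task_id` in the record, and the retry picks up the same task, so only one volume is ever provisioned per PVC.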
How to reproduce it (as minimally and precisely as possible):
1. Create a PVC large enough that restoring it from a snapshot takes more than 30 minutes (in my case, 2TB), and take a snapshot of it.
2. Create a new PVC from that snapshot.
3. Notice that while vSphere is still working on the task to create the container volume, the CnsVolumeOperationRequest gives up waiting and a new task is created after about 30 minutes.
Anything else we need to know?:
Is there any way to configure this timeout value? I don't see a way to do it in the code right now.
Environment:
csi-vsphere version: v3.1.2
vsphere-cloud-controller-manager version: 1.28.0
Kubernetes version: 1.28.10
vSphere version: 7.0.3.01700
OS (e.g. from /etc/os-release): Ubuntu 22.04
Digging into the logs, I can see the following message coming from the MonitorCreateVolumeTask function: taskResult is empty for CreateVolume task: "task-xxxxxx", opID: "xxxxx"
This message repeats several times for the same task ID over about 30 minutes and then never appears again, because a new task is created in the CnsVolumeOperationRequest. As mentioned, the task does eventually complete, but it takes a little longer than whatever timeout makes the CSI driver give up waiting. The result is an infinite loop and orphaned FCDs in vSphere.
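The give-up behavior described above can be modeled as a polling loop with a deadline. This is a hypothetical simulation, not the driver's actual `MonitorCreateVolumeTask` code: `wait_for_task`, `FakeClock`, `make_task` and the minute-based timings are assumptions chosen to match the numbers in this report (a task that finishes after 40 minutes versus a wait of roughly 30 minutes).

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class FakeClock:
    """Simulated clock measured in minutes so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, minutes: float) -> None:
        self.now += minutes


def wait_for_task(
    poll: Callable[[], Optional[T]],
    timeout: float,
    interval: float,
    clock: FakeClock,
) -> Optional[T]:
    """Poll until the task yields a result or the (configurable) timeout expires."""
    deadline = clock.time() + timeout
    while clock.time() < deadline:
        result = poll()
        if result is not None:
            return result
        clock.sleep(interval)
    return None  # gave up; the task may still be running in vCenter


def make_task(clock: FakeClock, finishes_at: float) -> Callable[[], Optional[str]]:
    """Hypothetical vCenter task that only reports a result once it has finished."""
    def poll() -> Optional[str]:
        return "fcd-1" if clock.time() >= finishes_at else None
    return poll
```

With `timeout=30` and a task that finishes at minute 40, the loop returns `None` at minute 30; if the caller then starts a fresh task, every retry leaves another FCD behind. With `timeout=60`, the same task is observed to completion at minute 40.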
/kind bug