Currently, legacy local disks in Compute Node work as follows:
There is a Local Disks Controller from which the node allocates/deallocates disks. The deallocation operation is always instant - it returns the disk to the controller, which then begins to clean it asynchronously. The allocation operation is blocking: the controller first tries to provide clean disks, and if only non-wiped disks are available, it blocks allocation until they are wiped.
Due to this approach, in the vast majority of cases, instances with local disks are created and deleted instantly, and only in a small number of cases do we get prolonged creation times while waiting for disk wiping.
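The behaviour described above can be modelled in a few lines. This is only an illustrative sketch, not the actual Local Disks Controller code: the class name `LocalDiskPool`, its methods, and the `wipe` callback are all hypothetical, and a real controller would also need to handle wipe failures and persistence.

```python
import threading
from collections import deque


class LocalDiskPool:
    """Toy model: instant deallocation, allocation that blocks on wiping."""

    def __init__(self, disks, wipe):
        self._clean = deque(disks)   # disks that are wiped and ready to hand out
        self._wiping = 0             # number of disks currently being wiped
        self._wipe = wipe            # hypothetical callable that wipes one disk
        self._cond = threading.Condition()

    def deallocate(self, disk):
        """Return immediately; the wipe runs in a background thread."""
        with self._cond:
            self._wiping += 1
        threading.Thread(
            target=self._wipe_and_return, args=(disk,), daemon=True
        ).start()

    def _wipe_and_return(self, disk):
        # Sketch only: a failed wipe is not handled here.
        self._wipe(disk)
        with self._cond:
            self._wiping -= 1
            self._clean.append(disk)
            self._cond.notify_all()

    def allocate(self, timeout=None):
        """Prefer a clean disk; block only while a wipe is still in flight."""
        with self._cond:
            ready = self._cond.wait_for(
                lambda: bool(self._clean) or self._wiping == 0, timeout
            )
            if not ready:
                raise TimeoutError("wipe did not finish in time")
            if not self._clean:
                raise RuntimeError("no local disks available")
            return self._clean.popleft()
```

In this model `deallocate` never waits for the wipe, and `allocate` returns at once whenever a clean disk exists. It waits only when every remaining disk is still being wiped, which matches the "small number of cases" with prolonged creation times.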
Local disks over NBS work differently:
Creation is always instant because only clean devices are selected. Deletion is always slow because the disk deletion process (from the deletion operation perspective) includes device cleaning. As things stand, by switching to local disks over NBS, we're degrading the user experience. We would like the Disk Registry to have the same creation/deletion logic for local disks over NBS as the current Local Disks Controller in Compute Node.
ya-ksgamora changed the title from "[NBS] Asynchronous local disks cleanup and allocation of dirty devices for local disks" to "[NBS] Asynchronous devices cleanup and synchronous allocation for local disks" on Mar 3, 2025.
ya-ksgamora changed the title from "[NBS] Asynchronous devices cleanup and synchronous allocation for local disks" to "[NBS] Asynchronous device cleanup (deallocation) and synchronous allocation for local disks" on Mar 3, 2025.
Now we also need to modify the Disk Manager code so that it does not fail the disk creation task during prolonged local disk creation attempts. Specifically, we need to ensure that it does not exceed the limit of retriable errors (currently 100) and to implement retries with a longer time limit (currently 10 seconds).
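One possible shape for such a retry policy is sketched below. It is not Disk Manager code (which is not shown in this issue), and every name in it is hypothetical: `RetriableError`, `AllocationPending`, and `run_with_retries`. It also assumes that the 10-second limit is a cap on the delay between retries, and the 60-second cap for pending allocations is an arbitrary illustrative value. The idea is to treat "allocation is waiting for devices to be wiped" as an expected state that does not count towards the limit of 100 retriable errors and that backs off to a longer delay.

```python
import time


class RetriableError(Exception):
    """Ordinary transient failure; counts towards the error limit."""


class AllocationPending(RetriableError):
    """Hypothetical signal: allocation is waiting for devices to be wiped."""


def run_with_retries(op, max_errors=100, base_delay=1.0, max_delay=10.0,
                     pending_max_delay=60.0, sleep=time.sleep):
    """Call op() until it succeeds or the retriable error limit is reached."""
    errors = 0
    pending = 0
    while True:
        try:
            return op()
        except AllocationPending:
            # Expected wait: not counted as an error, and allowed a longer delay.
            pending += 1
            sleep(min(base_delay * 2 ** min(pending, 16), pending_max_delay))
        except RetriableError:
            errors += 1
            if errors >= max_errors:
                raise
            sleep(min(base_delay * 2 ** min(errors, 16), max_delay))
```

With this split, a creation task that waits a long time for wiping keeps retrying at a slow pace, while genuine transient failures are still bounded by the existing error limit.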