[NBS] Asynchronous device cleanup (deallocation) and synchronous allocation for local disks #2945

Open
ya-ksgamora opened this issue Jan 29, 2025 · 1 comment · Fixed by #3037 · May be fixed by #3285
Labels
blockstore Add this label to run only cloud/blockstore build and tests on PR

Comments

@ya-ksgamora

Currently, legacy local disks in Compute Node work as follows:

There is a Local Disks Controller from which the node allocates and deallocates disks. Deallocation is always instant: it returns the disk to the controller, which then begins cleaning it asynchronously. Allocation is blocking: the controller first tries to provide clean disks, and if only non-wiped disks are available, it blocks the allocation until they are wiped.

Due to this approach, in the vast majority of cases, instances with local disks are created and deleted instantly, and only in a small number of cases do we get prolonged creation times while waiting for disk wiping.
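
The controller behavior described above can be illustrated with a small model. This is only a sketch: `LocalDiskPool`, `wipe_one`, and the `wipe` callback are hypothetical names chosen for illustration and are not taken from Compute Node or NBS code.

```python
import threading
from collections import deque


class LocalDiskPool:
    """Toy model (hypothetical): instant deallocation, blocking allocation."""

    def __init__(self, disk_ids):
        self._cond = threading.Condition()
        self._clean = deque(disk_ids)  # wiped disks, ready to hand out
        self._dirty = deque()          # returned disks waiting for a wipe

    def deallocate(self, disk_id):
        # Instant: the disk is only queued for background wiping.
        with self._cond:
            self._dirty.append(disk_id)

    def allocate(self, timeout=None):
        # Returns at once if a clean disk exists, otherwise blocks
        # until the cleaner publishes one (or the timeout expires).
        with self._cond:
            if not self._cond.wait_for(lambda: bool(self._clean), timeout):
                raise TimeoutError("no clean disk became available")
            return self._clean.popleft()

    def wipe_one(self, wipe):
        # One step of the background cleaner: wipe a dirty disk,
        # then mark it clean and wake any blocked allocations.
        with self._cond:
            if not self._dirty:
                return False
            disk_id = self._dirty.popleft()
        wipe(disk_id)  # the slow part runs outside the lock
        with self._cond:
            self._clean.append(disk_id)
            self._cond.notify_all()
        return True
```

In this model a caller pays for wiping only when the clean queue is empty, which matches the observation that most creations and all deletions are instant.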

Local disks over NBS work differently:

Creation is always instant because only clean devices are selected. Deletion is always slow because the disk deletion process (from the deletion operation's perspective) includes device cleaning. As a result, switching to local disks over NBS currently degrades the user experience. We would like the Disk Registry to have the same creation/deletion logic for local disks over NBS as the current Local Disks Controller in Compute Node.

ya-ksgamora added the blockstore label on Jan 29, 2025
ya-ksgamora self-assigned this on Jan 29, 2025
ya-ksgamora changed the title from "[NBS] Asynchronous local disks cleanup and allocation of dirty devices for local disks" to "[NBS] Asynchronous devices cleanup and synchronous allocation for local disks" on Mar 3, 2025
ya-ksgamora changed the title from "[NBS] Asynchronous devices cleanup and synchronous allocation for local disks" to "[NBS] Asynchronous device cleanup (deallocation) and synchronous allocation for local disks" on Mar 3, 2025
ya-ksgamora reopened this on Mar 20, 2025

ya-ksgamora commented Mar 20, 2025

Now we also need to modify the Disk Manager code so that it does not fail the disk creation task during prolonged local disk creation attempts. Specifically, we need to ensure that the task does not exceed the limit of retriable errors (currently 100 errors) and to implement slower retries with a longer time limit (currently 10 seconds).
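
To see why both limits matter, here is a rough calculation of how long a task can keep retrying before its error budget runs out. It is a sketch under assumptions: the 10-second limit is read as a cap on the delay between retries, and the initial delay of 1 second and the backoff factor of 2 are made-up values, not Disk Manager settings. The function names are hypothetical as well.

```python
def retry_delays(max_errors, initial_delay, factor, max_delay):
    """Yield the delay before each retry under capped exponential backoff."""
    delay = initial_delay
    for _ in range(max_errors):
        yield min(delay, max_delay)
        delay = min(delay * factor, max_delay)


def max_total_wait(max_errors, initial_delay, factor, max_delay):
    """Longest time a task can keep retrying before the error budget is spent."""
    return sum(retry_delays(max_errors, initial_delay, factor, max_delay))


# Assumed values: 100 retriable errors, delays 1, 2, 4, 8, then 10 s each.
print(max_total_wait(100, 1.0, 2.0, 10.0))  # 975.0 seconds
```

Under these assumed values the task would fail after roughly 16 minutes of waiting, so a wipe that takes longer needs either a larger delay cap or a way for waiting on a wipe not to count against the error limit.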
