Skip to content

Issue 68: Graceful shutdown with outstanding IO #109

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 10, 2025

Conversation

yamingk
Copy link
Collaborator

@yamingk yamingk commented Jun 6, 2025

To Reviewers:

Please hide whitespaces, there is a bunch of clang-format changes that is non-logic change.

Changes:

  1. During graceful shutdown, if there are outstanding requests:
    1. Pending create/remove volumes on going,
    2. Any volume has outstanding IOs,
      Graceful shutdown will wait for all above outstanding requets to complete before proceed.
  2. After graceful shutdown is recieved, all create/remove volume and any IO on any volume will be rejected.

Testing

  1. Add a new test case to trigger a delayed read and write, then do graceful shutdown and it should wait for all IOs to finish before proceeding.
  2. Supporting logs show below, shutdown is waiting for outstanding IO to finish (one read and one write) with delay_flip enabled.
[06/06/25 16:43:52-07:00] [I] [test_volume] [227741] [homeblks_impl.cpp:130:do_shutdown] Shutdown timer triggered, checking for outstanding requests
[06/06/25 16:43:52-07:00] [I] [test_volume] [227741] [homeblks_impl.cpp:108:no_outstanding_vols] Found outstanding volume e207d124-f1be-4a75-b053-ca8447374f95 that has outstanding requests: 2
[06/06/25 16:43:52-07:00] [I] [test_volume] [227741] [homeblks_impl.cpp:125:can_shutdown] Shutdown cannot proceed, outstanding requests: 0
[06/06/25 16:43:52-07:00] [I] [test_volume] [227741] [homeblks_impl.cpp:136:do_shutdown] Outstanding requests exist, will retry shutdown in 2 seconds
[06/06/25 16:43:54-07:00] [I] [test_volume] [227741] [homeblks_impl.cpp:130:do_shutdown] Shutdown timer triggered, checking for outstanding requests
[06/06/25 16:43:54-07:00] [I] [test_volume] [227741] [homeblks_impl.cpp:108:no_outstanding_vols] Found outstanding volume e207d124-f1be-4a75-b053-ca8447374f95 that has outstanding requests: 2
[06/06/25 16:43:54-07:00] [I] [test_volume] [227741] [homeblks_impl.cpp:125:can_shutdown] Shutdown cannot proceed, outstanding requests: 0
[06/06/25 16:43:54-07:00] [I] [test_volume] [227741] [homeblks_impl.cpp:136:do_shutdown] Outstanding requests exist, will retry shutdown in 2 seconds
[06/06/25 16:43:56-07:00] [I] [test_volume] [227698] [volume_mgr.cpp:285:operator()] Resuming fake IO delay flip is done. Do nothing
[06/06/25 16:43:56-07:00] [I] [test_volume] [227741] [homeblks_impl.cpp:130:do_shutdown] Shutdown timer triggered, checking for outstanding requests
[06/06/25 16:43:56-07:00] [I] [test_volume] [227698] [volume_mgr.cpp:285:operator()] Resuming fake IO delay flip is done. Do nothing
[06/06/25 16:43:56-07:00] [I] [test_volume] [227741] [homeblks_impl.cpp:108:no_outstanding_vols] Found outstanding volume e207d124-f1be-4a75-b053-ca8447374f95 that has outstanding requests: 1
[06/06/25 16:43:56-07:00] [I] [test_volume] [227741] [homeblks_impl.cpp:125:can_shutdown] Shutdown cannot proceed, outstanding requests: 0
[06/06/25 16:43:56-07:00] [I] [test_volume] [227741] [homeblks_impl.cpp:136:do_shutdown] Outstanding requests exist, will retry shutdown in 2 seconds
[06/06/25 16:43:58-07:00] [I] [test_volume] [227741] [homeblks_impl.cpp:130:do_shutdown] Shutdown timer triggered, checking for outstanding requests
[06/06/25 16:43:58-07:00] [I] [test_volume] [227741] [homeblks_impl.cpp:114:no_outstanding_vols] No volumes with outstanding requests
[06/06/25 16:43:58-07:00] [I] [test_volume] [227741] [homeblks_impl.cpp:121:can_shutdown] Shutdown can proceed, outstanding requests: 0
[06/06/25 16:43:58-07:00] [I] [test_volume] [227741] [homeblks_impl.cpp:132:do_shutdown] No outstanding requests, proceeding with shutdown

@codecov-commenter
Copy link

codecov-commenter commented Jun 6, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 76.64234% with 32 lines in your changes missing coverage. Please review.

Project coverage is 80.34%. Comparing base (0b40663) to head (918a92d).
Report is 29 commits behind head on main.

Files with missing lines Patch % Lines
src/lib/volume/volume.cpp 77.96% 9 Missing and 4 partials ⚠️
src/lib/homeblks_impl.cpp 73.17% 5 Missing and 6 partials ⚠️
src/lib/volume_mgr.cpp 46.15% 6 Missing and 1 partial ⚠️
src/lib/volume/volume.hpp 75.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #109       +/-   ##
===========================================
+ Coverage   61.68%   80.34%   +18.65%     
===========================================
  Files          15       17        +2     
  Lines         462     1175      +713     
  Branches       35      128       +93     
===========================================
+ Hits          285      944      +659     
- Misses        158      166        +8     
- Partials       19       65       +46     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@yamingk yamingk requested review from raakella1 and sanebay June 6, 2025 23:56
if (vol->is_destroying() || is_shutting_down()) {
LOGE(
"Can't serve write, Volume {} is_destroying: {} is either in destroying state or System is shutting down. ",
vol->id_str(), vol->is_destroying());
return folly::makeUnexpected(VolumeError::UNSUPPORTED_OP);
}

vol->inc_ref();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isnt it better to keep the inc_ref also inside the volume ?

Copy link
Collaborator Author

@yamingk yamingk Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it here on purpose for the flip to kick in. We can also do that inside volume, however I tend to keep the write path clean if we can keep one less flip outside of write path.

I want to make the ref_cnt cleaner anyway, so might restructure the code a little in following PR. Hope it is fine with you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@yamingk yamingk merged commit cf73ca7 into eBay:main Jun 10, 2025
34 of 37 checks passed
@yamingk yamingk deleted the yk_shutdown2 branch June 10, 2025 23:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants