Do not hold mutex when write keys if not necessary #7516
Signed-off-by: Little-Wallace <[email protected]>
@yiwu-arbug PTAL again.
I have fixed the CI failures, which were caused by a data race and a deadlock.
Thanks for the PR. I plan to take a look this week.
Thanks @Little-Wallace for the PR! Overall I think this is an improvement.
Summary: This variable is actually not being used for anything meaningful, thus remove it. This can make #7516 slightly simpler by reducing the amount of state that must be made lock-free. Pull Request resolved: #10078 Test Plan: make check Reviewed By: ajkr Differential Revision: D36779817 Pulled By: riversand963 fbshipit-source-id: ffb0d9ad6149616917ae5e02bb28102cb90fc406
Thanks for the suggestion. Will try later.
@Little-Wallace has updated the pull request. You must reimport the pull request before landing.
@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
The good news is that I haven't been able to observe a perf regression so far.
@riversand963 One suggestion for the benchmark is to use a smaller Edit: Oops, apparently Little-Wallace already made this point 🤣 ignore me.
While showing an end-to-end performance gain requires more effort, it's easy to show that the time spent holding the db mutex has decreased drastically. One of the updated unit tests in this PR makes this visible. Before this PR, the test asserted `ASSERT_GT(total_db_mutex_nanos, 2000U);` after this PR, it asserts `ASSERT_LT(total_db_mutex_nanos, 100U);` I also ran a simple benchmark on a non-VM host: `TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks=fillseq,overwrite -duration=60 -batch_size=100 -perf_level=5` Results show
Summary: This variable is actually not being used for anything meaningful, thus remove it. This can make facebook#7516 slightly simpler by reducing the amount of state that must be made lock-free. Pull Request resolved: facebook#10078 Test Plan: make check Reviewed By: ajkr Differential Revision: D36779817 Pulled By: riversand963 fbshipit-source-id: ffb0d9ad6149616917ae5e02bb28102cb90fc406 Signed-off-by: tabokie <[email protected]>
…#10187)
Summary: Resolves facebook#10129

I extracted this fix from facebook#7516 since it's also already a bug in main branch, and we want to separate it from the main part of the PR.

There can be a race condition between two threads. Thread 1 executes `DBImpl::FindObsoleteFiles()` while thread 2 executes `GetSortedWals()`.
```
Time   thread 1                                  thread 2
  |  mutex_.lock
  |  read disable_delete_obsolete_files_
  |  ...
  |  wait on log_sync_cv and release mutex_
  |                                              mutex_.lock
  |                                              ++disable_delete_obsolete_files_
  |                                              mutex_.unlock
  |                                              mutex_.lock
  |                                              while (pending_purge_obsolete_files > 0) { bg_cv.wait;}
  |                                              wake up with mutex_ locked
  |                                              compute WALs tracked by MANIFEST
  |                                              mutex_.unlock
  |  wake up with mutex_ locked
  |  ++pending_purge_obsolete_files_
  |  mutex_.unlock
  |
  |  delete obsolete WAL
  |                                              WAL missing but tracked in MANIFEST.
  V
```
The fix proposed eliminates the possibility of the above by increasing `pending_purge_obsolete_files_` before `FindObsoleteFiles()` can possibly release the mutex.

Pull Request resolved: facebook#10187
Test Plan: make check
Reviewed By: ltamasi
Differential Revision: D37214235
Pulled By: riversand963
fbshipit-source-id: 556ab1b58ae6d19150169dfac4db08195c797184
Signed-off-by: tabokie <[email protected]>
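The ordering fix described in this commit message can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `WalState`, `FindAndPurgeObsolete`, and `ListedWalsStillExist` are hypothetical names, the "disk" is an in-memory set, and none of this is RocksDB's actual `DBImpl` code. It shows the purging thread announcing the pending purge before its condition-variable wait can release the mutex, so the listing thread either sees deletions disabled in time or waits for the purge to finish.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <set>
#include <thread>

// Simplified model of the shared state. Field names echo the commit message,
// but this struct is an illustration, NOT RocksDB's DBImpl.
struct WalState {
  std::mutex mu;
  std::condition_variable log_sync_cv;
  std::condition_variable bg_cv;
  bool log_synced = false;
  int disable_delete_obsolete_files = 0;
  int pending_purge_obsolete_files = 0;
  std::set<int> wals_on_disk{1, 2};  // in-memory stand-in for WAL files
};

// "Thread 1": finds an obsolete WAL and deletes it.
void FindAndPurgeObsolete(WalState& s, int obsolete_wal) {
  std::unique_lock<std::mutex> lock(s.mu);
  if (s.disable_delete_obsolete_files > 0) return;  // deletions are disabled
  // The fix: announce the pending purge BEFORE the wait below releases mu.
  ++s.pending_purge_obsolete_files;
  s.log_sync_cv.wait(lock, [&s] { return s.log_synced; });
  // A real purge would delete the file without holding the mutex; the sketch
  // keeps the lock only because its "disk" is a plain in-memory set.
  s.wals_on_disk.erase(obsolete_wal);
  --s.pending_purge_obsolete_files;
  s.bg_cv.notify_all();
}

// "Thread 2": disables deletions, waits out in-flight purges, lists WALs.
std::set<int> GetSortedWals(WalState& s) {
  std::unique_lock<std::mutex> lock(s.mu);
  ++s.disable_delete_obsolete_files;
  s.bg_cv.wait(lock, [&s] { return s.pending_purge_obsolete_files == 0; });
  return s.wals_on_disk;  // copied while the mutex is still held
}

// Runs both threads once; returns true if every listed WAL still exists.
bool ListedWalsStillExist() {
  WalState s;
  std::set<int> listed;
  std::thread purger(FindAndPurgeObsolete, std::ref(s), 1);
  std::thread lister([&s, &listed] { listed = GetSortedWals(s); });
  {
    std::lock_guard<std::mutex> lock(s.mu);
    s.log_synced = true;  // plays the role of the log sync completing
  }
  s.log_sync_cv.notify_all();
  purger.join();
  lister.join();
  for (int wal : listed) {
    if (s.wals_on_disk.count(wal) == 0) return false;
  }
  return true;
}
```

Whichever thread acquires the mutex first, the listing never names a WAL that the purger later removes. Moving the `++s.pending_purge_obsolete_files` line to after the wait reintroduces the interleaving shown in the timeline above.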
Summary: RocksDB acquires the global mutex of the DB instance every time the user calls `Write`. When RocksDB schedules many compaction jobs, they compete with the write thread for this mutex, which hurts write performance. I want to use log_write_mutex in place of the global mutex in most cases, so that the write thread does not acquire the global mutex unless a write-stall or write-buffer-full event occurs.
Pull Request resolved: facebook#7516
Test Plan:
1. make check
2. CI
3. COMPILE_WITH_TSAN=1 make db_stress
make crash_test
make crash_test_with_multiops_wp_txn
make crash_test_with_multiops_wc_txn
make crash_test_with_atomic_flush
Reviewed By: siying
Differential Revision: D36908702
Pulled By: riversand963
fbshipit-source-id: 59b13881f4f5c0a58fd3ca79128a396d9cd98efe
Signed-off-by: tabokie <[email protected]>
I see large improvements with fillseq (leveled, universal) and overwrite (universal) in IO-bound workloads -- up to 1.5X more throughput. Thank you for making RocksDB better. https://twitter.com/MarkCallaghanDB/status/1574425353564475394
Problem Summary
RocksDB acquires the global mutex of the DB instance every time the user calls `Write`. When RocksDB schedules many compaction jobs, they compete with the write thread for this mutex, which hurts write performance.
Problem Solution
I want to use log_write_mutex in place of the global mutex in most cases, so that the write thread does not acquire the global mutex unless a write-stall or write-buffer-full event occurs.
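The general idea can be sketched as a fast path guarded by a narrow mutex plus a slow path that takes the global mutex only when a flag says it is needed. This is a minimal illustration under stated assumptions, not the code in this PR: `TinyDb`, `buffer_full_`, and `SwitchBuffer` are hypothetical names, and the "write-buffer-full event" is modeled as a simple size limit.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a DB; not RocksDB's DBImpl.
class TinyDb {
 public:
  explicit TinyDb(std::size_t buffer_limit) : buffer_limit_(buffer_limit) {}

  void Write(const std::string& record) {
    // Slow path: take the global mutex only when the buffer-full flag is set.
    if (buffer_full_.load(std::memory_order_acquire)) {
      std::lock_guard<std::mutex> db_lock(db_mutex_);
      ++db_mutex_acquisitions_;
      SwitchBuffer();
    }
    // Fast path: only the narrow log mutex guards the write buffer.
    std::lock_guard<std::mutex> log_lock(log_write_mutex_);
    buffer_.push_back(record);
    if (buffer_.size() >= buffer_limit_) {
      buffer_full_.store(true, std::memory_order_release);
    }
  }

  std::size_t db_mutex_acquisitions() {
    std::lock_guard<std::mutex> db_lock(db_mutex_);
    return db_mutex_acquisitions_;
  }
  std::size_t flushed() {
    std::lock_guard<std::mutex> db_lock(db_mutex_);
    return flushed_;
  }
  std::size_t pending() {
    std::lock_guard<std::mutex> log_lock(log_write_mutex_);
    return buffer_.size();
  }

 private:
  // Caller holds db_mutex_. Lock order is always db_mutex_, then
  // log_write_mutex_, so the two paths cannot deadlock each other.
  void SwitchBuffer() {
    std::lock_guard<std::mutex> log_lock(log_write_mutex_);
    if (buffer_.size() >= buffer_limit_) {  // re-check: another writer may
      flushed_ += buffer_.size();           // already have switched it
      buffer_.clear();
    }
    buffer_full_.store(false, std::memory_order_release);
  }

  const std::size_t buffer_limit_;
  std::mutex db_mutex_;         // stands in for the global DB mutex
  std::mutex log_write_mutex_;  // narrow mutex used on every write
  std::atomic<bool> buffer_full_{false};
  std::vector<std::string> buffer_;        // guarded by log_write_mutex_
  std::size_t flushed_ = 0;                // guarded by db_mutex_
  std::size_t db_mutex_acquisitions_ = 0;  // guarded by db_mutex_
};
```

With a limit of 4 and ten sequential writes, the global mutex in this sketch is taken twice rather than ten times. The atomic flag is what lets writers skip the global mutex safely, at the cost of re-checking the condition once the slow path holds both locks.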
Test plan
make crash_test
make crash_test_with_multiops_wp_txn
make crash_test_with_multiops_wc_txn
make crash_test_with_atomic_flush