-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update concepts.md #4
Open
hrajput89
wants to merge
1
commit into
couchbase:master
Choose a base branch
from
hrajput89:patch-1
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
typo in backfill spelling
trondn
pushed a commit
to trondn/kv_engine
that referenced
this pull request
Jun 18, 2020
TSan found lock inversion as DcpProducer::setDisconnect holds `StreamContainer->rlock()` and then acquires `vb->getStateLock()` whereas `VBucket::set()` acquires them in the opposite order. Release the stream container lock before calling `Stream::setDead()` to avoid holding both in the `setDisconnect` path. TSan report: [ RUN ] DurabilityTest.MB34780 ================== WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=10424) Cycle in lock order graph: M4054 (0x7b68000308f8) => M201671426334441480 (0x000000000000) => M4054 Mutex M201671426334441480 acquired here while holding mutex M4054 in thread T7: #0 pthread_rwlock_rdlock <null> (libtsan.so.0+0x00000002953b) couchbase#1 cb_rw_reader_enter(pthread_rwlock_t*) .../platform/src/cb_pthreads.cc:195 (libplatform_so.so.0.1.0+0x000000009cfa) couchbase#2 cb::RWLock::readerLock() .../platform/include/platform/rwlock.h:87 (ep.so+0x0000000f4326) couchbase#3 cb::RWLock::lock_shared() .../platform/include/platform/rwlock.h:67 (ep.so+0x00000012f2ba) couchbase#4 std::shared_lock<cb::RWLock>::shared_lock(cb::RWLock&) /usr/local/include/c++/7.3.0/shared_mutex:553 (ep.so+0x00000012f2ba) couchbase#5 StreamContainer<std::shared_ptr<Stream> >::ReadLockedHandle::ReadLockedHandle(...) .../kv_engine/engines/ep/src/dcp/stream_container.h:213 (ep.so+0x00000012f2ba) couchbase#6 StreamContainer<std::shared_ptr<Stream> >::rlock() const .../kv_engine/engines/ep/src/dcp/stream_container.h:273 (ep.so+0x000000122ad5) #7 DcpProducer::notifySeqnoAvailable(Vbid, unsigned long) .../kv_engine/engines/ep/src/dcp/producer.cc:1312 (ep.so+0x000000122ad5) #8 DcpConnMap::notifyVBConnections(Vbid, unsigned long) .../kv_engine/engines/ep/src/dcp/dcpconnmap.cc:424 (ep.so+0x0000000fa0af) #9 KVBucket::notifyReplication(Vbid, long) .../kv_engine/engines/ep/src/kv_bucket.cc:2570 (ep.so+0x00000020ed03) #10 EPBucket::notifyNewSeqno(Vbid, VBNotifyCtx const&) .../kv_engine/engines/ep/src/ep_bucket.cc:1327 (ep.so+0x000000160a95) #11 NotifyNewSeqnoCB::callback(Vbid const&, VBNotifyCtx const&) .../kv_engine/engines/ep/src/kv_bucket.h:837 (ep.so+0x000000224dcb) #12 VBucket::notifyNewSeqno(VBNotifyCtx const&) .../kv_engine/engines/ep/src/vbucket.cc:3579 (ep.so+0x000000262f6b) #13 VBucket::set(...) .../kv_engine/engines/ep/src/vbucket.cc:1569 (ep.so+0x00000026af3f) #14 KVBucket::set(...) .../kv_engine/engines/ep/src/kv_bucket.cc:692 (ep.so+0x00000021ee48) #15 EventuallyPersistentEngine::storeIfInner(...) .../kv_engine/engines/ep/src/ep_engine.cc:2440 (ep.so+0x00000018071f) Mutex M4054 previously acquired by the same thread here: #0 AnnotateRWLockAcquired <null> (libtsan.so.0+0x00000005b63d) couchbase#1 folly::detail::annotate_rwlock_acquired_impl(...) .../follytsan/folly/synchronization/SanitizeThread.cpp:91 (memcached+0x0000006463de) couchbase#2 annotate_rwlock_acquired .../build/tlm/deps/folly.exploded/include/folly/synchronization/SanitizeThread.h:99 (ep.so+0x00000021e932) couchbase#3 folly::SharedMutexImpl<false, void, std::atomic, false, true>::annotateAcquired(folly::annotate_rwlock_level) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:696 (ep.so+0x00000021e932) couchbase#4 folly::SharedMutexImpl<false, void, std::atomic, false, true>::lock_shared(folly::SharedMutexToken&) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:376 (ep.so+0x00000021e932) couchbase#5 folly::SharedMutexImpl<false, void, std::atomic, false, true>::ReadHolder::ReadHolder(folly::SharedMutexImpl<false, void, std::atomic, false, true> const&) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:1315 (ep.so+0x00000021e932) couchbase#6 KVBucket::set(...) .../kv_engine/engines/ep/src/kv_bucket.cc:659 (ep.so+0x00000021e932) #7 EventuallyPersistentEngine::storeIfInner(...)> const&) .../kv_engine/engines/ep/src/ep_engine.cc:2440 (ep.so+0x00000018071f) Mutex M4054 acquired here while holding mutex M201671426334441480 in thread T5: #0 AnnotateRWLockAcquired <null> (libtsan.so.0+0x00000005b63d) couchbase#1 folly::detail::annotate_rwlock_acquired_impl(void const volatile*, folly::annotate_rwlock_level, char const*, int) .../follytsan/folly/synchronization/SanitizeThread.cpp:91 (memcached+0x0000006463de) couchbase#2 annotate_rwlock_acquired .../build/tlm/deps/folly.exploded/include/folly/synchronization/SanitizeThread.h:99 (ep.so+0x0000000bb5b6) couchbase#3 folly::SharedMutexImpl<false, void, std::atomic, false, true>::annotateAcquired(folly::annotate_rwlock_level) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:696 (ep.so+0x0000000bb5b6) couchbase#4 folly::SharedMutexImpl<false, void, std::atomic, false, true>::lock_shared(folly::SharedMutexToken&) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:376 (ep.so+0x0000000bb5b6) couchbase#5 folly::SharedMutexImpl<false, void, std::atomic, false, true>::ReadHolder::ReadHolder(...) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:1315 (ep.so+0x0000000bb5b6) couchbase#6 ActiveStream::setDead(end_stream_status_t) .../kv_engine/engines/ep/src/dcp/active_stream.cc:1181 (ep.so+0x0000000bb5b6) #7 operator() .../kv_engine/engines/ep/src/dcp/producer.cc:1491 (ep.so+0x0000001207df) #8 for_each<..., DcpProducer::setDisconnect()::<lambda(folly::AtomicHashMap<...>::value_type&)> > /usr/local/include/c++/7.3.0/bits/stl_algo.h:3884 (ep.so+0x0000001207df) #9 DcpProducer::setDisconnect() .../kv_engine/engines/ep/src/dcp/producer.cc:1493 (ep.so+0x000000120b08) Mutex M201671426334441480 previously acquired by the same thread here: #0 pthread_rwlock_rdlock <null> (libtsan.so.0+0x00000002953b) couchbase#1 cb_rw_reader_enter(pthread_rwlock_t*) .../platform/src/cb_pthreads.cc:195 (libplatform_so.so.0.1.0+0x000000009cfa) couchbase#2 cb::RWLock::readerLock() .../platform/include/platform/rwlock.h:87 (ep.so+0x0000001205e9) couchbase#3 cb::RWLock::lock_shared() .../platform/include/platform/rwlock.h:67 (ep.so+0x0000001205e9) couchbase#4 std::shared_lock<cb::RWLock>::shared_lock(cb::RWLock&) /usr/local/include/c++/7.3.0/shared_mutex:553 (ep.so+0x0000001205e9) couchbase#5 StreamContainer<std::shared_ptr<Stream> >::ReadLockedHandle::ReadLockedHandle(StreamContainer<std::shared_ptr<Stream> > const&) .../kv_engine/engines/ep/src/dcp/stream_container.h:213 (ep.so+0x0000001205e9) couchbase#6 StreamContainer<std::shared_ptr<Stream> >::rlock() const .../kv_engine/engines/ep/src/dcp/stream_container.h:273 (ep.so+0x0000001205e9) #7 operator() .../kv_engine/engines/ep/src/dcp/producer.cc:1490 (ep.so+0x0000001205e9) #8 for_each<..., DcpProducer::setDisconnect()::<lambda(folly::AtomicHashMap<...>::value_type&)> > /usr/local/include/c++/7.3.0/bits/stl_algo.h:3884 (ep.so+0x0000001207df) #9 DcpProducer::setDisconnect() .../kv_engine/engines/ep/src/dcp/producer.cc:1493 (ep.so+0x000000120b08) Change-Id: Ibb2ae11498c7226514bc18788778878bd6c87363 Reviewed-on: http://review.couchbase.org/111905 Reviewed-by: Dave Rigby <[email protected]> Tested-by: Build Bot <[email protected]>
trondn
pushed a commit
to trondn/kv_engine
that referenced
this pull request
Jun 18, 2020
TSan found lock inversion as DcpProducer::closeAllStreams() holds `StreamContainer->wlock()` and then acquires `vb->getStateLock()` whereas `VBucket::set()` acquires them in the opposite order. Release the stream container lock before calling `Stream::setDead()` to avoid holding both in the `closeAllStreams()` path. Also, preemptively apply the same change to `setStreamDeadStatus` though TSan has not identified inversion in this case. TSan report: [ RUN ] DurabilityTest.MB34780 ================== WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=16422) Cycle in lock order graph: M3987 (0x7b68000308f8) => M225878274331574312 (0x000000000000) => M3987 Mutex M225878274331574312 acquired here while holding mutex M3987 in thread T7: #0 pthread_rwlock_rdlock <null> (libtsan.so.0+0x00000002953b) couchbase#1 cb_rw_reader_enter(pthread_rwlock_t*) .../platform/src/cb_pthreads.cc:195 (libplatform_so.so.0.1.0+0x000000009cfa) couchbase#2 cb::RWLock::readerLock() .../platform/include/platform/rwlock.h:87 (ep.so+0x0000000f423a) couchbase#3 cb::RWLock::lock_shared() .../platform/include/platform/rwlock.h:67 (ep.so+0x00000012f578) couchbase#4 std::shared_lock<cb::RWLock>::shared_lock(cb::RWLock&) /usr/local/include/c++/7.3.0/shared_mutex:553 (ep.so+0x00000012f578) couchbase#5 StreamContainer<std::shared_ptr<Stream> >::ReadLockedHandle::ReadLockedHandle(StreamContainer<std::shared_ptr<Stream> > const&) .../kv_engine/engines/ep/src/dcp/stream_container.h:213 (ep.so+0x00000012f578) couchbase#6 StreamContainer<std::shared_ptr<Stream> >::rlock() const .../kv_engine/engines/ep/src/dcp/stream_container.h:273 (ep.so+0x000000122ea7) #7 DcpProducer::notifySeqnoAvailable(Vbid, unsigned long) .../kv_engine/engines/ep/src/dcp/producer.cc:1312 (ep.so+0x000000122ea7) #8 DcpConnMap::notifyVBConnections(Vbid, unsigned long) .../kv_engine/engines/ep/src/dcp/dcpconnmap.cc:424 (ep.so+0x0000000fa071) #9 KVBucket::notifyReplication(Vbid, long) .../kv_engine/engines/ep/src/kv_bucket.cc:2570 (ep.so+0x000000210711) #10 EPBucket::notifyNewSeqno(Vbid, VBNotifyCtx const&) .../kv_engine/engines/ep/src/ep_bucket.cc:1327 (ep.so+0x00000016232b) #11 NotifyNewSeqnoCB::callback(Vbid const&, VBNotifyCtx const&) .../kv_engine/engines/ep/src/kv_bucket.h:837 (ep.so+0x0000002267d9) #12 VBucket::notifyNewSeqno(VBNotifyCtx const&) .../kv_engine/engines/ep/src/vbucket.cc:3631 (ep.so+0x000000264871) #13 VBucket::set() .../kv_engine/engines/ep/src/vbucket.cc:1568 (ep.so+0x00000026c768) #14 KVBucket::set() .../kv_engine/engines/ep/src/kv_bucket.cc:692 (ep.so+0x000000220856) #15 EventuallyPersistentEngine::storeIfInner() .../kv_engine/engines/ep/src/ep_engine.cc:2440 (ep.so+0x000000181fef) Mutex M3987 previously acquired by the same thread here: #0 AnnotateRWLockAcquired <null> (libtsan.so.0+0x00000005b63d) couchbase#1 folly::detail::annotate_rwlock_acquired_impl(void const volatile*, folly::annotate_rwlock_level, char const*, int) .../folly/follytsan-prefix/src/follytsan/folly/synchronization/SanitizeThread.cpp:91 (memcached+0x0000006463de) couchbase#2 annotate_rwlock_acquired .../build/tlm/deps/folly.exploded/include/folly/synchronization/SanitizeThread.h:99 (ep.so+0x000000220340) couchbase#3 folly::SharedMutexImpl<false, void, std::atomic, false, true>::annotateAcquired(folly::annotate_rwlock_level) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:696 (ep.so+0x000000220340) couchbase#4 folly::SharedMutexImpl<false, void, std::atomic, false, true>::lock_shared(folly::SharedMutexToken&) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:376 (ep.so+0x000000220340) couchbase#5 folly::SharedMutexImpl<false, void, std::atomic, false, true>::ReadHolder::ReadHolder(folly::SharedMutexImpl<false, void, std::atomic, false, true> const&) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:1315 (ep.so+0x000000220340) couchbase#6 KVBucket::set() .../kv_engine/engines/ep/src/kv_bucket.cc:659 (ep.so+0x000000220340) #7 EventuallyPersistentEngine::storeIfInner() .../kv_engine/engines/ep/src/ep_engine.cc:2440 (ep.so+0x000000181fef) Mutex M3987 acquired here while holding mutex M225878274331574312 in thread T5: #0 AnnotateRWLockAcquired <null> (libtsan.so.0+0x00000005b63d) couchbase#1 folly::detail::annotate_rwlock_acquired_impl(void const volatile*, folly::annotate_rwlock_level, char const*, int) .../folly/follytsan-prefix/src/follytsan/folly/synchronization/SanitizeThread.cpp:91 (memcached+0x0000006463de) couchbase#2 annotate_rwlock_acquired .../build/tlm/deps/folly.exploded/include/folly/synchronization/SanitizeThread.h:99 (ep.so+0x0000000bb626) couchbase#3 folly::SharedMutexImpl<false, void, std::atomic, false, true>::annotateAcquired(folly::annotate_rwlock_level) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:696 (ep.so+0x0000000bb626) couchbase#4 folly::SharedMutexImpl<false, void, std::atomic, false, true>::lock_shared(folly::SharedMutexToken&) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:376 (ep.so+0x0000000bb626) couchbase#5 folly::SharedMutexImpl<false, void, std::atomic, false, true>::ReadHolder::ReadHolder(folly::SharedMutexImpl<false, void, std::atomic, false, true> const&) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:1315 (ep.so+0x0000000bb626) couchbase#6 ActiveStream::setDead(end_stream_status_t) .../kv_engine/engines/ep/src/dcp/active_stream.cc:1181 (ep.so+0x0000000bb626) #7 operator() .../kv_engine/engines/ep/src/dcp/producer.cc:1383 (ep.so+0x0000001257d1) #8 for_each<...> /usr/local/include/c++/7.3.0/bits/stl_algo.h:3884 (ep.so+0x0000001257d1) #9 DcpProducer::closeAllStreams() .../kv_engine/engines/ep/src/dcp/producer.cc:1377 (ep.so+0x000000125c00) Mutex M225878274331574312 previously acquired by the same thread here: #0 pthread_rwlock_wrlock <null> (libtsan.so.0+0x0000000297eb) couchbase#1 cb_rw_writer_enter(pthread_rwlock_t*) .../platform/src/cb_pthreads.cc:217 (libplatform_so.so.0.1.0+0x000000009e80) couchbase#2 cb::RWLock::writerLock() .../platform/include/platform/rwlock.h:103 (ep.so+0x000000125597) couchbase#3 cb::RWLock::lock() .../platform/include/platform/rwlock.h:77 (ep.so+0x000000125597) couchbase#4 std::unique_lock<cb::RWLock>::lock() /usr/local/include/c++/7.3.0/bits/std_mutex.h:267 (ep.so+0x000000125597) couchbase#5 std::unique_lock<cb::RWLock>::unique_lock(cb::RWLock&) /usr/local/include/c++/7.3.0/bits/std_mutex.h:197 (ep.so+0x000000125597) couchbase#6 StreamContainer<std::shared_ptr<Stream> >::WriteLockedHandle::WriteLockedHandle(StreamContainer<std::shared_ptr<Stream> >&) .../kv_engine/engines/ep/src/dcp/stream_container.h:237 (ep.so+0x000000125597) #7 StreamContainer<std::shared_ptr<Stream> >::wlock() .../kv_engine/engines/ep/src/dcp/stream_container.h:277 (ep.so+0x000000125597) #8 operator() .../kv_engine/engines/ep/src/dcp/producer.cc:1381 (ep.so+0x000000125597) #9 for_each<...> /usr/local/include/c++/7.3.0/bits/stl_algo.h:3884 (ep.so+0x000000125597) #10 DcpProducer::closeAllStreams() .../kv_engine/engines/ep/src/dcp/producer.cc:1377 (ep.so+0x000000125c00) Change-Id: Icc15e74e80d7f1926ce6c75bbdd8aa1c43f5ca2c Reviewed-on: http://review.couchbase.org/111989 Reviewed-by: Dave Rigby <[email protected]> Tested-by: Dave Rigby <[email protected]>
trondn
pushed a commit
to trondn/kv_engine
that referenced
this pull request
Jun 18, 2020
As identified by TSan, triggered by http://review.couchbase.org/#/c/112781/ (fix for MB-35361): $ TSAN_OPTIONS=second_deadlock_stack=1 ./ep-engine_ep_unit_tests --gtest_filter=AllBucketTypes/DurabilityPassiveStreamTest.ReceiveMutationInsteadOfCommitForReconnectWindowWithPrepareLast/ephemeral_auto_delete Note: Google Test filter = AllBucketTypes/DurabilityPassiveStreamTest.ReceiveMutationInsteadOfCommitForReconnectWindowWithPrepareLast/ephemeral_auto_delete [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from AllBucketTypes/DurabilityPassiveStreamTest [ RUN ] AllBucketTypes/DurabilityPassiveStreamTest.ReceiveMutationInsteadOfCommitForReconnectWindowWithPrepareLast/ephemeral_auto_delete ================== WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=55122) Cycle in lock order graph: M458376862792418680 (0x000000000000) => M431917940103981768 (0x000000000000) => M456969213031230168 (0x000000000000) => M458376862792418680 Mutex M431917940103981768 acquired here while holding mutex M458376862792418680 in main thread: #0 AnnotateRWLockAcquired <null> (libtsan.so.0+0x00000005ea8d) ... #10 folly::SynchronizedPtr<...>::rlock() const folly/SynchronizedPtr.h:86 (ep-engine_ep_unit_tests+0x000000ea13b0) #11 PassiveDurabilityMonitor::getHighPreparedSeqno() const kv_engine/engines/ep/src/durability/passive_durability_monitor.cc:115 (ep-engine_ep_unit_tests+0x000000ea13b0) #12 VBucket::getHighPreparedSeqno() const kv_engine/engines/ep/src/vbucket.cc:285 (ep-engine_ep_unit_tests+0x000000f9e1f4) #13 PassiveStream::acceptStream(cb::mcbp::Status, unsigned int) kv_engine/engines/ep/src/dcp/passive_stream.cc:176 (ep-engine_ep_unit_tests+0x000000e5995d) #14 DurabilityPassiveStreamTest::testReceiveMutationOrDeletionInsteadOfCommitForReconnectWindowWithPrepareLast(DocumentState) kv_engine/engines/ep/tests/module_tests/dcp_durability_stream_test.cc:929 (ep-engine_ep_unit_tests+0x0000006c7905) ... Mutex M458376862792418680 previously acquired by the same thread here: #0 pthread_mutex_lock <null> (libtsan.so.0+0x00000003bbbf) ... couchbase#5 PassiveStream::acceptStream(cb::mcbp::Status, unsigned int) kv_engine/engines/ep/src/dcp/passive_stream.cc:162 (ep-engine_ep_unit_tests+0x000000e59789) couchbase#6 DurabilityPassiveStreamTest::testReceiveMutationOrDeletionInsteadOfCommitForReconnectWindowWithPrepareLast(DocumentState) kv_engine/engines/ep/tests/module_tests/dcp_durability_stream_test.cc:929 (ep-engine_ep_unit_tests+0x0000006c7905) ... Mutex M456969213031230168 acquired here while holding mutex M431917940103981768 in main thread: #0 AnnotateRWLockAcquired <null> (libtsan.so.0+0x00000005ea8d) ... #9 folly::SynchronizedBase<...>::wlock() folly/Synchronized.h:114 (ep-engine_ep_unit_tests+0x000000ea3db0) #10 PassiveDurabilityMonitor::storeSeqnoAck(long, long) kv_engine/engines/ep/src/durability/passive_durability_monitor.cc:244 (ep-engine_ep_unit_tests+0x000000ea3db0) #11 PassiveDurabilityMonitor::notifySnapshotEndReceived(unsigned long) kv_engine/engines/ep/src/durability/passive_durability_monitor.cc:219 (ep-engine_ep_unit_tests+0x000000ea3db0) #12 VBucket::notifyPassiveDMOfSnapEndReceived(unsigned long) kv_engine/engines/ep/src/vbucket.cc:979 (ep-engine_ep_unit_tests+0x000000f9e8b8) #13 PassiveStream::handleSnapshotEnd(std::shared_ptr<VBucket>&, unsigned long) kv_engine/engines/ep/src/dcp/passive_stream.cc:978 (ep-engine_ep_unit_tests+0x000000e51c63) #14 PassiveStream::processMessage(MutationConsumerMessage*, PassiveStream::MessageType) kv_engine/engines/ep/src/dcp/passive_stream.cc:662 (ep-engine_ep_unit_tests+0x000000e55910) #15 PassiveStream::processPrepare(MutationConsumerMessage*) kv_engine/engines/ep/src/dcp/passive_stream.cc:685 (ep-engine_ep_unit_tests+0x000000e55e29) #16 PassiveStream::messageReceived(...) kv_engine/engines/ep/src/dcp/passive_stream.cc:311 (ep-engine_ep_unit_tests+0x000000e59f8c) #17 MockPassiveStream::messageReceived(...) kv_engine/engines/ep/tests/mock/mock_stream.cc:87 (ep-engine_ep_unit_tests+0x0000004587a2) #18 DurabilityPassiveStreamTest::makeAndReceiveDcpPrepare(...) kv_engine/engines/ep/tests/module_tests/dcp_durability_stream_test.cc:1340 (ep-engine_ep_unit_tests+0x0000006bcbda) #19 DurabilityPassiveStreamTest::testReceiveMutationOrDeletionInsteadOfCommitForReconnectWindowWithPrepareLast(DocumentState) kv_engine/engines/ep/tests/module_tests/dcp_durability_stream_test.cc:920 (ep-engine_ep_unit_tests+0x0000006c7748) ... Mutex M431917940103981768 previously acquired by the same thread here: #0 AnnotateRWLockAcquired <null> (libtsan.so.0+0x00000005ea8d) ... #10 folly::SynchronizedPtr<...>::wlock() folly/SynchronizedPtr.h:95 (ep-engine_ep_unit_tests+0x000000ea3b94) #11 PassiveDurabilityMonitor::notifySnapshotEndReceived(unsigned long) kv_engine/engines/ep/src/durability/passive_durability_monitor.cc:208 (ep-engine_ep_unit_tests+0x000000ea3b94) #12 VBucket::notifyPassiveDMOfSnapEndReceived(unsigned long) kv_engine/engines/ep/src/vbucket.cc:979 (ep-engine_ep_unit_tests+0x000000f9e8b8) #13 PassiveStream::handleSnapshotEnd(std::shared_ptr<VBucket>&, unsigned long) kv_engine/engines/ep/src/dcp/passive_stream.cc:978 (ep-engine_ep_unit_tests+0x000000e51c63) #14 PassiveStream::processMessage(MutationConsumerMessage*, PassiveStream::MessageType) kv_engine/engines/ep/src/dcp/passive_stream.cc:662 (ep-engine_ep_unit_tests+0x000000e55910) #15 PassiveStream::processPrepare(MutationConsumerMessage*) kv_engine/engines/ep/src/dcp/passive_stream.cc:685 (ep-engine_ep_unit_tests+0x000000e55e29) #16 PassiveStream::messageReceived(...) kv_engine/engines/ep/src/dcp/passive_stream.cc:311 (ep-engine_ep_unit_tests+0x000000e59f8c) #17 MockPassiveStream::messageReceived(...) kv_engine/engines/ep/tests/mock/mock_stream.cc:87 (ep-engine_ep_unit_tests+0x0000004587a2) #18 DurabilityPassiveStreamTest::makeAndReceiveDcpPrepare(...) kv_engine/engines/ep/tests/module_tests/dcp_durability_stream_test.cc:1340 (ep-engine_ep_unit_tests+0x0000006bcbda) #19 DurabilityPassiveStreamTest::testReceiveMutationOrDeletionInsteadOfCommitForReconnectWindowWithPrepareLast(DocumentState) kv_engine/engines/ep/tests/module_tests/dcp_durability_stream_test.cc:920 (ep-engine_ep_unit_tests+0x0000006c7748) ... Mutex M458376862792418680 acquired here while holding mutex M456969213031230168 in main thread: #0 pthread_mutex_lock <null> (libtsan.so.0+0x00000003bbbf) ... couchbase#4 PassiveStream::seqnoAck(long) kv_engine/engines/ep/src/dcp/passive_stream.cc:693 (ep-engine_ep_unit_tests+0x000000e51553) couchbase#5 DcpConsumer::seqnoAckStream(Vbid, long) kv_engine/engines/ep/src/dcp/consumer.cc:1067 (ep-engine_ep_unit_tests+0x000000e31c7c) couchbase#6 DcpConnMap::seqnoAckVBPassiveStream(Vbid, long) kv_engine/engines/ep/src/dcp/dcpconnmap.cc:466 (ep-engine_ep_unit_tests+0x000000e3b001) #7 operator() kv_engine/engines/ep/src/kv_bucket.cc:2680 (ep-engine_ep_unit_tests+0x000000f41376) #8 _M_invoke /usr/local/include/c++/7.3.0/bits/std_function.h:316 (ep-engine_ep_unit_tests+0x000000f41376) #9 std::function<void (Vbid, long)>::operator()(Vbid, long) const /usr/local/include/c++/7.3.0/bits/std_function.h:706 (ep-engine_ep_unit_tests+0x000000f9e9aa) #10 VBucket::sendSeqnoAck(long) kv_engine/engines/ep/src/vbucket.cc:984 (ep-engine_ep_unit_tests+0x000000f9e9aa) #11 PassiveDurabilityMonitor::sendSeqnoAck() kv_engine/engines/ep/src/durability/passive_durability_monitor.cc:255 (ep-engine_ep_unit_tests+0x000000ea1d60) #12 PassiveDurabilityMonitor::notifySnapshotEndReceived(unsigned long) kv_engine/engines/ep/src/durability/passive_durability_monitor.cc:226 (ep-engine_ep_unit_tests+0x000000ea3e27) #13 VBucket::notifyPassiveDMOfSnapEndReceived(unsigned long) kv_engine/engines/ep/src/vbucket.cc:979 (ep-engine_ep_unit_tests+0x000000f9e8b8) #14 PassiveStream::handleSnapshotEnd(std::shared_ptr<VBucket>&, unsigned long) kv_engine/engines/ep/src/dcp/passive_stream.cc:978 (ep-engine_ep_unit_tests+0x000000e51c63) #15 PassiveStream::processMessage(MutationConsumerMessage*, PassiveStream::MessageType) kv_engine/engines/ep/src/dcp/passive_stream.cc:662 (ep-engine_ep_unit_tests+0x000000e55910) #16 PassiveStream::processMutation(MutationConsumerMessage*) kv_engine/engines/ep/src/dcp/passive_stream.cc:670 (ep-engine_ep_unit_tests+0x000000e55d32) #17 MockPassiveStream::processMutation(MutationConsumerMessage*) <null> (ep-engine_ep_unit_tests+0x0000004596dc) #18 PassiveStream::messageReceived(...) kv_engine/engines/ep/src/dcp/passive_stream.cc:299 (ep-engine_ep_unit_tests+0x000000e59fdf) #19 MockPassiveStream::messageReceived(...) kv_engine/engines/ep/tests/mock/mock_stream.cc:87 (ep-engine_ep_unit_tests+0x0000004587a2) #20 DurabilityPassiveStreamTest::testReceiveMutationOrDeletionInsteadOfCommitForReconnectWindowWithPrepareLast(DocumentState) kv_engine/engines/ep/tests/module_tests/dcp_durability_stream_test.cc:950 (ep-engine_ep_unit_tests+0x0000006c7dfa) #21 DurabilityPassiveStreamTest_ReceiveMutationInsteadOfCommitForReconnectWindowWithPrepareLast_Test::TestBody() kv_engine/engines/ep/tests/module_tests/dcp_durability_stream_test.cc:1012 (ep-engine_ep_unit_tests+0x0000006c9970) ... Mutex M456969213031230168 previously acquired by the same thread here: #0 AnnotateRWLockAcquired <null> (libtsan.so.0+0x00000005ea8d) ... #9 folly::SynchronizedBase<...>::wlock() folly/Synchronized.h:114 (ep-engine_ep_unit_tests+0x000000ea1d32) #10 PassiveDurabilityMonitor::sendSeqnoAck() kv_engine/engines/ep/src/durability/passive_durability_monitor.cc:253 (ep-engine_ep_unit_tests+0x000000ea1d32) #11 PassiveDurabilityMonitor::notifySnapshotEndReceived(unsigned long) kv_engine/engines/ep/src/durability/passive_durability_monitor.cc:226 (ep-engine_ep_unit_tests+0x000000ea3e27) #12 VBucket::notifyPassiveDMOfSnapEndReceived(unsigned long) kv_engine/engines/ep/src/vbucket.cc:979 (ep-engine_ep_unit_tests+0x000000f9e8b8) #13 PassiveStream::handleSnapshotEnd(std::shared_ptr<VBucket>&, unsigned long) kv_engine/engines/ep/src/dcp/passive_stream.cc:978 (ep-engine_ep_unit_tests+0x000000e51c63) #14 PassiveStream::processMessage(MutationConsumerMessage*, PassiveStream::MessageType) kv_engine/engines/ep/src/dcp/passive_stream.cc:662 (ep-engine_ep_unit_tests+0x000000e55910) #15 PassiveStream::processMutation(MutationConsumerMessage*) kv_engine/engines/ep/src/dcp/passive_stream.cc:670 (ep-engine_ep_unit_tests+0x000000e55d32) #16 MockPassiveStream::processMutation(MutationConsumerMessage*) <null> (ep-engine_ep_unit_tests+0x0000004596dc) #17 PassiveStream::messageReceived(...) kv_engine/engines/ep/src/dcp/passive_stream.cc:299 (ep-engine_ep_unit_tests+0x000000e59fdf) #18 MockPassiveStream::messageReceived(...) kv_engine/engines/ep/tests/mock/mock_stream.cc:87 (ep-engine_ep_unit_tests+0x0000004587a2) #19 DurabilityPassiveStreamTest::testReceiveMutationOrDeletionInsteadOfCommitForReconnectWindowWithPrepareLast(DocumentState) kv_engine/engines/ep/tests/module_tests/dcp_durability_stream_test.cc:950 (ep-engine_ep_unit_tests+0x0000006c7dfa) ... Change-Id: I9a672be7c04ff2993f524ccd961ddd40ef004c17 Reviewed-on: http://review.couchbase.org/112908 Reviewed-by: James Harrison <[email protected]> Reviewed-by: Ben Huddleston <[email protected]> Tested-by: Build Bot <[email protected]>
trondn
pushed a commit
to trondn/kv_engine
that referenced
this pull request
Jun 18, 2020
Fix data race in Timings::get_or_create_timing_histogram() to prevent the data race were in one thread we might not observe correctly the pointer to a given Hdr1sfMicroSecHistogram stored by the timings array. Which could lead to the Hdr1sfMicroSecHistogram being allocated multiple times. WARNING: ThreadSanitizer: data race (pid=1085) Read of size 8 at 0x0000009bad18 by thread T7 (mutexes: write M2377): #0 std::__uniq_ptr_impl<Hdr1sfMicroSecHistogram, std::default_delete<Hdr1sfMicroSecHistogram> >::_M_ptr() const /usr/local/include/c++/7.3.0/bits/unique_ptr.h:147 (memcached+0x0000004bf317) couchbase#1 std::unique_ptr<Hdr1sfMicroSecHistogram, std::default_delete<Hdr1sfMicroSecHistogram> >::get() const /usr/local/include/c++/7.3.0/bits/unique_ptr.h:337 (memcached+0x0000004bf317) couchbase#2 std::unique_ptr<Hdr1sfMicroSecHistogram, std::default_delete<Hdr1sfMicroSecHistogram> >::operator bool() const /usr/local/include/c++/7.3.0/bits/unique_ptr.h:351 (memcached+0x0000004bf317) couchbase#3 bool std::operator==<Hdr1sfMicroSecHistogram, std::default_delete<Hdr1sfMicroSecHistogram> >(std::unique_ptr<Hdr1sfMicroSecHistogram, std::default_delete<Hdr1sfMicroSecHistogram> > const&, decltype(nullptr)) /usr/local/include/c++/7.3.0/bits/unique_ptr.h:690 (memcached+0x0000004bf317) couchbase#4 Timings::get_or_create_timing_histogram(unsigned char) /home/couchbase/couchbase/kv_engine/daemon/timings.cc:146 (memcached+0x0000004bf317) couchbase#5 Timings::collect(cb::mcbp::ClientOpcode, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) /home/couchbase/couchbase/kv_engine/daemon/timings.cc:42 (memcached+0x0000004bf4aa) ... Previous write of size 8 at 0x0000009bad18 by thread T8 (mutexes: write M2371, write M2392): #0 std::enable_if<std::__and_<std::__not_<std::__is_tuple_like<Hdr1sfMicroSecHistogram*> >, std::is_move_constructible<Hdr1sfMicroSecHistogram*>, std::is_move_assignable<Hdr1sfMicroSecHistogram*> >::value, void>::type std::swap<Hdr1sfMicroSecHistogram*>(Hdr1sfMicroSecHistogram*&, Hdr1sfMicroSecHistogram*&) /usr/local/include/c++/7.3.0/bits/move.h:199 (memcached+0x0000004bf3cf) couchbase#1 std::unique_ptr<Hdr1sfMicroSecHistogram, std::default_delete<Hdr1sfMicroSecHistogram> >::reset(Hdr1sfMicroSecHistogram*) /usr/local/include/c++/7.3.0/bits/unique_ptr.h:374 (memcached+0x0000004bf3cf) couchbase#2 std::unique_ptr<Hdr1sfMicroSecHistogram, std::default_delete<Hdr1sfMicroSecHistogram> >::operator=(std::unique_ptr<Hdr1sfMicroSecHistogram, std::default_delete<Hdr1sfMicroSecHistogram> >&&) /usr/local/include/c++/7.3.0/bits/unique_ptr.h:283 (memcached+0x0000004bf3cf) couchbase#3 Timings::get_or_create_timing_histogram(unsigned char) /home/couchbase/couchbase/kv_engine/daemon/timings.cc:149 (memcached+0x0000004bf3cf) couchbase#4 Timings::collect(cb::mcbp::ClientOpcode, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) /home/couchbase/couchbase/kv_engine/daemon/timings.cc:42 (memcached+0x0000004bf4aa) ... Change-Id: I34db63854a1909cbf43a9feb273a13dfa40f313d Reviewed-on: http://review.couchbase.org/113024 Reviewed-by: James Harrison <[email protected]> Tested-by: Build Bot <[email protected]> Reviewed-by: Dave Rigby <[email protected]>
trondn
pushed a commit
to trondn/kv_engine
that referenced
this pull request
Jun 18, 2020
Change how SyncWrites which are Resolved and awaiting Completion are handled, by moving the final VBucket::commit() / abort() into a background task - DurabilityCompletionTask. +Background+ There are two reasons for making this change: a) Performance - specifically latency of front-end worker threads. By moving completion into a background task, we reduce the amount of work done on the thread which actually detected the SyncWrite was resolved - typically the front-end DCP threads when a DCP_SEQNO_ACK is processed. Given that we SEQNO_ACK at the end of Snapshot, A single SEQNO_ACK could result in committing multiple SyncWrites. Committing one SyncWrite is similar to a normal front-end Set operation, so there is potentially a non-trivial amount of work needed to be done when completing SyncWrites, which could tie up the front-end thread (causing other Connections to have to wait) for a noticable amount of time. b) Simplification of lock management. Doing completion in a background task simplifies lock management, for example we avoid lock inversions with earlier locks acquired during dcpSeqnoAck when attemping to later call notifySeqnoAvailable when this was done on the original thread. +Problem+ While (a) was the first reason identified for making this change (see MB-33092), (b) is the reason this change is being made now. During testing the following lock-order-inversion was seen: WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) Cycle in lock order graph: Stream::streamMutex => StreamContainer::rwlock => Stream::streamMutex The crux of the issue is the processing of DCP_SEQNO_ACKNOWLEDGED messages by the DcpProducer - this acquires the Stream::streamMutex before calling VBucket::seqnoAcknowledged(), however that function currently results in VBucket::commit() being called to synchronously complete the SyncWrite; which in turn must nodify all connected replica that a new seqno is available, requiring StreamContainer::rwlock to be acquired: Mutex StreamContainer::rwlock acquired here while holding mutex Stream::streamMutex in thread T15: ... couchbase#6 StreamContainer<std::shared_ptr<Stream> >::rlock() #7 DcpProducer::notifySeqnoAvailable(Vbid, unsigned long) ... #13 VBucket::commit(...) #14 ActiveDurabilityMonitor::commit(...) #15 ActiveDurabilityMonitor::processCompletedSyncWriteQueue() #16 ActiveDurabilityMonitor::seqnoAckReceived(...) #17 VBucket::seqnoAcknowledged(...) #18 ActiveStream::seqnoAck(...) #19 DcpProducer::seqno_acknowledged(...) ... Mutex Stream::streamMutex previously acquired by the same thread here: ... couchbase#3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) couchbase#4 ActiveStream::seqnoAck(...) couchbase#5 DcpProducer::seqno_acknowledged(...) ... This conflicts with the ordering seen when sending items out on the DCP connection - inside DcpProducer::step() where the StreamContainer::rwlock is acquired first, then ActiveStream::mutex acquired later: Mutex Stream::streamMutex acquired here while holding mutex StreamContainer::rwlock in thread T15: ... couchbase#3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) couchbase#4 ActiveStream::next() couchbase#5 DcpProducer::getNextItem() couchbase#6 DcpProducer::step(dcp_message_producers*) ... Mutex StreamContainer::rwlock previously acquired by the same thread here: #0 pthread_rwlock_rdlock <null> (libtsan.so.0+0x00000002c98b) ... couchbase#4 std::shared_lock<cb::RWLock>::shared_lock(cb::RWLock&) couchbase#5 StreamContainer<>::ResumableIterationHandle::ResumableIterationHandle() couchbase#6 StreamContainer<>::startResumable() #7 DcpProducer::getNextItem() #8 DcpProducer::step(dcp_message_producers*) ... +Solution+ The processing of resolved SyncWrites moved into a new background task. Instead of immediately processing them within ActiveDM::seqnoAckReceived(), that function notifies the new NonIO DurabilityCompletionTask that there are SyncWrites waiting for completion. DurabilityCompletionTask maintains a bool per vBucket indicating if there are SyncWrites for that vBucket pending completion. When the task is run, for each flag which is true it calls VBucket::processResolvedSyncWrites() for the associated VBucket. +Implementaiton Notes+ Currently there is just a single DurabilityCompletionTask (per Bucket), this was chosen as 1 task per vBucket (i.e. 1024 per Bucket) would be inefficient for our current background task scheduler (both in terms of latency to schedule each task for only one vBucket's worth of work, and in terms of managing that many tasks in the future queue). However, that does _potentially_ mean there's fewer resources (threads) available to complete SyncWrites on - previously that work could be done concurrently on all frontend threads (~O(num_cpus). Now the same work only has 1 thread available to run on (there's only a single DurabilityCompletionTask). _If_ this becomes a bottleneck we could look at increasing the number of DurabilityCompletionTask - e.g. sharding all vBuckets across multiple tasks like flusher / bgfetcher. Change-Id: I87897af1e3fd0a57e5abc2cb1ba9f795a9d3c63e Reviewed-on: http://review.couchbase.org/113141 Tested-by: Build Bot <[email protected]> Reviewed-by: Ben Huddleston <[email protected]>
trondn
pushed a commit
to trondn/kv_engine
that referenced
this pull request
Jun 18, 2020
The TSAN output (below) is saying that we have unlocked reads of the Connection::datatype 'bitset' from various threads, when the bitset was written with the frontend thread lock held. It is now common for non front-end threads to query a connection's permitted data-types, e.g. TSAN shows the backfill task deciding what to do about snappy. Changing the datatype to be std::atomic allows for correct write read between threads. ThreadSanitizer: data race/usr/local/include/c++/7.3.0/bitset:433std::_Base_bitset<1ul>::_M_do_or(std::_Base_bitset<1ul> const&) Write of size 8 at 0x7b5c00040f60 by thread T6 (mutexes: write M35842): #0 std::_Base_bitset<1ul>::_M_do_or(std::_Base_bitset<1ul> const&) /usr/local/include/c++/7.3.0/bitset:433 (memcached+0x0000004f1ec2) couchbase#1 std::bitset<8ul>::operator|=(std::bitset<8ul> const&) /usr/local/include/c++/7.3.0/bitset:973 (memcached+0x0000004f1ec2) couchbase#2 Datatype::enable(cb::mcbp::Feature) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/datatype.cc:57 (memcached+0x0000004f1ec2) couchbase#3 Connection::enableDatatype(cb::mcbp::Feature) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/connection.h:653 (memcached+0x000000507ef6) couchbase#4 process_bin_dcp_response /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/mcbp_executors.cc:573 (memcached+0x000000507ef6) couchbase#5 std::_Function_handler::_M_invoke(std::_Any_data const&, Cookie&) /usr/local/include/c++/7.3.0/bits/std_function.h:316 (memcached+0x000000509c92) couchbase#6 std::function::operator()(Cookie&) const /usr/local/include/c++/7.3.0/bits/std_function.h:706 (memcached+0x000000508929) #7 execute_client_response_packet /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/mcbp_executors.cc:897 (memcached+0x000000508929) #8 execute_response_packet(Cookie&, cb::mcbp::Response const&) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/mcbp_executors.cc:946 (memcached+0x000000508929) #9 Cookie::execute() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/cookie.cc:120 (memcached+0x0000004eb86e) #10 StateMachine::conn_execute() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/statemachine.cc:410 (memcached+0x00000053d4fc) #11 StateMachine::execute() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/statemachine.cc:137 (memcached+0x00000053f837) #12 Connection::runStateMachinery() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/connection.cc:1340 (memcached+0x0000004d46d7) #13 Connection::runEventLoop(short) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/connection.cc:1414 (memcached+0x0000004d476e) #14 run_event_loop(Connection*, short) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/connections.cc:148 (memcached+0x0000004e9cbb) #15 event_handler(int, short, void*) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/memcached.cc:848 (memcached+0x00000043c50d) #16 event_persist_closure /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1580 (libevent_core.so.2.1.8+0x000000017086) #17 event_process_active_single_queue /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1639 (libevent_core.so.2.1.8+0x000000017086) #18 CouchbaseThread::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000009c5f) #19 platform_thread_wrap /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000009c5f) #20 (libtsan.so.0+0x000000024feb) Previous read of size 8 at 0x7b5c00040f60 by thread T14 (mutexes: write M1073680921370941928): #0 std::bitset<8ul> std::operator&<8ul>(std::bitset<8ul> const&, std::bitset<8ul> const&) /usr/local/include/c++/7.3.0/bitset:1427 (memcached+0x0000004f1c7e) couchbase#1 Datatype::isEnabled(unsigned char) const /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/datatype.cc:101 (memcached+0x0000004f1c7e) couchbase#2 Connection::isDatatypeEnabled(unsigned char) const /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/connection.h:679 (memcached+0x000000443b85) couchbase#3 ServerCookieApi::is_datatype_supported(gsl::not_null, unsigned char) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/memcached.cc:1543 (memcached+0x000000443b85) couchbase#4 EventuallyPersistentEngine::isDatatypeSupported(void const*, unsigned char) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/ep_engine.cc:1831 (ep.so+0x00000017861d) couchbase#5 DcpProducer::isCompressionEnabled() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/dcp/producer.h:175 (ep.so+0x0000000a8ab2) couchbase#6 ActiveStream::isCompressionEnabled() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/dcp/active_stream.cc:606 (ep.so+0x0000000a8ab2) #7 DCPBackfillDisk::create() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/dcp/backfill_disk.cc:208 (ep.so+0x0000000cb7e0) #8 DCPBackfillDisk::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/dcp/backfill_disk.cc:155 (ep.so+0x0000000ccb1f) #9 BackfillManager::backfill() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/dcp/backfill-manager.cc:313 (ep.so+0x0000000c9005) #10 BackfillManagerTask::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/dcp/backfill-manager.cc:74 (ep.so+0x0000000c9571) #11 ExecutorThread::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/executorthread.cc:153 (ep.so+0x0000001d5db8) #12 launch_executor_thread /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/executorthread.cc:34 (ep.so+0x0000001d6e95) #13 CouchbaseThread::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000009c5f) #14 platform_thread_wrap /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000009c5f) #15 (libtsan.so.0+0x000000024feb) Location is heap block of size 880 at 0x7b5c00040c00 allocated by thread T6: #0 operator new(unsigned long) (libtsan.so.0+0x00000006a4d6) couchbase#1 allocate_connection /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/connections.cc:256 (memcached+0x0000004ea61a) couchbase#2 conn_new(int, ListeningPort const&, event_base*, FrontEndThread&) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/connections.cc:165 (memcached+0x0000004ea61a) couchbase#3 dispatch_new_connections /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/thread.cc:251 (memcached+0x0000004a3cb5) couchbase#4 thread_libevent_process /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/thread.cc:292 (memcached+0x0000004a3cb5) couchbase#5 event_persist_closure /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1580 (libevent_core.so.2.1.8+0x000000017086) couchbase#6 event_process_active_single_queue /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1639 (libevent_core.so.2.1.8+0x000000017086) #7 CouchbaseThread::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000009c5f) #8 platform_thread_wrap /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000009c5f) #9 (libtsan.so.0+0x000000024feb) Mutex M35842 (0x7b780004f8d8) created at: #0 pthread_mutex_lock (libtsan.so.0+0x00000003876f) couchbase#1 __gthread_mutex_lock /usr/local/include/c++/7.3.0/x86_64-pc-linux-gnu/bits/gthr-default.h:748 (memcached+0x0000004a4371) couchbase#2 std::mutex::lock() /usr/local/include/c++/7.3.0/bits/std_mutex.h:103 (memcached+0x0000004a4371) couchbase#3 phosphor::MutexEventGuard::MutexEventGuard(phosphor::tracepoint_info const*, phosphor::tracepoint_info const*, bool, std::mutex&, std::chrono::duration >) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/phosphor/include/phosphor/scoped_event_guard.h:93 (memcached+0x0000004a4371) couchbase#4 thread_libevent_process /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/thread.cc:300 (memcached+0x0000004a4371) couchbase#5 event_persist_closure /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1580 (libevent_core.so.2.1.8+0x000000017086) couchbase#6 event_process_active_single_queue /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1639 (libevent_core.so.2.1.8+0x000000017086) #7 CouchbaseThread::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000009c5f) #8 platform_thread_wrap /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000009c5f) #9 (libtsan.so.0+0x000000024feb) Mutex M1073680921370941928 is already destroyed. Thread T6 'mc:worker_0' (tid=6945, running) created by main thread at: #0 pthread_create (libtsan.so.0+0x0000000282a0) couchbase#1 cb_create_named_thread(unsigned long*, void (*)(void*), void*, int, char const*) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:109 (libplatform_so.so.0.1.0+0x00000000995b) couchbase#2 create_worker /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/thread.cc:111 (memcached+0x0000004a510d) couchbase#3 thread_init(unsigned long, event_base*, void (*)(int, short, void*)) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/thread.cc:460 (memcached+0x0000004a510d) couchbase#4 memcached_main /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/memcached.cc:2457 (memcached+0x00000043ebdf) couchbase#5 main /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/main.cc:33 (memcached+0x00000042a1ee) Thread T14 'mc:auxIO_0' (tid=7362, running) created by thread T21 at: #0 pthread_create (libtsan.so.0+0x0000000282a0) couchbase#1 cb_create_named_thread(unsigned long*, void (*)(void*), void*, int, char const*) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:109 (libplatform_so.so.0.1.0+0x00000000995b) couchbase#2 ExecutorThread::start() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/executorthread.cc:51 (ep.so+0x0000001d54b2) couchbase#3 ExecutorPool::_adjustWorkers(task_type_t, unsigned long) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/executorpool.cc:527 (ep.so+0x0000001cb9a2) couchbase#4 ExecutorPool::_startWorkers() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/executorpool.cc:597 (ep.so+0x0000001cc535) couchbase#5 ExecutorPool::_registerTaskable(Taskable&) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/executorpool.cc:483 (ep.so+0x0000001cb1f8) couchbase#6 ExecutorPool::registerTaskable(Taskable&) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/executorpool.cc:488 (ep.so+0x0000001cb55f) #7 KVBucket::KVBucket(EventuallyPersistentEngine&) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/kv_bucket.cc:315 (ep.so+0x000000214248) #8 EPBucket::EPBucket(EventuallyPersistentEngine&) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/ep_bucket.cc:251 (ep.so+0x00000015c99b) #9 std::_MakeUniq::__single_object std::make_unique(EventuallyPersistentEngine&) /usr/local/include/c++/7.3.0/bits/unique_ptr.h:825 (ep.so+0x00000017bf75) #10 EventuallyPersistentEngine::makeBucket(Configuration&) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/ep_engine.cc:6100 (ep.so+0x00000017bf75) #11 EventuallyPersistentEngine::initialize(char const*) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/ep_engine.cc:2033 (ep.so+0x000000198dd0) #12 CreateBucketThread::create() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/memcached.cc:1867 (memcached+0x0000004361e2) #13 CreateBucketThread::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/memcached.cc:1910 (memcached+0x000000436a66) #14 Couchbase::Thread::thread_entry() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/thread.cc:45 (libplatform_so.so.0.1.0+0x00000001cb0a) #15 Couchbase::StartThreadDelegator::run(Couchbase::Thread&) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/thread.cc:59 (libplatform_so.so.0.1.0+0x00000001cc05) #16 task_thread_main /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/thread.cc:65 (libplatform_so.so.0.1.0+0x00000001cc05) #17 CouchbaseThread::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000009c5f) #18 platform_thread_wrap /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000009c5f) #19 (libtsan.so.0+0x000000024feb) Change-Id: I5dce6961174eaa55d092136b328f5252add0d073 Reviewed-on: http://review.couchbase.org/113508 Reviewed-by: Dave Rigby <[email protected]> Tested-by: Build Bot <[email protected]>
trondn
pushed a commit
to trondn/kv_engine
that referenced
this pull request
Jun 18, 2020
When building on macOS with clang 8 and ThreadSanitizer enabled, memcached_testapp crashes when destructing the Settings object: libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument With the following backtrace from where the system_error exception is thrown: * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 frame #0: 0x00007fff7b6411f6 libc++abi.dylib` __cxa_throw frame couchbase#1: 0x00007fff7b6147af libc++.1.dylib` std::__1::__throw_system_error(int, char const*) + 77 frame couchbase#2: 0x00007fff7b606c93 libc++.1.dylib` std::__1::mutex::lock() + 29 frame couchbase#3: 0x00000001009c27ff memcached_testapp` folly::SharedMutexImpl<...>::annotateLazyCreate() [inlined] std::__1::unique_lock<std::__1::mutex>::unique_lock(__m=<unavailable>) + 191 at __mutex_base:131 frame couchbase#4: 0x00000001009c27f7 memcached_testapp` folly::SharedMutexImpl<...>::annotateLazyCreate() [inlined] std::__1::unique_lock<std::__1::mutex>::unique_lock(__m=<unavailable>) at __mutex_base:131 * frame couchbase#5: 0x00000001009c27f7 memcached_testapp` folly::SharedMutexImpl<...>::annotateLazyCreate() + 140 at SharedMutex.cpp:33 frame couchbase#6: 0x00000001009c276b memcached_testapp` folly::SharedMutexImpl<...>::annotateLazyCreate(this=0x0000000100c7ddc8) + 43 at SharedMutex.h:705 frame #7: 0x00000001009c191a memcached_testapp` folly::SharedMutexImpl<...>::~SharedMutexImpl() [inlined] folly::SharedMutexImpl<...>::annotateDestroy(this=<unavailable>) + 8 at SharedMutex.h:718 frame #8: 0x00000001009c1912 memcached_testapp` folly::SharedMutexImpl<...>::~SharedMutexImpl() + 24 at SharedMutex.h:324 frame #9: 0x00000001009c18fa memcached_testapp` folly::SharedMutexImpl<...>::~SharedMutexImpl(this=0x0000000100c7ddc8) + 26 at SharedMutex.h:300 frame #10: 0x000000010076b837 memcached_testapp` folly::Synchronized<...>::~Synchronized(this=0x0000000100c7ddb0) + 55 at Synchronized.h:484 frame #11: 0x000000010075d1d9 memcached_testapp` folly::Synchronized<...>::~Synchronized(this=0x0000000100c7ddb0) + 41 at Synchronized.h:484 frame #12: 0x000000010075e8fc memcached_testapp` Settings::~Settings(this=0x0000000100c7db10) + 76 at settings.h:51 frame #13: 0x000000010075c8b9 memcached_testapp` Settings::~Settings(this=0x0000000100c7db10) + 41 at settings.h:51 frame #14: 0x0000000102b7d5c1 libclang_rt.tsan_osx_dynamic.dylib` cxa_at_exit_wrapper(void*) + 33 frame #15: 0x00007fff7d72beed libsystem_c.dylib` __cxa_finalize_ranges + 351 frame #16: 0x00007fff7d72c1fe libsystem_c.dylib` exit + 55 frame #17: 0x00007fff7d67f01c libdyld.dylib` start + 8 The problem is at frame 5, where as part of the destruction of SharedMutex (as used by a member variable of Settings) we are attempting to acquire a mutex which has already been destructed (as it is itself a function-local static) - i.e. we have encountered the static initialization order fiasco :( Address this by changing `settings` to no longer be a plain static (global) variable, and instead be created on first use via a new Settings::instance() static method. Change-Id: I40bd44ed0ae32eb55271ce739fdf990d9869c32f Reviewed-on: http://review.couchbase.org/113640 Tested-by: Build Bot <[email protected]> Reviewed-by: Jim Walker <[email protected]>
trondn
pushed a commit
to trondn/kv_engine
that referenced
this pull request
Jun 18, 2020
[[Re-apply after fixing error in DurabilityCompletionTask::run (skipping last vBucket).]] Change how SyncWrites which are Resolved and awaiting Completion are handled, by moving the final VBucket::commit() / abort() into a background task - DurabilityCompletionTask. +Background+ There are two reasons for making this change: a) Performance - specifically latency of front-end worker threads. By moving completion into a background task, we reduce the amount of work done on the thread which actually detected the SyncWrite was resolved - typically the front-end DCP threads when a DCP_SEQNO_ACK is processed. Given that we SEQNO_ACK at the end of Snapshot, A single SEQNO_ACK could result in committing multiple SyncWrites. Committing one SyncWrite is similar to a normal front-end Set operation, so there is potentially a non-trivial amount of work needed to be done when completing SyncWrites, which could tie up the front-end thread (causing other Connections to have to wait) for a noticable amount of time. b) Simplification of lock management. Doing completion in a background task simplifies lock management, for example we avoid lock inversions with earlier locks acquired during dcpSeqnoAck when attemping to later call notifySeqnoAvailable when this was done on the original thread. +Problem+ While (a) was the first reason identified for making this change (see MB-33092), (b) is the reason this change is being made now. During testing the following lock-order-inversion was seen: WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) Cycle in lock order graph: Stream::streamMutex => StreamContainer::rwlock => Stream::streamMutex The crux of the issue is the processing of DCP_SEQNO_ACKNOWLEDGED messages by the DcpProducer - this acquires the Stream::streamMutex before calling VBucket::seqnoAcknowledged(), however that function currently results in VBucket::commit() being called to synchronously complete the SyncWrite; which in turn must nodify all connected replica that a new seqno is available, requiring StreamContainer::rwlock to be acquired: Mutex StreamContainer::rwlock acquired here while holding mutex Stream::streamMutex in thread T15: ... couchbase#6 StreamContainer<std::shared_ptr<Stream> >::rlock() #7 DcpProducer::notifySeqnoAvailable(Vbid, unsigned long) ... #13 VBucket::commit(...) #14 ActiveDurabilityMonitor::commit(...) #15 ActiveDurabilityMonitor::processCompletedSyncWriteQueue() #16 ActiveDurabilityMonitor::seqnoAckReceived(...) #17 VBucket::seqnoAcknowledged(...) #18 ActiveStream::seqnoAck(...) #19 DcpProducer::seqno_acknowledged(...) ... Mutex Stream::streamMutex previously acquired by the same thread here: ... couchbase#3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) couchbase#4 ActiveStream::seqnoAck(...) couchbase#5 DcpProducer::seqno_acknowledged(...) ... This conflicts with the ordering seen when sending items out on the DCP connection - inside DcpProducer::step() where the StreamContainer::rwlock is acquired first, then ActiveStream::mutex acquired later: Mutex Stream::streamMutex acquired here while holding mutex StreamContainer::rwlock in thread T15: ... couchbase#3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) couchbase#4 ActiveStream::next() couchbase#5 DcpProducer::getNextItem() couchbase#6 DcpProducer::step(dcp_message_producers*) ... Mutex StreamContainer::rwlock previously acquired by the same thread here: #0 pthread_rwlock_rdlock <null> (libtsan.so.0+0x00000002c98b) ... couchbase#4 std::shared_lock<cb::RWLock>::shared_lock(cb::RWLock&) couchbase#5 StreamContainer<>::ResumableIterationHandle::ResumableIterationHandle() couchbase#6 StreamContainer<>::startResumable() #7 DcpProducer::getNextItem() #8 DcpProducer::step(dcp_message_producers*) ... +Solution+ The processing of resolved SyncWrites moved into a new background task. Instead of immediately processing them within ActiveDM::seqnoAckReceived(), that function notifies the new NonIO DurabilityCompletionTask that there are SyncWrites waiting for completion. DurabilityCompletionTask maintains a bool per vBucket indicating if there are SyncWrites for that vBucket pending completion. When the task is run, for each flag which is true it calls VBucket::processResolvedSyncWrites() for the associated VBucket. +Implementation Notes+ Currently there is just a single DurabilityCompletionTask (per Bucket), this was chosen as 1 task per vBucket (i.e. 1024 per Bucket) would be inefficient for our current background task scheduler (both in terms of latency to schedule each task for only one vBucket's worth of work, and in terms of managing that many tasks in the future queue). However, that does _potentially_ mean there's fewer resources (threads) available to complete SyncWrites on - previously that work could be done concurrently on all frontend threads (~O(num_cpus). Now the same work only has 1 thread available to run on (there's only a single DurabilityCompletionTask). _If_ this becomes a bottleneck we could look at increasing the number of DurabilityCompletionTask - e.g. sharding all vBuckets across multiple tasks like flusher / bgfetcher. Change-Id: I33ecfa78b03b4d2120b5d05f54984b24ce038fd8 Reviewed-on: http://review.couchbase.org/113749 Reviewed-by: Ben Huddleston <[email protected]> Tested-by: Build Bot <[email protected]>
trondn
pushed a commit
to trondn/kv_engine
that referenced
this pull request
Jun 18, 2020
The audit daemon handle (auditHandle) does not have any synchronization around it, resulting in TSan warnings during shutdown as the Audit object may have been deleted: ThreadSanitizer: data race Write of size 8 at 0x0000009b0130 by main thread: #0 std::swap(cb::audit::Audit*&, cb::audit::Audit*&) /usr/local/include/c++/7.3.0/bits/move.h:199 (memcached+0x0000004f335a) couchbase#1 std::unique_ptr >::reset(cb::audit::Audit*) /usr/local/include/c++/7.3.0/bits/unique_ptr.h:374 (memcached+0x0000004f335a) couchbase#2 shutdown_audit() kv_engine/daemon/mcaudit.cc:348 (memcached+0x0000004f335a) couchbase#3 memcached_main kv_engine/daemon/memcached.cc:2503 (memcached+0x000000435a46) couchbase#4 main kv_engine/daemon/main.cc:33 (memcached+0x00000042145e) Previous read of size 8 at 0x0000009b0130 by thread T8 (mutexes: write M1049192993527234104): #0 std::__uniq_ptr_impl >::_M_ptr() const /usr/local/include/c++/7.3.0/bits/unique_ptr.h:147 (memcached+0x0000004f3433) couchbase#1 std::unique_ptr >::get() const /usr/local/include/c++/7.3.0/bits/unique_ptr.h:337 (memcached+0x0000004f3433) couchbase#2 std::unique_ptr >::operator->() const /usr/local/include/c++/7.3.0/bits/unique_ptr.h:331 (memcached+0x0000004f3433) couchbase#3 stats_audit(std::function)> const&, Cookie&) kv_engine/daemon/mcaudit.cc:361 (memcached+0x0000004f3433) couchbase#4 stat_audit_executor kv_engine/daemon/protocol/mcbp/stats_context.cc:446 (memcached+0x000000528f72) Location is global 'auditHandle' of size 8 at 0x0000009b0130 (memcached+0x0000009b0130) Fix by using folly::Synchronized<> for the auditHandle. Note that exclusive access is only needed during initialization & shutdown, so there should be no additional contention for actually adding audit events (where shared access is sufficient). Change-Id: I34966f08674c6363fde4592b5c4bede48747fb2f Reviewed-on: http://review.couchbase.org/114945 Tested-by: Build Bot <[email protected]> Reviewed-by: Richard de Mellow <[email protected]> Reviewed-by: Trond Norbye <[email protected]>
trondn
pushed a commit
to trondn/kv_engine
that referenced
this pull request
Jun 18, 2020
As reported by TSan, Connection::priority is read and written without a lock: hreadSanitizer: data race third_party/gsl-lite/include/gsl/gsl-lite.h:472gsl::fail_fast_assert(bool, char const*) Read of size 1 at 0x7b5c00019738 by thread T23 (mutexes: write M2903806): #0 gsl::fail_fast_assert(bool, char const*) third_party/gsl-lite/include/gsl/gsl-lite.h:472 (memcached+0x000000465448) couchbase#1 gsl::not_null::get() const third_party/gsl-lite/include/gsl/gsl-lite.h:888 (memcached+0x000000465448) couchbase#2 ServerCookieApi::get_priority(gsl::not_null) kv_engine/daemon/memcached.cc:1640 (memcached+0x000000465448) couchbase#3 EventuallyPersistentEngine::getDCPPriority(void const*) kv_engine/engines/ep/src/ep_engine.cc:5828 (ep.so+0x000000171c57) couchbase#4 ConnHandler::addStats(std::function)> const&, void const*) kv_engine/engines/ep/src/connhandler.cc:327 (ep.so+0x000000092b33) couchbase#5 DcpProducer::addStats(std::function)> const&, void const*) kv_engine/engines/ep/src/dcp/producer.cc:1252 (ep.so+0x000000119b1e) couchbase#6 ConnStatBuilder::operator()(std::shared_ptr) kv_engine/engines/ep/src/ep_engine.cc:3571 (ep.so+0x000000191fbf) #7 void DcpConnMap::each(ConnStatBuilder) kv_engine/engines/ep/src/dcp/dcpconnmap.h:164 (ep.so+0x000000191fbf) #8 EventuallyPersistentEngine::doDcpStats(void const*, std::function)> const&, cb::const_char_buffer) kv_engine/engines/ep/src/ep_engine.cc:3728 (ep.so+0x000000191fbf) #9 EventuallyPersistentEngine::getStats(void const*, cb::const_char_buffer, cb::const_char_buffer, std::function)> const&) kv_engine/engines/ep/src/ep_engine.cc:4398 (ep.so+0x000000192bb8) #10 KVBucket::snapshotStats() kv_engine/engines/ep/src/kv_bucket.cc:1149 (ep.so+0x0000002116db) #11 StatSnap::run() kv_engine/engines/ep/src/tasks.cc:72 (ep.so+0x000000258ae2) #12 ExecutorThread::run() kv_engine/engines/ep/src/executorthread.cc:153 (ep.so+0x0000001d0405) ... Previous write of size 1 at 0x7b5c00019738 by thread T7 (mutexes: write M1036245144598543240): #0 Connection::setPriority(Connection::Priority) kv_engine/daemon/connection.cc:1593 (memcached+0x0000004be2f2) couchbase#1 ServerCookieApi::set_priority(gsl::not_null, CONN_PRIORITY) kv_engine/daemon/memcached.cc:1618 (memcached+0x0000004658b9) couchbase#2 EventuallyPersistentEngine::setDCPPriority(void const*, CONN_PRIORITY) kv_engine/engines/ep/src/ep_engine.cc:5835 (ep.so+0x000000171d5f) couchbase#3 DcpProducer::control(unsigned int, cb::const_char_buffer, cb::const_char_buffer) kv_engine/engines/ep/src/dcp/producer.cc:945 (ep.so+0x000000118cc3) couchbase#4 non-virtual thunk to EventuallyPersistentEngine::control(gsl::not_null, unsigned int, cb::const_char_buffer, cb::const_char_buffer) (ep.so+0x00000018f966) couchbase#5 dcpControl(Cookie&, unsigned int, cb::const_char_buffer, cb::const_char_buffer) kv_engine/daemon/protocol/mcbp/engine_wrapper.cc:408 (memcached+0x00000047c981) couchbase#6 dcp_control_executor(Cookie&) kv_engine/daemon/protocol/mcbp/dcp_control_executor.cc:38 (memcached+0x000000513421) #7 std::_Function_handler::_M_invoke(std::_Any_data const&, Cookie&) /usr/local/include/c++/7.3.0/bits/std_function.h:316 (memcached+0x000000503202) #8 std::function::operator()(Cookie&) const /usr/local/include/c++/7.3.0/bits/std_function.h:706 (memcached+0x0000005019fe) #9 execute_client_request_packet(Cookie&, cb::mcbp::Request const&) kv_engine/daemon/mcbp_executors.cc:857 (memcached+0x0000005019fe) #10 execute_request_packet(Cookie&, cb::mcbp::Request const&) kv_engine/daemon/mcbp_executors.cc:881 (memcached+0x000000501bd7) ... Fix by changing Connection::priority to be an atomic. Change-Id: Ia6bae64ec09e1963e55e7cdb614fb17a68ff1726 Reviewed-on: http://review.couchbase.org/114956 Reviewed-by: Trond Norbye <[email protected]> Tested-by: Build Bot <[email protected]>
trondn
pushed a commit
to trondn/kv_engine
that referenced
this pull request
Jun 18, 2020
As reported by TSan, VBucket::purge_seqno is read and written without a lock: Write of size 8 at 0x7b6800107728 by thread T41: #0 Monotonic::operator=(unsigned long const&) kv_engine/engines/ep/src/monotonic.h:134 (ep.so+0x0000001be09e) couchbase#1 VBucket::setPurgeSeqno(unsigned long) kv_engine/engines/ep/src/vbucket.h:218 (ep.so+0x0000001be09e) couchbase#2 EphemeralVBucket::purgeStaleItems(std::function) kv_engine/engines/ep/src/ephemeral_vb.cc:350 (ep.so+0x0000001be09e) couchbase#3 EphemeralVBucket::StaleItemDeleter::visit(VBucket&) kv_engine/engines/ep/src/ephemeral_tombstone_purger.cc:205 (ep.so+0x0000001b7571) couchbase#4 KVBucket::pauseResumeVisit(PauseResumeVBVisitor&, KVBucketIface::Position&) kv_engine/engines/ep/src/kv_bucket.cc:2278 (ep.so+0x000000209a09) couchbase#5 EphTombstoneStaleItemDeleter::run() kv_engine/engines/ep/src/ephemeral_tombstone_purger.cc:273 (ep.so+0x0000001b6d59) couchbase#6 ExecutorThread::run() kv_engine/engines/ep/src/executorthread.cc:153 (ep.so+0x0000001d0405) #7 launch_executor_thread kv_engine/engines/ep/src/executorthread.cc:34 (ep.so+0x0000001d1345) #8 CouchbaseThread::run() platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000009c5f) #9 platform_thread_wrap platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000009c5f) #10 (libtsan.so.0+0x000000024feb) Previous read of size 8 at 0x7b6800107728 by thread T6 (mutexes: write M1057918717805263064): #0 VBucket::_addStats(bool, std::function)> const&, void const*) kv_engine/engines/ep/src/vbucket.cc:3010 (ep.so+0x0000002796b6) couchbase#1 EphemeralVBucket::addStats(bool, std::function)> const&, void const*) kv_engine/engines/ep/src/ephemeral_vb.cc:166 (ep.so+0x0000001b8824) couchbase#2 addVBStats kv_engine/engines/ep/src/ep_engine.cc:3155 (ep.so+0x000000183e93) couchbase#3 visitBucket kv_engine/engines/ep/src/ep_engine.cc:3127 (ep.so+0x000000183e93) couchbase#4 KVBucket::visit(VBucketVisitor&) kv_engine/engines/ep/src/kv_bucket.cc:2254 (ep.so+0x000000209efd) couchbase#5 EventuallyPersistentEngine::doVBucketStats(void const*, std::function)> const&, char const*, int, bool, bool) kv_engine/engines/ep/src/ep_engine.cc:3193 (ep.so+0x000000183a7b) ... Fix by changing to an Atomic WeaklyMonotonic type. Change-Id: I014242268f913d31fdc0964c42f59aa952607ba4 Reviewed-on: http://review.couchbase.org/114950 Reviewed-by: James Harrison <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Aug 7, 2020
As identified by TSan when adding additional ExecutorPool tests, the two flags isLowPrioQSet & isHiPrioQSet are accessed without locks from different threads: WARNING: ThreadSanitizer: data race (pid=25602) Write of size 1 at 0x7b5000005ad8 by main thread (mutexes: write M1128287290183932680): #0 CB3ExecutorPool::_registerTaskable(Taskable&) cb3_executorpool.cc:442 (ep-engine_ep_unit_tests+0x00000048891c) #1 CB3ExecutorPool::registerTaskable(Taskable&) cb3_executorpool.cc:454 (ep-engine_ep_unit_tests+0x000000488a16) #2 ExecutorPoolTest_cancel_can_schedule_Test<TestExecutorPool>::TestBody() executorpool_test.cc:530 (ep-engine_ep_unit_tests+0x0000012e207c) ... Previous read of size 1 at 0x7b5000005ad8 by thread T38: #0 CB3ExecutorPool::getSleepQ(unsigned int) cb3_executorpool.h:112 (ep-engine_ep_unit_tests+0x000000485421) #1 CB3ExecutorPool::_nextTask(CB3ExecutorThread&, unsigned char) cb3_executorpool.cc:148 (ep-engine_ep_unit_tests+0x000000485421) #2 CB3ExecutorPool::nextTask(CB3ExecutorThread&, unsigned char) cb3_executorpool.cc:163 (ep-engine_ep_unit_tests+0x0000004854e7) #3 CB3ExecutorThread::run() cb3_executorthread.cc:129 (ep-engine_ep_unit_tests+0x00000049cf7d) #4 launch_executor_thread cb3_executorthread.cc:34 (ep-engine_ep_unit_tests+0x00000049d0ae) #5 CouchbaseThread::run() ../platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000010e1b) #6 platform_thread_wrap ../platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000010e1b) This appears to be a latent issue in the CB3ExecutorPool, only exposed when different priority Taskables (Buckets) are registered. Fix by making the flags atomic. Change-Id: Ibf7fda1443bad32a36914a6d1bc7b3424a3bfe75 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/133889 Tested-by: Build Bot <[email protected]> Reviewed-by: James Harrison <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Sep 4, 2020
The following data race was observed on Stream member variables: Running [0056/0099]: test full rollback on consumer...================== WARNING: ThreadSanitizer: data race (pid=43827) Write of size 8 at 0x7b540001ef60 by thread T23: #0 PassiveStream::reconnectStream(...) ../kv_engine/engines/ep/src/dcp/passive_stream.cc:240 (libep.so+0x00000012eed6) #1 DcpConsumer::doRollback(unsigned int, Vbid, unsigned long) ../kv_engine/engines/ep/src/dcp/consumer.cc:1107 (libep.so+0x0000001116d3) #2 RollbackTask::run() ../kv_engine/engines/ep/src/dcp/consumer.cc:938 (libep.so+0x000000111871) #3 GlobalTask::execute() ../kv_engine/engines/ep/src/globaltask.cc:73 (libep.so+0x0000002363cb) #4 CB3ExecutorThread::run() ../kv_engine/engines/ep/src/cb3_executorthread.cc:174 (libep.so+0x00000009f7a1) #5 launch_executor_thread ../kv_engine/engines/ep/src/cb3_executorthread.cc:34 (libep.so+0x00000009fded) #6 CouchbaseThread::run() ../platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000010e1b) #7 platform_thread_wrap ../platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000010e1b) #8 <null> <null> (libtsan.so.0+0x000000024feb) Previous read of size 8 at 0x7b540001ef60 by thread T22 (mutexes: write M639646524455782464): #0 void StatCollector::addStat<...>(...) ../kv_engine/include/statistics/collector.h:343 (libep.so+0x0000000e5e03) #1 void add_casted_stat<>(...) ../kv_engine/include/statistics/collector.h:404 (libep.so+0x0000000e5e03) #2 Stream::addStats(...) ../kv_engine/engines/ep/src/dcp/stream.cc:157 (libep.so+0x00000016b1b4) #3 PassiveStream::addStats(...) ../kv_engine/engines/ep/src/dcp/passive_stream.cc:1058 (libep.so+0x000000133a98) #4 DcpConsumer::addStats(...) ../kv_engine/engines/ep/src/dcp/consumer.cc:1140 (libep.so+0x000000115415) #5 ConnStatBuilder::operator()(std::shared_ptr<ConnHandler>) ../kv_engine/engines/ep/src/ep_engine.cc:3784 (libep.so+0x0000001f3f9a) #6 void DcpConnMap::each<ConnStatBuilder&>(ConnStatBuilder&) ../kv_engine/engines/ep/src/dcp/dcpconnmap_impl.h:33 (libep.so+0x0000001f3f9a) #7 EventuallyPersistentEngine::doDcpStats(...) ../kv_engine/engines/ep/src/ep_engine.cc:3943 (libep.so+0x0000001d9cd0) #8 EventuallyPersistentEngine::getStats(...) ../kv_engine/engines/ep/src/ep_engine.cc:4675 (libep.so+0x0000001dd67b) #9 KVBucket::snapshotStats() ../kv_engine/engines/ep/src/kv_bucket.cc:1168 (libep.so+0x000000261e32) #10 StatSnap::run() ../kv_engine/engines/ep/src/tasks.cc:72 (libep.so+0x0000002b76a9) #11 GlobalTask::execute() ../kv_engine/engines/ep/src/globaltask.cc:73 (libep.so+0x0000002363cb) #12 CB3ExecutorThread::run() ../kv_engine/engines/ep/src/cb3_executorthread.cc:174 (libep.so+0x00000009f7a1) #13 launch_executor_thread ../kv_engine/engines/ep/src/cb3_executorthread.cc:34 (libep.so+0x00000009fded) #14 CouchbaseThread::run() ../platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000010e1b) #15 platform_thread_wrap ../platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000010e1b) #16 <null> <null> (libtsan.so.0+0x000000024feb) Fix by acquiring the stream mutex before accessing these. Change-Id: Ie14eb02e8e98c0aa5c9432c6f385766a215eca8f Reviewed-on: http://review.couchbase.org/c/kv_engine/+/135531 Tested-by: Build Bot <[email protected]> Reviewed-by: James Harrison <[email protected]> Reviewed-by: Paolo Cocchi <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Oct 6, 2020
Move ThreadGate so it has a scope equal to the thread ==16055==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffceabf4548 at pc 0x00000520d834 bp 0x7fb4f0458520 sp 0x7fb4f0458518 READ of size 8 at 0x7ffceabf4548 thread T3 #0 0x520d833 in ThreadGate::isComplete(std::unique_lock<std::mutex> const&) ../kv_engine/engines/ep/tests/module_tests/thread_gate.h:69 #1 0x520d833 in ThreadGate::threadUp()::{lambda()#1}::operator()() const ../kv_engine/engines/ep/tests/module_tests/thread_gate.h:45 #2 0x520d833 in void std::condition_variable::wait<ThreadGate::threadUp()::{lambda()#1}>(std::unique_lock<std::mutex>&, ThreadGate::threadUp()::{lambda()#1}) /usr/local/include/c++/7.3.0/condition_variable:98 #3 0x520d833 in ThreadGate::threadUp() ../kv_engine/engines/ep/tests/module_tests/thread_gate.h:45 #4 0x520d833 in resetThread(HdrHistogram&, ThreadGate&) ../kv_engine/engines/ep/tests/module_tests/hdrhistogram_test.cc:482 #5 0x5228687 in void std::__invoke_impl<void, void (*)(HdrHistogram&, ThreadGate&), Hdr2sfMicroSecHistogram&, ThreadGate&>(std::__invoke_other, void (*&&)(HdrHistogram&, ThreadGate&), Hdr2sfMicroSecHistogram&, ThreadGate&) /usr/local/include/c++/7.3.0/bits/invoke.h:60 #6 0x5228687 in std::__invoke_result<void (*)(HdrHistogram&, ThreadGate&), Hdr2sfMicroSecHistogram&, ThreadGate&>::type std::__invoke<void (*)(HdrHistogram&, ThreadGate&), Hdr2sfMicroSecHistogram&, ThreadGate&>(void (*&&)(HdrHistogram&, ThreadGate&), Hdr2sfMicroSecHistogram&, ThreadGate&) /usr/local/include/c++/7.3.0/bits/invoke.h:95 #7 0x5228687 in decltype (__invoke((_S_declval<0ul>)(), (_S_declval<1ul>)(), (_S_declval<2ul>)())) std::thread::_Invoker<std::tuple<void (*)(HdrHistogram&, ThreadGate&), Hdr2sfMicroSecHistogram&, ThreadGate&> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) /usr/local/include/c++/7.3.0/thread:234 #8 0x5228687 in std::thread::_Invoker<std::tuple<void (*)(HdrHistogram&, ThreadGate&), Hdr2sfMicroSecHistogram&, ThreadGate&> >::operator()() /usr/local/include/c++/7.3.0/thread:243 #9 0x5228687 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(HdrHistogram&, ThreadGate&), Hdr2sfMicroSecHistogram&, ThreadGate&> > >::_M_run() /usr/local/include/c++/7.3.0/thread:186 #10 0x7fb4f66b295e in execute_native_thread_routine /tmp/deploy/objdir/../gcc-7.3.0/libstdc++-v3/src/c++11/thread.cc:83 #11 0x7fb4f51a76b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9) #12 0x7fb4f4edd41c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c) Change-Id: Idb2dbb73d1a2e0b1beca570b38401009d3408906 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/137540 Reviewed-by: Richard de Mellow <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Oct 7, 2020
Fix a TSAN failure due to a lock order inversion observed on the mad-hatter TSAN cv machine. This was due to a potential deadlock while acquiring locks on this and other histogram in HdrHistogram::operator+=(). To fix this use folly::acquireLocked() to safely take write lock and read lock on this->histogram and other.histogram respectively. We also remove the nullptr for this->histogram as we don't guard any other access to the histograms pointer, in case cb_calloc() fails. TSAN failure: WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=11208) Cycle in lock order graph: M296533874179548048 (0x000000000000) => M174936684240544688 (0x000000000000) => M296533874179548048 Mutex M174936684240544688 acquired here while holding mutex M296533874179548048 in main thread: #0 AnnotateRWLockAcquired <null> (libtsan.so.0+0x00000005b63d) #1 folly::detail::annotate_rwlock_acquired_impl(void const volatile*, folly::annotate_rwlock_level, char const*, int) /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/synchronization/SanitizeThread.cpp:99 (ep-engine_ep_unit_tests+0x00000139555e) #2 annotate_rwlock_acquired tlm/deps/folly.exploded/include/folly/synchronization/SanitizeThread.h:111 (ep-engine_ep_unit_tests+0x000001362a06) #3 folly::SharedMutexImpl<false, void, std::atomic, false, true>::annotateAcquired(folly::annotate_rwlock_level) tlm/deps/folly.exploded/include/folly/SharedMutex.h:726 (ep-engine_ep_unit_tests+0x000001362a06) #4 folly::SharedMutexImpl<false, void, std::atomic, false, true>::lock_shared() tlm/deps/folly.exploded/include/folly/SharedMutex.h:400 (ep-engine_ep_unit_tests+0x000001362a06) #5 folly::detail::LockTraitsImpl<folly::SharedMutexImpl<false, void, std::atomic, false, true>, (folly::detail::MutexLevel)1, false>::lock_shared(folly::SharedMutexImpl<false, void, std::atomic, false, true>&) tlm/deps/folly.exploded/include/folly/LockTraits.h:157 (ep-engine_ep_unit_tests+0x000001362a06) #6 std::integral_constant<bool, true> folly::LockPolicyShared::lock<folly::SharedMutexImpl<false, void, std::atomic, false, true> >(folly::SharedMutexImpl<false, void, std::atomic, false, true>&) tlm/deps/folly.exploded/include/folly/LockTraits.h:499 (ep-engine_ep_unit_tests+0x000001362a06) #7 folly::LockedPtrBase<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const, folly::SharedMutexImpl<false, void, std::atomic, false, true>, folly::LockPolicyShared>::LockedPtrBase(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const*) tlm/deps/folly.exploded/include/folly/Synchronized.h:1089 (ep-engine_ep_unit_tests+0x000001362a06) #8 folly::LockedPtr<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const, folly::LockPolicyShared>::LockedPtr(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const*) tlm/deps/folly.exploded/include/folly/Synchronized.h:1394 (ep-engine_ep_unit_tests+0x000001360063) #9 folly::SynchronizedBase<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >, (folly::detail::MutexLevel)1>::rlock() tlm/deps/folly.exploded/include/folly/Synchronized.h:140 (ep-engine_ep_unit_tests+0x000001360063) #10 HdrHistogram::operator+=(HdrHistogram const&) ../kv_engine/utilities/hdrhistogram.cc:83 (ep-engine_ep_unit_tests+0x000001360063) #11 HdrHistogramTest_aggregationTest_Test::TestBody() ../kv_engine/engines/ep/tests/module_tests/hdrhistogram_test.cc:317 (ep-engine_ep_unit_tests+0x00000106945c) #12 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../googletest/googletest/src/gtest.cc:2402 (ep-engine_ep_unit_tests+0x0000013276cb) #13 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../googletest/googletest/src/gtest.cc:2438 (ep-engine_ep_unit_tests+0x00000132e814) #14 testing::Test::Run() ../googletest/googletest/src/gtest.cc:2474 (ep-engine_ep_unit_tests+0x00000131d649) #15 testing::TestInfo::Run() ../googletest/googletest/src/gtest.cc:2656 (ep-engine_ep_unit_tests+0x00000131d91d) #16 testing::TestCase::Run() ../googletest/googletest/src/gtest.cc:2774 (ep-engine_ep_unit_tests+0x00000131dac5) #17 testing::internal::UnitTestImpl::RunAllTests() ../googletest/googletest/src/gtest.cc:4649 (ep-engine_ep_unit_tests+0x00000131fe3e) #18 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../googletest/googletest/src/gtest.cc:2402 (ep-engine_ep_unit_tests+0x000001327972) #19 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../googletest/googletest/src/gtest.cc:2438 (ep-engine_ep_unit_tests+0x00000132edc8) #20 testing::UnitTest::Run() ../googletest/googletest/src/gtest.cc:4257 (ep-engine_ep_unit_tests+0x00000131d741) #21 RUN_ALL_TESTS() ../googletest/googletest/include/gtest/gtest.h:2237 (ep-engine_ep_unit_tests+0x000000d30e6a) #22 main ../kv_engine/engines/ep/tests/module_tests/ep_unit_tests_main.cc:144 (ep-engine_ep_unit_tests+0x000000d30e6a) Change-Id: I7f7448627d20c753add8c92eaf1186fb350aaab0 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/137548 Reviewed-by: Dave Rigby <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Oct 8, 2020
Fix TSAN failures in module_tests/hdrhistogram_test.cc, due to theoretical deadlock that could occur when using two iterators that hold read locks on different HdrHistograms (this isn't the case now bug could be if the code was modified). To avoid this WARNING, ensure in our HdrHistogramTests that we never hold multiple HdrHistogram::Iterators in the same scope. Example TSAN Failure: WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=16448) Cycle in lock order graph: M732257135894671160 (0x000000000000) => M730849761011117904 (0x000000000000) => M732257135894671160 Mutex M730849761011117904 acquired here while holding mutex M732257135894671160 in main thread: #0 AnnotateRWLockAcquired <null> (ep-engine_ep_unit_tests+0x64007b) #1 folly::detail::annotate_rwlock_acquired_impl(void const volatile*, folly::annotate_rwlock_level, char const*, int) /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/synchronization/SanitizeThread.cpp:99 (ep-engine_ep_unit_tests+0x162f6be) #2 annotate_rwlock_acquired /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/synchronization/SanitizeThread.h:111 (ep-engine_ep_unit_tests+0x15a9881) #3 folly::SharedMutexImpl<false, void, std::atomic, false, true>::annotateAcquired(folly::annotate_rwlock_level) /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/SharedMutex.h:740 (ep-engine_ep_unit_tests+0x15a9881) #4 folly::SharedMutexImpl<false, void, std::atomic, false, true>::lock() /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/SharedMutex.h:372 (ep-engine_ep_unit_tests+0x15a9881) #5 folly::detail::LockTraitsImpl<folly::SharedMutexImpl<false, void, std::atomic, false, true>, (folly::detail::MutexLevel)0, false>::lock(folly::SharedMutexImpl<false, void, std::atomic, false, true>&) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/LockTraits.h:124:11 (ep-engine_ep_unit_tests+0x717b75) #6 std::integral_constant<bool, true> folly::LockPolicyExclusive::lock<folly::SharedMutexImpl<false, void, std::atomic, false, true> >(folly::SharedMutexImpl<false, void, std::atomic, false, true>&) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/LockTraits.h:479:5 (ep-engine_ep_unit_tests+0x717b45) #7 folly::LockedPtrBase<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >, folly::SharedMutexImpl<false, void, std::atomic, false, true>, folly::LockPolicyExclusive>::LockedPtrBase(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/Synchronized.h:1097:10 (ep-engine_ep_unit_tests+0x158cb02) #8 folly::LockedPtr<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >, folly::LockPolicyExclusive>::LockedPtr(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/Synchronized.h:1402:50 (ep-engine_ep_unit_tests+0x158caae) #9 folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >::contextualLock() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/Synchronized.h:653:12 (ep-engine_ep_unit_tests+0x158ceae) #10 std::tuple<std::conditional<std::is_const<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > >::value, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >::ConstLockedPtr, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >::LockedPtr>::type, std::conditional<std::is_const<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const>::value, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const::ConstLockedPtr, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const::LockedPtr>::type> folly::acquireLocked<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const>(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >&, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const&) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/Synchronized.h:1741:18 (ep-engine_ep_unit_tests+0x158ccda) #11 std::pair<std::conditional<std::is_const<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > >::value, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >::ConstLockedPtr, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >::LockedPtr>::type, std::conditional<std::is_const<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const>::value, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const::ConstLockedPtr, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const::LockedPtr>::type> folly::acquireLockedPair<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const>(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >&, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const&) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/Synchronized.h:1753:21 (ep-engine_ep_unit_tests+0x158bbf9) #12 HdrHistogram::operator+=(HdrHistogram const&) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../kv_engine/utilities/hdrhistogram.cc:67:21 (ep-engine_ep_unit_tests+0x158a3bf) #13 HdrHistogramTest_aggregationTest_Test::TestBody() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../kv_engine/engines/ep/tests/module_tests/hdrhistogram_test.cc:317:18 (ep-engine_ep_unit_tests+0x13479b1) #14 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2433:10 (ep-engine_ep_unit_tests+0x1567683) #15 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2469:14 (ep-engine_ep_unit_tests+0x1553012) #16 testing::Test::Run() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2508:5 (ep-engine_ep_unit_tests+0x153b87c) #17 testing::TestInfo::Run() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2684:11 (ep-engine_ep_unit_tests+0x153c26a) #18 testing::TestSuite::Run() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2816:28 (ep-engine_ep_unit_tests+0x153c841) #19 testing::internal::UnitTestImpl::RunAllTests() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:5338:44 (ep-engine_ep_unit_tests+0x1545a21) #20 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2433:10 (ep-engine_ep_unit_tests+0x156aa43) #21 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2469:14 (ep-engine_ep_unit_tests+0x15551e2) #22 testing::UnitTest::Run() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:4925:10 (ep-engine_ep_unit_tests+0x15455d7) #23 RUN_ALL_TESTS() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/include/gtest/gtest.h:2473:46 (ep-engine_ep_unit_tests+0x1079927) #24 main /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../kv_engine/engines/ep/tests/module_tests/ep_unit_tests_main.cc:175:16 (ep-engine_ep_unit_tests+0x107978e) Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative warning message Mutex M732257135894671160 acquired here while holding mutex M730849761011117904 in main thread: #0 AnnotateRWLockAcquired <null> (ep-engine_ep_unit_tests+0x64007b) #1 folly::detail::annotate_rwlock_acquired_impl(void const volatile*, folly::annotate_rwlock_level, char const*, int) /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/synchronization/SanitizeThread.cpp:99 (ep-engine_ep_unit_tests+0x162f6be) #2 annotate_rwlock_acquired /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/synchronization/SanitizeThread.h:111 (ep-engine_ep_unit_tests+0x15ad433) #3 folly::SharedMutexImpl<false, void, std::atomic, false, true>::annotateAcquired(folly::annotate_rwlock_level) /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/SharedMutex.h:740 (ep-engine_ep_unit_tests+0x15ad433) #4 folly::SharedMutexImpl<false, void, std::atomic, false, true>::lock_shared() /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/SharedMutex.h:414 (ep-engine_ep_unit_tests+0x15ad433) #5 folly::detail::LockTraitsImpl<folly::SharedMutexImpl<false, void, std::atomic, false, true>, (folly::detail::MutexLevel)1, false>::lock_shared(folly::SharedMutexImpl<false, void, std::atomic, false, true>&) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/LockTraits.h:157:11 (ep-engine_ep_unit_tests+0x703215) #6 std::integral_constant<bool, true> folly::LockPolicyShared::lock<folly::SharedMutexImpl<false, void, std::atomic, false, true> >(folly::SharedMutexImpl<false, void, std::atomic, false, true>&) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/LockTraits.h:499:5 (ep-engine_ep_unit_tests+0x7031c5) #7 folly::LockedPtrBase<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const, folly::SharedMutexImpl<false, void, std::atomic, false, true>, folly::LockPolicyShared>::LockedPtrBase(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/Synchronized.h:1097:10 (ep-engine_ep_unit_tests+0x915242) #8 folly::LockedPtr<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const, folly::LockPolicyShared>::LockedPtr(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/Synchronized.h:1402:50 (ep-engine_ep_unit_tests+0x9151ee) #9 folly::SynchronizedBase<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >, (folly::detail::MutexLevel)1>::rlock() const /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/Synchronized.h:144:12 (ep-engine_ep_unit_tests+0x91502e) #10 HdrHistogram::Iterator::Iterator(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const&, HdrHistogram::Iterator::IterMode) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../kv_engine/utilities/hdrhistogram.h:89:66 (ep-engine_ep_unit_tests+0x158be71) #11 HdrHistogram::makeLinearIterator(long) const /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../kv_engine/utilities/hdrhistogram.cc:162:28 (ep-engine_ep_unit_tests+0x158aafb) #12 HdrHistogramTest_aggregationTest_Test::TestBody() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../kv_engine/engines/ep/tests/module_tests/hdrhistogram_test.cc:322:26 (ep-engine_ep_unit_tests+0x13479e2) #13 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2433:10 (ep-engine_ep_unit_tests+0x1567683) #14 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2469:14 (ep-engine_ep_unit_tests+0x1553012) #15 testing::Test::Run() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2508:5 (ep-engine_ep_unit_tests+0x153b87c) #16 testing::TestInfo::Run() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2684:11 (ep-engine_ep_unit_tests+0x153c26a) #17 testing::TestSuite::Run() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2816:28 (ep-engine_ep_unit_tests+0x153c841) #18 testing::internal::UnitTestImpl::RunAllTests() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:5338:44 (ep-engine_ep_unit_tests+0x1545a21) #19 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2433:10 (ep-engine_ep_unit_tests+0x156aa43) #20 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2469:14 (ep-engine_ep_unit_tests+0x15551e2) #21 testing::UnitTest::Run() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:4925:10 (ep-engine_ep_unit_tests+0x15455d7) #22 RUN_ALL_TESTS() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/include/gtest/gtest.h:2473:46 (ep-engine_ep_unit_tests+0x1079927) #23 main /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../kv_engine/engines/ep/tests/module_tests/ep_unit_tests_main.cc:175:16 (ep-engine_ep_unit_tests+0x107978e) Change-Id: I7c3e9369065e5344333c410602267835f9bcc7e1 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/137745 Reviewed-by: Dave Rigby <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Oct 12, 2020
As seen in TSan testing with FollyExecutorPool when running ep_perfsuite: Running [0013/0017]: Stat latency with 100 vbuckets. Also sets & DCP traffic on separate thread...================== WARNING: ThreadSanitizer: data race (pid=16435) Read of size 8 at 0x7b24000053f8 by main thread (mutexes: write M386882167168307840): #0 folly::HHWheelTimerBase<>::Callback::isScheduled() const follytsan/folly/io/async/HHWheelTimer.h:121:14 (libep.so+0x5c63b6) #1 FollyExecutorPool::doTaskQStat(...) kv_engine/engines/ep/src/folly_executorpool.cc:839:30 (libep.so+0x34ff85) #2 EventuallyPersistentEngine::doWorkloadStats(...) kv_engine/engines/ep/src/ep_engine.cc:4115:17 (libep.so+0x2e67d6) #3 EventuallyPersistentEngine::getStats(...) kv_engine/engines/ep/src/ep_engine.cc:4670:16 (libep.so+0x2d689a) ... Previous write of size 8 at 0x7b24000053f8 by thread T1: #0 memset <null> (ep_perfsuite+0x4b3a42) #1 folly::HHWheelTimerBase<>::Callback::cancelTimeoutImpl() follytsan/folly/io/async/HHWheelTimer.cpp:86:15 (libep.so+0x5c6507) #2 folly::HHWheelTimerBase<>::Callback::cancelTimeout() follytsan/folly/io/async/HHWheelTimer.h:114:7 (libep.so+0x5c6507) #3 FollyExecutorPool::TaskProxy::wake() kv_engine/engines/ep/src/folly_executorpool.cc:239:9 (libep.so+0x35e3e9) #4 FollyExecutorPool::State::wakeTask(unsigned long) kv_engine/engines/ep/src/folly_executorpool.cc:352:29 (libep.so+0x3625cf) #5 FollyExecutorPool::wake(unsigned long)::$_9::operator()() const kv_engine/engines/ep/src/folly_executorpool.cc:731:32 (libep.so+0x3515d1) ... Issue is how FollyExecutorPool::doTaskQStat() is implemented. It takes a copy of the taskOwners map on the eventBase thread to attempt to avoid races; however this is a shallow copy - the actual TaskProxy objects which the map references via shared_ptr are not copied. As such, when the TaskProxy objects are read by the non-eventBase thread a race is seen. Solution is to instead of taking a (shallow) copy and then manipulating the copy, just move the entire work onto the eventBase thread. Change-Id: I19de6911944370e836c9905e99c860ee3d5ccb0b Reviewed-on: http://review.couchbase.org/c/kv_engine/+/137406 Reviewed-by: James Harrison <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Oct 16, 2020
When running memcached_testapp under TSan (macOS), an exception is thrown attempting to lock an invalid mutex: libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument With the following backtrace: (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 * frame #0: 0x00007fff6585e0f8 libc++abi.dylib` __cxa_throw frame #1: 0x00007fff6583855b libc++.1.dylib` std::__1::__throw_system_error(int, char const*) + 77 frame #2: 0x00007fff6582f54d libc++.1.dylib` std::__1::mutex::lock() + 29 frame #3: 0x00000001014139f1 memcached_testapp` std::__1::unique_lock<std::__1::mutex>::unique_lock(this=0x00007ffeefbff3e8, __m=<unavailable>) + 65 at __mutex_base:130 frame #4: 0x0000000101410811 memcached_testapp` std::__1::unique_lock<std::__1::mutex>::unique_lock(this=<unavailable>, __m=<unavailable>) + 33 at __mutex_base:130 frame #5: 0x000000010141053e memcached_testapp` folly::shared_mutex_detail::annotationGuard(ptr=<unavailable>) + 110 at SharedMutex.cpp:33 frame #6: 0x0000000101410445 memcached_testapp` folly::SharedMutexImpl<...>::annotateLazyCreate(this=0x0000000101984640) + 53 at SharedMutex.h:719 frame #7: 0x000000010140f6ca memcached_testapp` folly::SharedMutexImpl<...>::annotateDestroy(this=0x0000000101984640) + 26 at SharedMutex.h:732 frame #8: 0x000000010140f5af memcached_testapp` folly::SharedMutexImpl<...>::~SharedMutexImpl(this=0x0000000101984640) + 63 at SharedMutex.h:338 frame #9: 0x000000010140f71a memcached_testapp` folly::SharedMutexImpl<...>::~SharedMutexImpl(this=<unavailable>) + 26 at SharedMutex.h:312 frame #10: 0x0000000100d5549a memcached_testapp` folly::Synchronized<std::__1::unique_ptr<cb::audit::Audit, ...>, folly::SharedMutexImpl<...> >::~Synchronized(this=0x0000000101984638) + 58 at Synchronized.h:489 frame #11: 0x0000000100d511a9 memcached_testapp` folly::Synchronized<std::__1::unique_ptr<cb::audit::Audit, ...>, folly::SharedMutexImpl<...> >::~Synchronized(this=0x0000000101984638) + 41 at Synchronized.h:489 frame #12: 0x000000010ad4c721 libclang_rt.tsan_osx_dynamic.dylib` cxa_at_exit_wrapper(void*) + 33 frame #13: 0x00007fff685d813c libsystem_c.dylib` __cxa_finalize_ranges + 319 frame #14: 0x00007fff685d8412 libsystem_c.dylib` exit + 55 frame #15: 0x00007fff6852ecd0 libdyld.dylib` start + 8 The crash is caused when destructing the static variable `auditHandle`, the Synchronized dtor attempts to access another static variable (kAnnotationMutexes) which has already been destructed. The problem is that auditHandle is a static at file scope, so the order it it initiailised (and destructed) relative to other statics is undefined. Switch to using a C++11 "magic static" via a getAuditHandle() function, which defers construction until the type is used. The same problem exists for `connections`, fix in the same way. Change-Id: I58c925f2c71907ab22581bf462ce3c7d092cfefa Reviewed-on: http://review.couchbase.org/c/kv_engine/+/138051 Tested-by: Build Bot <[email protected]> Reviewed-by: Trond Norbye <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Oct 23, 2020
+Issue+ When enabling FollyExecutorPool by default, TSan reports the following race when running ./ep-engine_ep_unit_tests "--gtest_filter=DurabilityRespondAmbiguousTest.*": AuxIO thread: Previous atomic write of size 8 at 0x7b74000020a0 by thread T8: #0 __tsan_atomic64_fetch_sub <null> (libtsan.so.0+0x000000060890) ... #4 ~HashTable kv_engine/engines/ep/src/hash_table.cc:161 (ep-engine_ep_unit_tests+0x00000122cae1) #5 ~VBucket kv_engine/engines/ep/src/vbucket.cc:286 (ep-engine_ep_unit_tests+0x0000012b3af4) #6 ~EPVBucket kv_engine/engines/ep/src/ep_vb.cc:101 (ep-engine_ep_unit_tests+0x0000011af5e1) ... #10 ~VBucketMemoryDeletionTask kv_engine/engines/ep/src/vbucketdeletiontask.cc:45 (ep-engine_ep_unit_tests+0x0000012e4530) ... #17 std::__shared_ptr<GlobalTask>::reset() /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:1235 (ep-engine_ep_unit_tests+0x000001221e75) #18 FollyExecutorPool::TaskProxy::~TaskProxy()::{lambda()#1}::operator()() kv_engine/engines/ep/src/folly_executorpool.cc:80 (ep-engine_ep_unit_tests+0x000001221e75) Main thread: Write of size 8 at 0x7b74000020a0 by main thread: #0 free <null> (libtsan.so.0+0x000000027806) ... #6 CoreStore<...>::~CoreStore() platform/include/platform/corestore.h:50 (ep-engine_ep_unit_tests+0x0000012988b1) #7 ~EPStats kv_engine/engines/ep/src/stats.cc:132 (ep-engine_ep_unit_tests+0x0000012988b1) #8 ~EventuallyPersistentEngine kv_engine/engines/ep/src/ep_engine.cc:6593 (ep-engine_ep_unit_tests+0x0000011e3bb5) ... #12 DurabilityRespondAmbiguousTest_RespondAmbiguousNotificationDeadLock_Test::TestBody() kv_engine/engines/ep/tests/module_tests/evp_store_durability_test.cc:2350 (ep-engine_ep_unit_tests+0x000000bd3642) The crux of this issue seems to be that a background AuxIO task run via the FollyExecutorPool is deleting a VBucket object concurrently with the main thread deleting an EPStats object. +Background+ Details of how CB3ExecutorPool and FollyExecutorPool implement {{unregisterTaskable()}}, which I believe is what leads to this problem. CB3ExecutorPool: During CB3ExecutorPool::unregisterTaskable(): 1. CB3ExecutorPool::_stopTaskGroup() is called and will wait for VBucketMemoryDeletionTask to run. 2. When VBucketMemoryDeletionTask::run() is call it returns false. 3. CB3ExecutorThread will then synchronously call CB3ExecutorThread::cancelCurrentTask() -> CB3ExecutorPool::cancel(). That removes all Cb3ExecutorPool-owned references to task, and hence will run VBucketMemoryDeletionTask dtor. 4. VBucketMemoryDeletionTask dtor frees the VBucket object. As such, by the time CB3ExecutorPool::unregisterTaskable() returns the VBucket is *guaranteed* to have been freed. FollyExecutorPool: During FollyExecutorPool::unregisterTaskable(): 1. All tasks scheduled to run in future (owned by IO thread EventBase) are either cancelled (if allowed), or woken to run asap on CPU pool. 2. All tasks waiting to, or currently running on CPU pool are waited for by polling for taskOwners to no longer contain any tasks for the given taskable. 3. (On the CPU threads) Each queued task is run, on completion rescheduleTaskAfterRun is called to add work to the IO thread EventBase to decide when to reschedule, or (in this case) to actualy cancel the task. 4. (On the IO thread) FollyExecutorPool::rescheduleTaskAfterRun is called, for cancelled tasks this calls State::cancelTask() which removes the task from taskOwners - at which point TaskProxy dtor runs, which schedules _another_ task on CPU pool to actually delete the GlobalTask. The problem here is that the TaskProxy is removed from taskOwners and deletes at (4) on the IO thread; however the GlobalTask destruction is deferred to later execution on a CPU thread. As such, FollyExecutorPool::unregisterTaskable() can see taskOwners having no tasks left for the taskable (and hence return) _before_ the VBucket object is deleted. Note the deferred deletion at (5) was added to avoid potentially large amounts of work being done on the IO thread - we aim to minimise work done here as it can impact the scheduling of other tasks. +Solution+ If the TaskProxy removal from taskOwners is deferred until _after_ the GlobalTask shared ownership is released, then unregisterTaskable() will no longer return until all GlobalTask references inside FollyExecutorPool have been released. To achieve this changes the ownership model in FollyExecutorPool are needed: 1. Don't immediately remove TaskProxy from taskOwners in cancelTask(). Instead: a) Mark it as cancelled, by setting the GlobalTask ptr to null, b) Schedule an asynchronous task to release its GlobalTask shared_ptr. 2. This new async task (setup in resetTaskPtrViaCpuPool) releases the TaskProxy's shared ownership on GlobalTask, then schedules an (IO thread) completion task to finally remove the TaskProxy from taskOwners. unregisterTaskable() is unchanged - it still waits for the taskOwner map for the given Taskable become empty; however given the above changes that only happens once the GlobalTask reference has been released. Change-Id: Iecbff9f3b45fc9e3d385c67f6a6dd32242dc76fe Reviewed-on: http://review.couchbase.org/c/kv_engine/+/138373 Tested-by: Build Bot <[email protected]> Reviewed-by: Jim Walker <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jan 19, 2021
The list write lock must be held when updating the pausedPurgePoint to avoid racing with updateListElem reading it. WARNING: ThreadSanitizer: data race (pid=17198) Read of size 8 at 0x7b4c00002d40 by thread T2 (mutexes: read M666668208119350224, write M838931151564315824, write M830768325700119856, write M825420198063385568): #0 boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false>::pointed_node() const <null> (libep.so+0x0000003c5dd5) #1 boost::intrusive::operator==(boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false> const&, boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false> const&) tlm/deps/boost.exploded/include/boost/intrusive/detail/list_iterator.hpp:119 (libep.so+0x0000003c230b) #2 BasicLinkedList::updateListElem(std::lock_guard<std::mutex>&, std::lock_guard<std::mutex>&, OrderedStoredValue&) ../kv_engine/engines/ep/src/linked_list.cc:82 (libep.so+0x0000003bf176) #3 EphemeralVBucket::modifySeqList(std::lock_guard<std::mutex>&, std::lock_guard<std::mutex>&, OrderedStoredValue&) ../kv_engine/engines/ep/src/ephemeral_vb.cc:932 (libep.so+0x000000327553) ... Previous write of size 8 at 0x7b4c00002d40 by thread T30: #0 boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false>::operator=(boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false> const&) tlm/deps/boost.exploded/include/boost/intrusive/detail/list_iterator.hpp:79 (libep.so+0x0000003c2436) #1 BasicLinkedList::purgeTombstones(long, std::function<bool (DocKey const&, long)>, std::function<bool ()>) ../kv_engine/engines/ep/src/linked_list.cc:360 (libep.so+0x0000003c03da) #2 EphemeralVBucket::purgeStaleItems(std::function<bool ()>) ../kv_engine/engines/ep/src/ephemeral_vb.cc:391 (libep.so+0x000000326f56) #3 EphemeralVBucket::StaleItemDeleter::visit(VBucket&) ../kv_engine/engines/ep/src/ephemeral_tombstone_purger.cc:211 (libep.so+0x000000324a5e) #4 KVBucket::pauseResumeVisit(PauseResumeVBVisitor&, KVBucketIface::Position&) ../kv_engine/engines/ep/src/kv_bucket.cc:2322 (libep.so+0x0000003831e8) #5 EphTombstoneStaleItemDeleter::run() ../kv_engine/engines/ep/src/ephemeral_tombstone_purger.cc:280 (libep.so+0x0000003211dc) Change-Id: I5fd6d4bfeef5831420f9f6a472ba7bc817753c23 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/143784 Tested-by: Build Bot <[email protected]> Reviewed-by: Dave Rigby <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jan 28, 2021
Backport of 56490d4 http://review.couchbase.org/c/kv_engine/+/143784 The list write lock must be held when updating the pausedPurgePoint to avoid racing with updateListElem reading it. WARNING: ThreadSanitizer: data race (pid=17198) Read of size 8 at 0x7b4c00002d40 by thread T2 (mutexes: read M666668208119350224, write M838931151564315824, write M830768325700119856, write M825420198063385568): #0 boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false>::pointed_node() const <null> (libep.so+0x0000003c5dd5) #1 boost::intrusive::operator==(boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false> const&, boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false> const&) tlm/deps/boost.exploded/include/boost/intrusive/detail/list_iterator.hpp:119 (libep.so+0x0000003c230b) #2 BasicLinkedList::updateListElem(std::lock_guard<std::mutex>&, std::lock_guard<std::mutex>&, OrderedStoredValue&) ../kv_engine/engines/ep/src/linked_list.cc:82 (libep.so+0x0000003bf176) #3 EphemeralVBucket::modifySeqList(std::lock_guard<std::mutex>&, std::lock_guard<std::mutex>&, OrderedStoredValue&) ../kv_engine/engines/ep/src/ephemeral_vb.cc:932 (libep.so+0x000000327553) ... Previous write of size 8 at 0x7b4c00002d40 by thread T30: #0 boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false>::operator=(boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false> const&) tlm/deps/boost.exploded/include/boost/intrusive/detail/list_iterator.hpp:79 (libep.so+0x0000003c2436) #1 BasicLinkedList::purgeTombstones(long, std::function<bool (DocKey const&, long)>, std::function<bool ()>) ../kv_engine/engines/ep/src/linked_list.cc:360 (libep.so+0x0000003c03da) #2 EphemeralVBucket::purgeStaleItems(std::function<bool ()>) ../kv_engine/engines/ep/src/ephemeral_vb.cc:391 (libep.so+0x000000326f56) #3 EphemeralVBucket::StaleItemDeleter::visit(VBucket&) ../kv_engine/engines/ep/src/ephemeral_tombstone_purger.cc:211 (libep.so+0x000000324a5e) #4 KVBucket::pauseResumeVisit(PauseResumeVBVisitor&, KVBucketIface::Position&) ../kv_engine/engines/ep/src/kv_bucket.cc:2322 (libep.so+0x0000003831e8) #5 EphTombstoneStaleItemDeleter::run() ../kv_engine/engines/ep/src/ephemeral_tombstone_purger.cc:280 (libep.so+0x0000003211dc) Change-Id: I5fd6d4bfeef5831420f9f6a472ba7bc817753c23 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/143877 Tested-by: Build Bot <[email protected]> Reviewed-by: Dave Rigby <[email protected]> Well-Formed: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Apr 29, 2021
When converting platform to be statically linked, a crash is seen during shutdown of ep-engine_ep_unit_tests.DcpConnMapTest tests on macOS: (lldb) bt * thread #22, name = 'NonIoPool2', stop reason = signal SIGABRT * frame #0: 0x00007fff693f733a libsystem_kernel.dylib` __pthread_kill + 10 frame #1: 0x00007fff694b3e60 libsystem_pthread.dylib` pthread_kill + 430 frame #2: 0x00007fff6937e808 libsystem_c.dylib` abort + 120 frame #3: 0x00007fff665dd458 libc++abi.dylib` abort_message + 231 frame #4: 0x00007fff665ce8a7 libc++abi.dylib` demangling_terminate_handler() + 238 frame #5: 0x00007fff681095b1 libobjc.A.dylib` _objc_terminate() + 104 frame #6: 0x00007fff665dc887 libc++abi.dylib` std::__terminate(void (*)()) + 8 frame #7: 0x00007fff665df1a2 libc++abi.dylib` __cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 27 frame #8: 0x00007fff665df169 libc++abi.dylib` __cxa_throw + 113 frame #9: 0x00007fff665b955b libc++.1.dylib` std::__1::__throw_system_error(int, char const*) + 77 frame #10: 0x00007fff665b054d libc++.1.dylib` std::__1::mutex::lock() + 29 frame #11: 0x000000010a78af00 libphosphor.dylib` phosphor::TraceLog::lock(this=0x000000010a797b30) + 16 at trace_log.h:250 frame #12: 0x000000010a78aedf libphosphor.dylib` std::__1::lock_guard<phosphor::TraceLog>::lock_guard(this=0x000070000b00be98, __m=0x000000010a797b30) + 15 at __mutex_base:91 frame #13: 0x000000010a787c49 libphosphor.dylib` std::__1::lock_guard<phosphor::TraceLog>::lock_guard(this=0x000070000b00be98, __m=0x000000010a797b30) + 9 at __mutex_base:91 frame #14: 0x000000010a788a2c libphosphor.dylib` phosphor::TraceLog::deregisterThread(this=0x000000010a797b30) + 28 at trace_log.cc:222 frame #15: 0x000000010023ec6d ep-engine_ep_unit_tests` CBRegisteredThreadFactory::newThread(this=0x000000010ddc3c00)>&&)::'lambda'()::operator()() + 93 at folly_executorpool.cc:49 The ExecutorPool is shutting down all its background threads, during which each thread calls phosphor::TraceLog::deregisterThread() to remove this thread from the set folly is tracking. However TraceLog is a singleton and it has already been destructed, so accessing it's mutex member variable results in an exception being thrown. This is due to a change in the static initialisation (and deinitialization) order between ExecutorPool and phosphor::TraceLog. Both are Mayer singletons, but phosphor::TraceLog is first accessed (and hence initialised) _after_ ExecutorPool - when the ExecutorPool threads first register their threads. As such, TraceLog will be destroyed before ExecutorPool (destruction is in reverse construction order). Solve by explicilty accessing (and hence initializing) TraceLog before ExecutorPool is created in ep_unit_tests_main.cc. (Note this problem doesn't occur in memcached as we explicilty initialise tracing before any buckets are created.) Change-Id: I1953129cce0d05a42f0790724c470e38b2dd0701 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/152326 Tested-by: Build Bot <[email protected]> Reviewed-by: Paolo Cocchi <[email protected]> Reviewed-by: Trond Norbye <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Apr 29, 2021
When converting platform to be statically linked, a crash is seen during shutdown of ep-engine_ep_unit_tests.DcpConnMapTest tests on MSVC. The ExecutorPool is consuming messages on the background threads (I believe to coordinate shutdown), and during that it attempts to log a warning message to Google Log. The cause of the crash is a change in the static initialisation (and deinitialization) order - the GoogleLog singleton instance as used internally by Folly is deinitialized before ExecutorPool singleton. As such, when the ExecutorPool singleton is shutting down, it attempts to log a message to a non-existant GLog instance and a nullptr is deferenced. Fix by changing ExecutorPool singleton to use C++11 magic static (Meyer singleton); which ensures it is destructed earlier, before GLog. Additionally, while the above is sufficient to fix this issue on macOS Catalina, on Mojave this introduces _another_ crash as some Folly hazard pointer singletons appear to already have been destructed and the following crash is seen: * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT * frame #0: 0x00007fff7412f2c6 libsystem_kernel.dylib`__pthread_kill + 10 frame #1: 0x00007fff741eabf1 libsystem_pthread.dylib`pthread_kill + 284 frame #2: 0x00007fff740996a6 libsystem_c.dylib`abort + 127 frame #3: 0x00007fff741a8077 libsystem_malloc.dylib`malloc_vreport + 545 frame #4: 0x00007fff741a7e38 libsystem_malloc.dylib`malloc_report + 151 frame #5: 0x00007fff73ff3cf9 libdyld.dylib`_tlv_atexit + 155 frame #6: 0x000000010143cb2d ep-engine_ep_unit_tests`folly::SingletonThreadLocal<folly::hazptr_tc<std::__1::atomic>, folly::hazptr_tc_tls_tag, folly::detail::DefaultMake<folly::hazptr_tc<std::__1::atomic> >, folly::hazptr_tc_tls_tag>::getSlow(cache=0x000000010b5606b8) at SingletonThreadLocal.h:157 [opt] frame #7: 0x0000000101437a19 ep-engine_ep_unit_tests`folly::UnboundedBlockingQueue<folly::CPUThreadPoolExecutor::CPUTask>::add(folly::CPUThreadPoolExecutor::CPUTask) [inlined] folly::SingletonThreadLocal<folly::hazptr_tc<std::__1::atomic>, folly::hazptr_tc_tls_tag, folly::detail::DefaultMake<folly::hazptr_tc<std::__1::atomic> >, folly::hazptr_tc_tls_tag>::get() at SingletonThreadLocal.h:167 [opt] frame #8: 0x0000000101437a08 ep-engine_ep_unit_tests`folly::UnboundedBlockingQueue<folly::CPUThreadPoolExecutor::CPUTask>::add(folly::CPUThreadPoolExecutor::CPUTask) [inlined] folly::hazptr_tc<std::__1::atomic>& folly::hazptr_tc_tls<std::__1::atomic>() at HazptrThrLocal.h:166 [opt] frame #9: 0x0000000101437a08 ep-engine_ep_unit_tests`folly::UnboundedBlockingQueue<folly::CPUThreadPoolExecutor::CPUTask>::add(folly::CPUThreadPoolExecutor::CPUTask) at HazptrHolder.h:64 [opt] frame #10: 0x0000000101437a08 ep-engine_ep_unit_tests`folly::UnboundedBlockingQueue<folly::CPUThreadPoolExecutor::CPUTask>::add(folly::CPUThreadPoolExecutor::CPUTask) [inlined] folly::hazptr_holder<std::__1::atomic>::hazptr_holder(this=<unavailable>, domain=<unavailable>) at HazptrHolder.h:61 [opt] frame #11: 0x0000000101437a08 ep-engine_ep_unit_tests`folly::UnboundedBlockingQueue<folly::CPUThreadPoolExecutor::CPUTask>::add(folly::CPUThreadPoolExecutor::CPUTask) at UnboundedQueue.h:374 [opt] frame #12: 0x00000001014379e7 ep-engine_ep_unit_tests`folly::UnboundedBlockingQueue<folly::CPUThreadPoolExecutor::CPUTask>::add(folly::CPUThreadPoolExecutor::CPUTask) [inlined] folly::UnboundedQueue<folly::CPUThreadPoolExecutor::CPUTask, false, false, false, 6ul, 7ul, std::__1::atomic>::enqueue(this=0x00007ffeefbff770, arg=0x00007ffeefbff690) at UnboundedQueue.h:271 [opt] frame #13: 0x00000001014379e7 ep-engine_ep_unit_tests`folly::UnboundedBlockingQueue<folly::CPUThreadPoolExecutor::CPUTask>::add(this=0x000000010ba00f80, item=CPUTask @ 0x00007ffeefbff690) at UnboundedBlockingQueue.h:31 [opt] frame #14: 0x0000000101437bfc ep-engine_ep_unit_tests`folly::BlockingQueue<folly::CPUThreadPoolExecutor::CPUTask>::addWithPriority(this=0x000000010ba00f80, item=CPUTask @ 0x00007ffeefbff770, (null)=<unavailable>) at BlockingQueue.h:57 [opt] frame #15: 0x0000000101436b00 ep-engine_ep_unit_tests`folly::CPUThreadPoolExecutor::stopThreads(this=0x000000010bf8de00, n=2) at CPUThreadPoolExecutor.cpp:281 [opt] frame #16: 0x000000010144bae3 ep-engine_ep_unit_tests`folly::ThreadPoolExecutor::stop() [inlined] folly::ThreadPoolExecutor::removeThreads(this=<unavailable>, n=<unavailable>) at ThreadPoolExecutor.cpp:233 [opt] frame #17: 0x000000010144bad0 ep-engine_ep_unit_tests`folly::ThreadPoolExecutor::stop(this=0x000000010bf8de00) at ThreadPoolExecutor.cpp:251 [opt] frame #18: 0x00000001014352d4 ep-engine_ep_unit_tests`folly::CPUThreadPoolExecutor::~CPUThreadPoolExecutor(this=0x000000010bf8de00, vtt=0x00000001019fd6c8) at CPUThreadPoolExecutor.cpp:126 [opt] frame #19: 0x0000000101435465 ep-engine_ep_unit_tests`folly::CPUThreadPoolExecutor::~CPUThreadPoolExecutor() [inlined] folly::CPUThreadPoolExecutor::~CPUThreadPoolExecutor(this=0x000000010bf8de00) at CPUThreadPoolExecutor.cpp:124 [opt] frame #20: 0x0000000101435459 ep-engine_ep_unit_tests`folly::CPUThreadPoolExecutor::~CPUThreadPoolExecutor(this=0x000000010bf8de00) at CPUThreadPoolExecutor.cpp:124 [opt] frame #21: 0x000000010023240a ep-engine_ep_unit_tests`FollyExecutorPool::~FollyExecutorPool(this=0x000000010b7eed40) at folly_executorpool.cc:757 [opt] frame #22: 0x00000001002325ee ep-engine_ep_unit_tests`FollyExecutorPool::~FollyExecutorPool(this=0x000000010b7eed40) at folly_executorpool.cc:751 [opt] frame #23: 0x00007fff7409a3cf libsystem_c.dylib`__cxa_finalize_ranges + 319 frame #24: 0x00007fff7409a6b3 libsystem_c.dylib`exit + 55 Address _this_ with a somewhat belt-and-braces approach - also manually shutdown the ExecutorPool in DcpConnMapTest::TearDown - as is done in other tests. Change-Id: I87f13bc3a7cdf616b52d18502dd724fcf630d3b9 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/152230 Tested-by: Build Bot <[email protected]> Reviewed-by: Richard de Mellow <[email protected]> Reviewed-by: Trond Norbye <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jul 1, 2021
Fix race of more than one thread trying to free a MockCookie. This is due to MockServerCookieApi::release() and destroy_mock_cookie() racing with each other. For instance, T1 could dec the ref count then pause. T2 gets run which decs the ref count, reads the ref count as 0 and frees the MockCookie. Then T1 gets run again, it tries to call getRefcount() which causes it to deref an invalid pointer as the MockCookie has been freed. To fix this make the read+write of the ref count atomic. RNING: ThreadSanitizer: data race on vptr (ctor/dtor vs virtual call) (pid=107468) Write of size 8 at 0x7b4800030a80 by thread T2 (mutexes: write M1121813589057925128): #0 cb::tracing::Traceable::~Traceable() ../kv_engine/include/memcached/tracer.h:132 (ep_testsuite_dcp+0x5aab16) #1 MockCookie::~MockCookie() ../kv_engine/programs/engine_testapp/mock_cookie.cc:22 (ep_testsuite_dcp+0x5a9ddd) #2 MockCookie::~MockCookie() ../kv_engine/programs/engine_testapp/mock_cookie.cc:18 (ep_testsuite_dcp+0x5a9e45) #3 MockServerCookieApi::release(CookieIface const&) ../kv_engine/programs/engine_testapp/mock_server.cc:235 (ep_testsuite_dcp+0x5b86c4) #4 EventuallyPersistentEngine::releaseCookie(CookieIface const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/engines/ep/src/ep_engine.cc:1793 (ep_testsuite_dcp+0x765c89) #5 ConnHandler::releaseReference() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/engines/ep/src/connhandler.cc:343 (ep_testsuite_dcp+0x86cb6d) #6 DcpConnMap::manageConnections() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/engines/ep/src/dcp/dcpconnmap.cc:392 (ep_testsuite_dcp+0x7f415e) ... Previous read of size 8 at 0x7b4800030a80 by main thread: #0 destroy_mock_cookie(CookieIface*) ../kv_engine/programs/engine_testapp/mock_cookie.cc:48 (ep_testsuite_dcp+0x5a9fa6) #1 MockTestHarness::destroy_cookie(CookieIface*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/programs/engine_testapp/engine_testapp.cc:159 (ep_testsuite_dcp+0x490ae5) #2 test_dcp_producer_stream_req_backfill(EngineIface*) ../kv_engine/engines/ep/tests/ep_testsuite_dcp.cc:2306 (ep_testsuite_dcp+0x4d3e3f) ... Location is heap block of size 336 at 0x7b4800030a80 allocated by main thread: #0 operator new(unsigned long) <null> (libtsan.so.0+0x87c5c) #1 create_mock_cookie(EngineIface*) ../kv_engine/programs/engine_testapp/mock_cookie.cc:32 (ep_testsuite_dcp+0x5a9f0a) #2 MockTestHarness::create_cookie(EngineIface*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/programs/engine_testapp/engine_testapp.cc:155 (ep_testsuite_dcp+0x490aa5) #3 test_dcp_producer_stream_req_backfill(EngineIface*) ../kv_engine/engines/ep/tests/ep_testsuite_dcp.cc:2287 (ep_testsuite_dcp+0x4d3d31) #4 execute_test(test, char const*, char const*)::$_1::operator()() const /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/programs/engine_testapp/engine_testapp.cc:445 (ep_testsuite_dcp+0x48f988) #5 test_result std::__invoke_impl<test_result, execute_test(test, char const*, char const*)::$_1&>(std::__invoke_other, execute_test(test, char const*, char const*)::$_1&) /opt/gcc-10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/invoke.h:60 (ep_testsuite_dcp+0x48f90d) #6 std::enable_if<is_invocable_r_v<test_result, execute_test(test, char const*, char const*)::$_1&>, test_result>::type std::__invoke_r<test_result, execute_test(test, char const*, char const*)::$_1&>(execute_test(test, char const*, char const*)::$_1&) /opt/gcc-10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/invoke.h:113 (ep_testsuite_dcp+0x48f89d) #7 std::_Function_handler<test_result (), execute_test(test, char const*, char const*)::$_1>::_M_invoke(std::_Any_data const&) /opt/gcc-10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/std_function.h:291 (ep_testsuite_dcp+0x48f78d) #8 std::function<test_result ()>::operator()() const /opt/gcc-10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/std_function.h:622 (ep_testsuite_dcp+0x4900d8) #9 try_run_test(std::function<test_result ()>) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/programs/engine_testapp/engine_testapp.cc:288 (ep_testsuite_dcp+0x48d368) #10 execute_test(test, char const*, char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/programs/engine_testapp/engine_testapp.cc:445 (ep_testsuite_dcp+0x48ea37) #11 main /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/programs/engine_testapp/engine_testapp.cc:698 (ep_testsuite_dcp+0x48dbfb) Mutex M1121813589057925128 is already destroyed. SUMMARY: ThreadSanitizer: data race on vptr (ctor/dtor vs virtual call) ../kv_engine/include/memcached/tracer.h:132 in cb::tracing::Traceable::~Traceable() Change-Id: I5cc6959ee9644c8c780b239cd63a6071c10c6c45 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/156420 Reviewed-by: Dave Rigby <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Oct 11, 2021
As seen on cluster_run built with TSan, the following race is seen aborting a timed-out SyncWrite: WARNING: ThreadSanitizer: data race (pid=46769) Write of size 8 at 0x7b5400291698 by thread T60 (mutexes: read M542537949550136824, write M360774, read M541974810616805208, write M1133353977207195360): #0 Cookie::setEngineStorage(void*) kv_engine/daemon/cookie.h:432 (memcached+0x66a6e1) #1 EventuallyPersistentEngine::storeEngineSpecific(CookieIface const*, void*) kv_engine/engines/ep/src/ep_engine.cc:1841 (memcached+0x7c4162) #2 operator() kv_engine/engines/ep/src/kv_bucket.cc:2756 (memcached+0xa65268) #3 __invoke_impl<void, KVBucket::makeSyncWriteCompleteCB()::<...> /usr/include/c++/10/bits/invoke.h:60 (memcached+0xa65268) #4 __invoke_r<void, KVBucket::makeSyncWriteCompleteCB()::<...> /usr/include/c++/10/bits/invoke.h:110 (memcached+0xa65268) #5 _M_invoke /usr/include/c++/10/bits/std_function.h:291 (memcached+0xa65268) #6 std::function<void (CookieIface const*, cb::engine_errc)>::operator()(...) const /usr/include/c++/10/bits/std_function.h:622 (memcached+0x9bf920) #7 VBucket::notifyClientOfSyncWriteComplete(CookieIface const*, cb::engine_errc) kv_engine/engines/ep/src/vbucket.cc:1041 (memcached+0x9bf920) ... Previous write of size 8 at 0x7b5400291698 by thread T18 (mutexes: write M3809): #0 Cookie::setEngineStorage(void*) /home/daver/repos/couchbase/server/kv_engine/daemon/cookie.h:432 (memcached+0x66a6e1) #1 EventuallyPersistentEngine::storeEngineSpecific(...) kv_engine/engines/ep/src/ep_engine.cc:1841 (memcached+0x7bb0c2) #2 EventuallyPersistentEngine::storeIfInner(...) kv_engine/engines/ep/src/ep_engine.cc:2539 (memcached+0x7de96a) #3 EventuallyPersistentEngine::store_if(...) kv_engine/engines/ep/src/ep_engine.cc:478 (memcached+0x7debed) #4 bucket_store_if(...) kv_engine/daemon/protocol/mcbp/engine_wrapper.cc:148 (memcached+0x7517bd) #5 MutationCommandContext::storeItem() kv_engine/daemon/protocol/mcbp/mutation_context.cc:288 (memcached+0x733f6d) #6 MutationCommandContext::step() kv_engine/daemon/protocol/mcbp/mutation_context.cc:54 (memcached+0x7385f7) ... When setEngineStorage is called from EventuallyPersistentEngine::store_if (when issueing a SyncWrite) from the frontend thread, the per-frontend thread mutex is held when modifying Cookie:engine_storage. However when the background thread later modifies Cookie:engine_storage the front-end thread mutex is not held. Address this by making Cookie::engine_storage atomic. We could have added a mutex around it, but that would add additional space requirements for each Cookie (of which there are potentially many), so atomic_ptr suffices for the time being. Change-Id: I62e25b6a74d47c2da6b500cb3dc20d7ad2b01e03 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/163278 Tested-by: Build Bot <[email protected]> Reviewed-by: Paolo Cocchi <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Oct 20, 2021
During bucket shutdown we intermittently see an exception thrown during task scheduling on a background NonIO thread, which crashes the memcached process. +Analysis+ Bug is as follows. Starting at the main thread which is deleting the Bucket (Thread 1): (gdb) bt ... #10 0x0000000000649bf3 in FollyExecutorPool::schedule(std::shared_ptr<GlobalTask>) () at /c++/10.2.0/new:175 #11 0x000000000084271b in EPVBucket::scheduleDeferredDeletion(EventuallyPersistentEngine&) () at /c++/10.2.0/ext/atomicity.h:100 #12 0x00000000006dfe7a in VBucket::DeferredDeleter::operator()(VBucket*) const () at kv_engine/engines/ep/src/vbucket.cc:3990 #13 0x000000000086f874 in std::_Sp_counted_deleter<EPVBucket*, VBucket::DeferredDeleter, ...>::_M_dispose () at /c++/10.2.0/bits/shared_ptr_base.h:453 ... #18 std::shared_ptr<VBucket>::~shared_ptr (this=0x7b44000515d0, __in_chrg=<optimized out>) at /c++/10.2.0/bits/shared_ptr.h:121 #19 PagingVisitor::~PagingVisitor (this=0x7b4400051540, __in_chrg=<optimized out>) at kv_engine/engines/ep/src/paging_visitor.h:39 ... #31 std::__shared_ptr<GlobalTask, (__gnu_cxx::_Lock_policy)2>::reset () at /c++/10.2.0/bits/shared_ptr_base.h:1301 #32 EventuallyPersistentEngine::waitForTasks(std::vector<std::shared_ptr<GlobalTask>, std::allocator<std::shared_ptr<GlobalTask> > >&) () at kv_engine/engines/ep/src/ep_engine.cc:6752 #33 0x000000000082396f in EventuallyPersistentEngine::destroyInner(bool) () at kv_engine/engines/ep/src/ep_engine.cc:2135 1. PagingVisitor is still in existence running after `EventuallyPersistentEngine::destroyInner` - see frame #19. This is because all tasks belonging to bucket were returned from unregisterTaskable() just before. 2. PagingVisitor (via VBCBAdaptor) is destroyed, it decrements the refcount on the shared_ptr<VBucket> it owns - see frame #18. 3. That is the last reference to the VBucket, which results in VBucket::DeferredDeleter being invoked which in turn schedules a task to delete the VBucket (disk and memory) in the background - see frame #11. We see the schedule's lambda happen on the SchedulerPool0 thread (T:35): Thread 35 "SchedulerPool0" hit Catchpoint 1 (exception thrown), __cxxabiv1::__cxa_throw (..., tinfo=0x10c4ec8 <typeinfo for std::out_of_range@@GLIBCXX_3.4>, ...) at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:80 (gdb) bt #1 0x00007ffff4cad7d2 in std::__throw_out_of_range (__s=__s@entry=0xcc68e6 "_Map_base::at") at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/src/c++11/functexcept.cc:82 ... #3 0x00000000005504ee in std::unordered_map<...>::at (__k=@0x7fffe83a8f88: 0x7b7400000848, this=0x7b1000005580) at /c++/10.2.0/bits/unordered_map.h:1000 #4 FollyExecutorPool::State::scheduleTask (this=..., executor=..., pool=..., task=...) at kv_engine/executor/folly_executorpool.cc:415 ... #8 folly::EventBase::runInEventBaseThreadAndWait(...) at folly/io/async/EventBase.cpp:671 ... In FollyExecutorPool::State::scheduleTask (frame #3) we attempt to lookup the Taskable (Bucket) in the ExecutorPool's map, however given its already been unregistered, the taskable is not found an the std::out_of_range exception is thrown. This is a lifetime issue. We have VBucket objects potentially being kept alive longer than their expected lifetime by virtue of background tasks having shared ownership of them - and those background tasks outlive the lifetime of their parent object (KVBucket), and crucially past when the owning Bucket is unregistered with the ExecutorPool and can no longer schedule tasks. When it then _does+ attempt to schedule a task against an unregistered (and deleted) Taskable; we see the crash. +Solution+ There's arguably two problems which should be addressed (although technically only one of the two is required to encounter this crash): 1. Background tasks owning VBuckets when they are not executing. 2. Background tasks outliving their associated Taskable (aka Bucket). This patch addresses the critical issue of (1) - we remove the (shared) ownership of VBucket from the background tasks which previoulsy had it - both PagingVisitor which is the problematic class in this scenario, but also in the other background Tasks which potentially have the same problem. The 2nd patch will tighten up the API for visiting VBuckets, so visitors are not passed a VBucketPtr, but instead VBucket& which reduces the chance of similar problems happening in future. The 3rd patch will adddress Background Taks outliving their Taskable. Change-Id: I340a3e4dc3d9234c4a34866b410fb8295a1c98d1 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/163783 Tested-by: Dave Rigby <[email protected]> Reviewed-by: Richard de Mellow <[email protected]> Reviewed-by: James H <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Nov 2, 2021
Previously, StatCheckpointTask and StatDCPTask immediately wrote responses when collecting stats while on a background thread. TSAN reported this as unsafe; no locks prevent potential racing with a frontend thread manipulating the cookie. Change both tasks to accumulate task values, but leave the frontend thread to actually write the responses when it resumes the ewouldblock'ed operation. TSAN Report: WARNING: ThreadSanitizer: data race (pid=24371) Read of size 8 at 0x7b54000a2df0 by thread T62: #0 Cookie::getHeader() const kv_engine/daemon/cookie.cc:201 (memcached+0x6508ac) #1 append_stats kv_engine/daemon/protocol/mcbp/stats_context.cc:95 (memcached+0x71fd6c) .... #19 void StatCollector::addStat<cb::stats::Key, unsigned long const&>(cb::stats::Key&&, unsigned long const&) const ../kv_engine/include/statistics/collector.h:336 (memcached+0x7e50e5) #20 EventuallyPersistentEngine::addAggregatedProducerStats(BucketStatCollector const&, ConnCounter const&) kv_engine/engines/ep/src/ep_engine.cc:4038 (memcached+0x7e50e5) #21 EventuallyPersistentEngine::doDcpStatsInner(CookieIface const*, std::function<void (std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, void const*)> const&, std::basic_string_view<char, std::char_traits<char> >) kv_engine/engines/ep/src/ep_engine.cc:4030 (memcached+0x81bd05) Previous write of size 8 at 0x7b54000a2df0 by thread T21 (mutexes: write M3843): #0 Cookie::setPacket(cb::mcbp::Header const&, bool) kv_engine/daemon/cookie.cc:186 (memcached+0x65080e) #1 Cookie::preserveRequest() kv_engine/daemon/cookie.h:225 (memcached+0x696aa7) #2 Connection::executeCommandPipeline() kv_engine/daemon/connection.cc:581 (memcached+0x696aa7) #3 Connection::executeCommandsCallback() kv_engine/daemon/connection.cc:793 (memcached+0x696be8) #4 Connection::rw_callback(bufferevent*, void*) kv_engine/daemon/connection.cc:942 (memcached+0x697851) #5 bufferevent_run_deferred_callbacks_unlocked /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/bufferevent.c:208 (libevent_core-2.1.so.7+0xf71d) #6 folly::EventBase::loopBody(int, bool) folly/io/async/EventBase.cpp:397 (memcached+0xfc9b52) #7 folly::EventBase::loop() folly/io/async/EventBase.cpp:315 (memcached+0xfcb06b) #8 folly::EventBase::loopForever() folly/io/async/EventBase.cpp:538 (memcached+0xfcb06b) #9 worker_libevent kv_engine/daemon/thread.cc:115 (memcached+0x6c16af) #10 CouchbaseThread::run() platform/src/cb_pthreads.cc:51 (memcached+0xf217d5) #11 platform_thread_wrap platform/src/cb_pthreads.cc:64 (memcached+0xf217d5) Change-Id: I3fbd8d51e174a7d19c5cb608a969795e445b8e86 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/163709 Tested-by: Build Bot <[email protected]> Reviewed-by: Dave Rigby <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jul 18, 2022
As observed in tests in patch to fix MB-47267 ("MB-47267 / MB-52383: Make backfill during warmup a PauseResume task"), ObjectRegister getAllocSize can be read and written by different threads without synchronisation when EP engine instances are destroyed and re-created: WARNING: ThreadSanitizer: data race (pid=128791) Read of size 8 at 0x7f584d8d48c0 by thread T41 (mutexes: write M333120309177634496, write M279640201042175720): #0 ObjectRegistry::onCreateBlob(Blob const*) ../kv_engine/engines/ep/src/objectregistry.cc:85 (ep.so+0x0000002d60aa) #1 Blob::Blob(char const*, unsigned long) ../kv_engine/engines/ep/src/blob.cc:51 (ep.so+0x00000006ba08) #2 Blob::New(char const*, unsigned long) ../kv_engine/engines/ep/src/blob.cc:26 (ep.so+0x00000006ba56) #3 vbucket_transition_state::toItem(Item&) const ../kv_engine/engines/ep/src/vbucket_state.cc:31 (ep.so+0x0000002b1c39) #4 CheckpointManager::queueSetVBState(VBucket&) ../kv_engine/engines/ep/src/checkpoint_manager.cc:953 (ep.so+0x00000008030a) #5 Warmup::populateVBucketMap(unsigned short) ../kv_engine/engines/ep/src/warmup.cc:1508 (ep.so+0x0000002c55fd) #6 WarmupPopulateVBucketMap::run() ../kv_engine/engines/ep/src/warmup.cc:350 (ep.so+0x0000002d47dd) #7 ExecutorThread::run() ../kv_engine/engines/ep/src/executorthread.cc:190 (ep.so+0x0000001ec57b) #8 launch_executor_thread ../kv_engine/engines/ep/src/executorthread.cc:36 (ep.so+0x0000001ecb69) #9 CouchbaseThread::run() ../platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x00000000a237) #10 platform_thread_wrap ../platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x00000000a237) #11 <null> <null> (libtsan.so.0+0x00000002843b) Previous write of size 8 at 0x7f584d8d48c0 by main thread: #0 ObjectRegistry::initialize(unsigned long (*)(void const*)) ../kv_engine/engines/ep/src/objectregistry.cc:72 (ep.so+0x0000002d5ea7) #1 create_instance ../kv_engine/engines/ep/src/ep_engine.cc:1777 (ep.so+0x000000191c06) #2 create_engine_instance(engine_reference*, server_handle_v1_t* (*)(), EngineIface**) ../kv_engine/utilities/engine_loader.cc:95 (engine_testapp+0x0000004614b9) #3 MockTestHarness::create_bucket(bool, char const*) <null> (engine_testapp+0x00000041f295) #4 test_reader_thread_starvation_warmup ../kv_engine/engines/ep/tests/ep_testsuite.cc:8246 (ep_testsuite.so+0x000000071909) #5 execute_test ../kv_engine/programs/engine_testapp/engine_testapp.cc:1402 (engine_testapp+0x00000041ac82) #6 main ../kv_engine/programs/engine_testapp/engine_testapp.cc:1675 (engine_testapp+0x00000041be5c) Location is global 'getAllocSize' of size 8 at 0x7f584d8d48c0 (ep.so+0x0000007708c0) In practice this race is most likely benign, as ObjectRegistry::initialize() is always passed the same argument to store to getAllocSize. However to silence TSan, change to an atomic variable accessed with memory_order_acquire / memory_order_release. Note this is the default ordering on x86-64, so this doesn't actually add any additional cost. Change-Id: I65d566270ae5fa0602fe0a907e78c2b6ae227353 Reviewed-on: https://review.couchbase.org/c/kv_engine/+/177600 Tested-by: Build Bot <[email protected]> Well-Formed: Restriction Checker Reviewed-by: Paolo Cocchi <[email protected]> Reviewed-by: Ben Huddleston <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Nov 3, 2022
As part of EPBucket::prepareForPause(), all of the vb_mutexes are lock()ed - and left locked: a) To ensure that any in-flight VBucket writes have completed and b) To inhibit and new writes from occurring. Then, when the EPBucket is resumed all the vb_mutexes are unlock()ed which allows VBucket writes to resume. However, EPBucket::prepareForResume() is not called on the same thread which called prepareForPause() - prepareForPause() runs on a background NonIO thread whereas resume runs synchronously in the front-end thread. As such, we are incorrectly unlocking a mutex from a different thread than the one which locked it - which is Undefined Behaviour - from cppreference.com[1]: void unlock(); Unlocks the mutex. The mutex must be locked by the current thread of execution, otherwise, the behavior is undefined. This is helpfully reported by ThreadSanitizer: WARNING: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong thread) (pid=58528) #0 pthread_mutex_unlock <null> (libtsan.so.0+0x3bf9a) #1 __gthread_mutex_unlock(pthread_mutex_t*) c++/10.2.0/x86_64-pc-linux-gnu/bits/gthr-default.h:779 (memcached+0x5b594f) #2 std::mutex::unlock() c++/10.2.0/bits/std_mutex.h:118 (memcached+0x602555) #3 EPBucket::prepareForResume() kv_engine/engines/ep/src/ep_bucket.cc:2575 (memcached+0x84b94f) #4 EventuallyPersistentEngine::resume() kv_engine/engines/ep/src/ep_engine.cc:7002 (memcached+0x7d52d3) ... Fix by changing how we achieve inhibition of future VBucket writes: - Introduce a EPBucket::paused flag which is set in prepareForPause after all vb_mutexes have been acquired (and hence all in-flight VBucket writes have finished), but then unlock all vb_mutexes before returning from prepareForPause(). - When attempting to acquire a locked VBucket, check new paused flag before attempting to acquire the vb_mutex - if paused is set then block / return early (for try() variant). This keeps the required pause behaviour but avoids keeping vb_mutexes locked and having to later unlock (on a different thread). [1]: https://en.cppreference.com/w/cpp/thread/mutex Change-Id: I062583951a101a866866b79dfd6329672bb4ff42 Reviewed-on: https://review.couchbase.org/c/kv_engine/+/182099 Tested-by: Build Bot <[email protected]> Reviewed-by: Jim Walker <[email protected]>
jimwwalker
added a commit
to jimwwalker/kv_engine
that referenced
this pull request
Jan 18, 2023
Replace the todo markers with code that now utilises the magma history API - this now means scanAllVersions for example is hooked into the magma history scanning API. Add new tests that validate multiple versions can be stored and returned. Also required are changes to unit tests to respect new expectation checks that occur in magma - primarily that flushing writes ordered batches - this is only a problem for tests which bypass the flusher and call KVStore directly. **** ISSUES **** ep-engine_ep_unit_tests does not pass: 1) Exception from magma MagmaKVStoreRollbackTest.Rollback hits the following exception GSL: Precondition failure: 'levelSize >= compactionState[level].history.Size' at /Users/jimwalker/Code/couchbase/neo/magma/lsm/lsm_tree.cc:895 2) Seg-fault in magma Seen in a number of tests, 1 example: CollectionsDcpEphemeralOrPersistent/CollectionsDcpParameterizedTest.DefaultCollectionDropped/persistent_magma_value_only Process 78731 stopped * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] 72 } 73 74 Slice DocSequenceBuffer::GetKey() { -> 75 seqFmt.Set(sortedList[offset]->seqno); 76 return seqFmt.Encode(); 77 } 78 * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) * frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] frame couchbase#1: 0x0000000101361e2e ep-engine_ep_unit_tests`magma::mvccIteratorAdaptor::GetKey(this=0x0000000118536c00) at mvcc.h:249:25 [opt] frame couchbase#2: 0x000000010132b688 ep-engine_ep_unit_tests`magma::IteratorWithFilter::filterKeys(this=0x0000000118128350) at iterator.cc:214:32 [opt] frame couchbase#3: 0x000000010132de5b ep-engine_ep_unit_tests`magma::KVReader::ReadKVs(this=0x00007ff7bfefd550) at common.cc:70:19 [opt] frame couchbase#4: 0x0000000101378f63 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, w=0x00007ff7bfefd890, itr=0x0000000118128350, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefd860)>) at lsm_tree.cc:719:15 [opt] frame couchbase#5: 0x0000000101376ee8 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, appendMode=<unavailable>, itr=0x0000000118128350, sizeEstimate=<unavailable>, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefdb60)>) at lsm_tree.cc:682:17 [opt] frame couchbase#6: 0x00000001013761b2 ep-engine_ep_unit_tests`magma::LSMTree::writeMemtable(this=0x000000011855a820, memtable=0x000000011854c7a0) at lsm_tree.cc:449:21 [opt] frame #7: 0x000000010137753f ep-engine_ep_unit_tests`magma::LSMTree::doMemtableFlushWork(this=0x000000011855a820) at lsm_tree.cc:531:18 [opt] frame #8: 0x000000010139fe62 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] magma::LSMTree::newFlush(this=<unavailable>)::$_16::operator()() const at lsm_tree.cc:993:34 [opt] frame #9: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_16&>(fp)()) std::__1::__invoke<magma::LSMTree::newFlush()::$_16&>(magma::LSMTree::newFlush()::$_16&) at type_traits:3918:1 [opt] frame #10: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::tuple<magma::Status, magma::CheckpointTransaction> std::__1::__invoke_void_return_wrapper<std::__1::tuple<magma::Status, magma::CheckpointTransaction>, false>::__call<magma::LSMTree::newFlush(__args=<unavailable>)::$_16&>(magma::LSMTree::newFlush()::$_16&) at invoke.h:30:16 [opt] frame #11: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #12: 0x000000010139fe59 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #13: 0x00000001012f72af ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::__function::__value_func<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #14: 0x00000001012f7296 ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::function<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=0x0000000118131560)() const at function.h:1182:12 [opt] frame #15: 0x00000001012f7292 ep-engine_ep_unit_tests`magma::FlushWork::Execute(this=0x0000000118131560) at flush_work.cc:61:29 [opt] frame #16: 0x0000000101389d5e ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x00007ff7bfefe1c0)::$_38::operator()() at kvstore.cc:515:27 [opt] frame #17: 0x0000000101388fac ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x000000010442a420, wal=<unavailable>, offset=(SegID = 1, SegOffset = 4096), flushMode=<unavailable>, blockMode=Blocking) at kvstore.cc:582:16 [opt] frame #18: 0x0000000101389a5a ep-engine_ep_unit_tests`magma::KVStore::FlushMemTables(this=<unavailable>, wal=<unavailable>, flushMode=<unavailable>, blockMode=<unavailable>) at kvstore.cc:387:12 [opt] frame #19: 0x00000001012fd9ba ep-engine_ep_unit_tests`magma::Magma::Impl::syncKVStore(this=0x000000011814f000, kvID=<unavailable>, checkpoint=true) at db.cc:1352:21 [opt] frame #20: 0x000000010132678a ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=0x00007ff7bfefe400)>)::$_7::operator()() const at db.cc:880:23 [opt] frame #21: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=<unavailable>)>)::$_8::operator()() const at db.cc:891:21 [opt] frame #22: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)>)::$_8&>(fp)()) std::__1::__invoke<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at type_traits:3918:1 [opt] frame #23: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call<magma::Magma::Impl::CompactKVStore(__args=<unavailable>)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at invoke.h:61:9 [opt] frame #24: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #25: 0x0000000101326764 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #26: 0x0000000101303138 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::__function::__value_func<void ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #27: 0x000000010130312d ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::function<void ()>::operator(this=0x00007ff7bfefe4b0)() const at function.h:1182:12 [opt] frame #28: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:92:9 [opt] frame #29: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:91:14 [opt] frame #30: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(this=<unavailable>, kvID=<unavailable>, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefe550)>) at db.cc:895:1 [opt] frame #31: 0x000000010130336c ep-engine_ep_unit_tests`magma::Magma::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=<unavailable>, makeCallback=<unavailable>)>) at db.cc:901:18 [opt] frame #32: 0x000000010004fd3d ep-engine_ep_unit_tests`MagmaMemoryTrackingProxy::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefea00)>) at magma-memory-tracking-proxy.cc:190:19 [opt] frame #33: 0x00000001000a9eeb ep-engine_ep_unit_tests`MagmaKVStore::compactDBInternal(this=<unavailable>, vbLock=0x00007ff7bfefeda0, ctx=std::__1::shared_ptr<CompactionContext>::element_type @ 0x00000001184acc20 strong=3 weak=1) at magma-kvstore.cc:2590:29 [opt] frame #34: 0x00000001000a93ad ep-engine_ep_unit_tests`MagmaKVStore::compactDB(this=0x00000001067e6500, vbLock=0x00007ff7bfefeda0, ctx=nullptr) at magma-kvstore.cc:2445:12 [opt] frame #35: 0x00000001001d7eb0 ep-engine_ep_unit_tests`EPBucket::compactInternal(this=0x00000001067e6000, vb=0x00007ff7bfefed90, config=<unavailable>) at ep_bucket.cc:1398:25 [opt] frame #36: 0x00000001001d83f6 ep-engine_ep_unit_tests`EPBucket::doCompact(this=0x00000001067e6000, vbid=(vbid = 0), config=0x00007ff7bfefedf0, cookies=size=0) at ep_bucket.cc:1476:14 [opt] 3) Key sorting issue Magma now checks for sorted keys - it turns out KV flushing is violating that ordering. Need to know if KV should fix or is the magma check required?? Example: CollectionsDcpEphemeralOrPersistent/CollectionsLegacyDcpTest.default_collection_is_not_vbucket_highseqno_with_pending/persistent_nexus_couchstore_magma_value_only CRITICAL [(SynchronousEPEngine:default) magma_0]Fatal error: Found: preceding key(d2) > current key( _collection). If history is enabled, all keys in the batch must be sorted lexicographicall The problem is that the test flushes a prepare(default collection, key=d2) and create-collection(fruit) together. The flusher orders these... \0d2 \1create_fruit This is correct. But \0d2 is marked as a prepare, when flushed to disk it goes into a special namespace. This occurs in KVStore after the sorting. \0d2 becomes \2\0d2 And magma actually sees \2\0d2 \1create_fruit and we have violated the expects Change-Id: Ica9ea1b52c51f125c9e8839a0fca412834fc25f7
jimwwalker
added a commit
to jimwwalker/kv_engine
that referenced
this pull request
Jan 18, 2023
Replace the todo markers with code that now utilises the magma history API - this now means scanAllVersions for example is hooked into the magma history scanning API. Add new tests that validate multiple versions can be stored and returned. Also required are changes to unit tests to respect new expectation checks that occur in magma - primarily that flushing writes ordered batches - this is only a problem for tests which bypass the flusher and call KVStore directly. **** ISSUES **** ep-engine_ep_unit_tests does not pass: 1) Exception from magma MagmaKVStoreRollbackTest.Rollback hits the following exception GSL: Precondition failure: 'levelSize >= compactionState[level].history.Size' at /Users/jimwalker/Code/couchbase/neo/magma/lsm/lsm_tree.cc:895 2) Seg-fault in magma Seen in a number of tests, 1 example: CollectionsDcpEphemeralOrPersistent/CollectionsDcpParameterizedTest.DefaultCollectionDropped/persistent_magma_value_only Process 78731 stopped * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] 72 } 73 74 Slice DocSequenceBuffer::GetKey() { -> 75 seqFmt.Set(sortedList[offset]->seqno); 76 return seqFmt.Encode(); 77 } 78 * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) * frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] frame couchbase#1: 0x0000000101361e2e ep-engine_ep_unit_tests`magma::mvccIteratorAdaptor::GetKey(this=0x0000000118536c00) at mvcc.h:249:25 [opt] frame couchbase#2: 0x000000010132b688 ep-engine_ep_unit_tests`magma::IteratorWithFilter::filterKeys(this=0x0000000118128350) at iterator.cc:214:32 [opt] frame couchbase#3: 0x000000010132de5b ep-engine_ep_unit_tests`magma::KVReader::ReadKVs(this=0x00007ff7bfefd550) at common.cc:70:19 [opt] frame couchbase#4: 0x0000000101378f63 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, w=0x00007ff7bfefd890, itr=0x0000000118128350, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefd860)>) at lsm_tree.cc:719:15 [opt] frame couchbase#5: 0x0000000101376ee8 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, appendMode=<unavailable>, itr=0x0000000118128350, sizeEstimate=<unavailable>, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefdb60)>) at lsm_tree.cc:682:17 [opt] frame couchbase#6: 0x00000001013761b2 ep-engine_ep_unit_tests`magma::LSMTree::writeMemtable(this=0x000000011855a820, memtable=0x000000011854c7a0) at lsm_tree.cc:449:21 [opt] frame #7: 0x000000010137753f ep-engine_ep_unit_tests`magma::LSMTree::doMemtableFlushWork(this=0x000000011855a820) at lsm_tree.cc:531:18 [opt] frame #8: 0x000000010139fe62 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] magma::LSMTree::newFlush(this=<unavailable>)::$_16::operator()() const at lsm_tree.cc:993:34 [opt] frame #9: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_16&>(fp)()) std::__1::__invoke<magma::LSMTree::newFlush()::$_16&>(magma::LSMTree::newFlush()::$_16&) at type_traits:3918:1 [opt] frame #10: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::tuple<magma::Status, magma::CheckpointTransaction> std::__1::__invoke_void_return_wrapper<std::__1::tuple<magma::Status, magma::CheckpointTransaction>, false>::__call<magma::LSMTree::newFlush(__args=<unavailable>)::$_16&>(magma::LSMTree::newFlush()::$_16&) at invoke.h:30:16 [opt] frame #11: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #12: 0x000000010139fe59 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #13: 0x00000001012f72af ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::__function::__value_func<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #14: 0x00000001012f7296 ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::function<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=0x0000000118131560)() const at function.h:1182:12 [opt] frame #15: 0x00000001012f7292 ep-engine_ep_unit_tests`magma::FlushWork::Execute(this=0x0000000118131560) at flush_work.cc:61:29 [opt] frame #16: 0x0000000101389d5e ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x00007ff7bfefe1c0)::$_38::operator()() at kvstore.cc:515:27 [opt] frame #17: 0x0000000101388fac ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x000000010442a420, wal=<unavailable>, offset=(SegID = 1, SegOffset = 4096), flushMode=<unavailable>, blockMode=Blocking) at kvstore.cc:582:16 [opt] frame #18: 0x0000000101389a5a ep-engine_ep_unit_tests`magma::KVStore::FlushMemTables(this=<unavailable>, wal=<unavailable>, flushMode=<unavailable>, blockMode=<unavailable>) at kvstore.cc:387:12 [opt] frame #19: 0x00000001012fd9ba ep-engine_ep_unit_tests`magma::Magma::Impl::syncKVStore(this=0x000000011814f000, kvID=<unavailable>, checkpoint=true) at db.cc:1352:21 [opt] frame #20: 0x000000010132678a ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=0x00007ff7bfefe400)>)::$_7::operator()() const at db.cc:880:23 [opt] frame #21: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=<unavailable>)>)::$_8::operator()() const at db.cc:891:21 [opt] frame #22: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)>)::$_8&>(fp)()) std::__1::__invoke<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at type_traits:3918:1 [opt] frame #23: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call<magma::Magma::Impl::CompactKVStore(__args=<unavailable>)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at invoke.h:61:9 [opt] frame #24: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #25: 0x0000000101326764 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #26: 0x0000000101303138 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::__function::__value_func<void ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #27: 0x000000010130312d ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::function<void ()>::operator(this=0x00007ff7bfefe4b0)() const at function.h:1182:12 [opt] frame #28: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:92:9 [opt] frame #29: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:91:14 [opt] frame #30: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(this=<unavailable>, kvID=<unavailable>, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefe550)>) at db.cc:895:1 [opt] frame #31: 0x000000010130336c ep-engine_ep_unit_tests`magma::Magma::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=<unavailable>, makeCallback=<unavailable>)>) at db.cc:901:18 [opt] frame #32: 0x000000010004fd3d ep-engine_ep_unit_tests`MagmaMemoryTrackingProxy::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefea00)>) at magma-memory-tracking-proxy.cc:190:19 [opt] frame #33: 0x00000001000a9eeb ep-engine_ep_unit_tests`MagmaKVStore::compactDBInternal(this=<unavailable>, vbLock=0x00007ff7bfefeda0, ctx=std::__1::shared_ptr<CompactionContext>::element_type @ 0x00000001184acc20 strong=3 weak=1) at magma-kvstore.cc:2590:29 [opt] frame #34: 0x00000001000a93ad ep-engine_ep_unit_tests`MagmaKVStore::compactDB(this=0x00000001067e6500, vbLock=0x00007ff7bfefeda0, ctx=nullptr) at magma-kvstore.cc:2445:12 [opt] frame #35: 0x00000001001d7eb0 ep-engine_ep_unit_tests`EPBucket::compactInternal(this=0x00000001067e6000, vb=0x00007ff7bfefed90, config=<unavailable>) at ep_bucket.cc:1398:25 [opt] frame #36: 0x00000001001d83f6 ep-engine_ep_unit_tests`EPBucket::doCompact(this=0x00000001067e6000, vbid=(vbid = 0), config=0x00007ff7bfefedf0, cookies=size=0) at ep_bucket.cc:1476:14 [opt] 3) Key sorting issue Magma now checks for sorted keys - it turns out KV flushing is violating that ordering. Need to know if KV should fix or is the magma check required?? Example: CollectionsDcpEphemeralOrPersistent/CollectionsLegacyDcpTest.default_collection_is_not_vbucket_highseqno_with_pending/persistent_nexus_couchstore_magma_value_only CRITICAL [(SynchronousEPEngine:default) magma_0]Fatal error: Found: preceding key(d2) > current key( _collection). If history is enabled, all keys in the batch must be sorted lexicographicall The problem is that the test flushes a prepare(default collection, key=d2) and create-collection(fruit) together. The flusher orders these... \0d2 \1create_fruit This is correct. But \0d2 is marked as a prepare, when flushed to disk it goes into a special namespace. This occurs in KVStore after the sorting. \0d2 becomes \2\0d2 And magma actually sees \2\0d2 \1create_fruit and we have violated the expects Change-Id: Ica9ea1b52c51f125c9e8839a0fca412834fc25f7
jimwwalker
added a commit
to jimwwalker/kv_engine
that referenced
this pull request
Jan 19, 2023
Replace the todo markers with code that now utilises the magma history API - this now means scanAllVersions for example is hooked into the magma history scanning API. Add new tests that validate multiple versions can be stored and returned. Also required are changes to unit tests to respect new expectation checks that occur in magma - primarily that flushing writes ordered batches - this is only a problem for tests which bypass the flusher and call KVStore directly. **** ISSUES **** ep-engine_ep_unit_tests does not pass: 1) Exception from magma MagmaKVStoreRollbackTest.Rollback hits the following exception GSL: Precondition failure: 'levelSize >= compactionState[level].history.Size' at /Users/jimwalker/Code/couchbase/neo/magma/lsm/lsm_tree.cc:895 2) Seg-fault in magma Seen in a number of tests, 1 example: CollectionsDcpEphemeralOrPersistent/CollectionsDcpParameterizedTest.DefaultCollectionDropped/persistent_magma_value_only Process 78731 stopped * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] 72 } 73 74 Slice DocSequenceBuffer::GetKey() { -> 75 seqFmt.Set(sortedList[offset]->seqno); 76 return seqFmt.Encode(); 77 } 78 * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) * frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] frame couchbase#1: 0x0000000101361e2e ep-engine_ep_unit_tests`magma::mvccIteratorAdaptor::GetKey(this=0x0000000118536c00) at mvcc.h:249:25 [opt] frame couchbase#2: 0x000000010132b688 ep-engine_ep_unit_tests`magma::IteratorWithFilter::filterKeys(this=0x0000000118128350) at iterator.cc:214:32 [opt] frame couchbase#3: 0x000000010132de5b ep-engine_ep_unit_tests`magma::KVReader::ReadKVs(this=0x00007ff7bfefd550) at common.cc:70:19 [opt] frame couchbase#4: 0x0000000101378f63 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, w=0x00007ff7bfefd890, itr=0x0000000118128350, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefd860)>) at lsm_tree.cc:719:15 [opt] frame couchbase#5: 0x0000000101376ee8 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, appendMode=<unavailable>, itr=0x0000000118128350, sizeEstimate=<unavailable>, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefdb60)>) at lsm_tree.cc:682:17 [opt] frame couchbase#6: 0x00000001013761b2 ep-engine_ep_unit_tests`magma::LSMTree::writeMemtable(this=0x000000011855a820, memtable=0x000000011854c7a0) at lsm_tree.cc:449:21 [opt] frame #7: 0x000000010137753f ep-engine_ep_unit_tests`magma::LSMTree::doMemtableFlushWork(this=0x000000011855a820) at lsm_tree.cc:531:18 [opt] frame #8: 0x000000010139fe62 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] magma::LSMTree::newFlush(this=<unavailable>)::$_16::operator()() const at lsm_tree.cc:993:34 [opt] frame #9: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_16&>(fp)()) std::__1::__invoke<magma::LSMTree::newFlush()::$_16&>(magma::LSMTree::newFlush()::$_16&) at type_traits:3918:1 [opt] frame #10: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::tuple<magma::Status, magma::CheckpointTransaction> std::__1::__invoke_void_return_wrapper<std::__1::tuple<magma::Status, magma::CheckpointTransaction>, false>::__call<magma::LSMTree::newFlush(__args=<unavailable>)::$_16&>(magma::LSMTree::newFlush()::$_16&) at invoke.h:30:16 [opt] frame #11: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #12: 0x000000010139fe59 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #13: 0x00000001012f72af ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::__function::__value_func<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #14: 0x00000001012f7296 ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::function<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=0x0000000118131560)() const at function.h:1182:12 [opt] frame #15: 0x00000001012f7292 ep-engine_ep_unit_tests`magma::FlushWork::Execute(this=0x0000000118131560) at flush_work.cc:61:29 [opt] frame #16: 0x0000000101389d5e ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x00007ff7bfefe1c0)::$_38::operator()() at kvstore.cc:515:27 [opt] frame #17: 0x0000000101388fac ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x000000010442a420, wal=<unavailable>, offset=(SegID = 1, SegOffset = 4096), flushMode=<unavailable>, blockMode=Blocking) at kvstore.cc:582:16 [opt] frame #18: 0x0000000101389a5a ep-engine_ep_unit_tests`magma::KVStore::FlushMemTables(this=<unavailable>, wal=<unavailable>, flushMode=<unavailable>, blockMode=<unavailable>) at kvstore.cc:387:12 [opt] frame #19: 0x00000001012fd9ba ep-engine_ep_unit_tests`magma::Magma::Impl::syncKVStore(this=0x000000011814f000, kvID=<unavailable>, checkpoint=true) at db.cc:1352:21 [opt] frame #20: 0x000000010132678a ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=0x00007ff7bfefe400)>)::$_7::operator()() const at db.cc:880:23 [opt] frame #21: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=<unavailable>)>)::$_8::operator()() const at db.cc:891:21 [opt] frame #22: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)>)::$_8&>(fp)()) std::__1::__invoke<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at type_traits:3918:1 [opt] frame #23: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call<magma::Magma::Impl::CompactKVStore(__args=<unavailable>)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at invoke.h:61:9 [opt] frame #24: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #25: 0x0000000101326764 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #26: 0x0000000101303138 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::__function::__value_func<void ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #27: 0x000000010130312d ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::function<void ()>::operator(this=0x00007ff7bfefe4b0)() const at function.h:1182:12 [opt] frame #28: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:92:9 [opt] frame #29: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:91:14 [opt] frame #30: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(this=<unavailable>, kvID=<unavailable>, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefe550)>) at db.cc:895:1 [opt] frame #31: 0x000000010130336c ep-engine_ep_unit_tests`magma::Magma::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=<unavailable>, makeCallback=<unavailable>)>) at db.cc:901:18 [opt] frame #32: 0x000000010004fd3d ep-engine_ep_unit_tests`MagmaMemoryTrackingProxy::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefea00)>) at magma-memory-tracking-proxy.cc:190:19 [opt] frame #33: 0x00000001000a9eeb ep-engine_ep_unit_tests`MagmaKVStore::compactDBInternal(this=<unavailable>, vbLock=0x00007ff7bfefeda0, ctx=std::__1::shared_ptr<CompactionContext>::element_type @ 0x00000001184acc20 strong=3 weak=1) at magma-kvstore.cc:2590:29 [opt] frame #34: 0x00000001000a93ad ep-engine_ep_unit_tests`MagmaKVStore::compactDB(this=0x00000001067e6500, vbLock=0x00007ff7bfefeda0, ctx=nullptr) at magma-kvstore.cc:2445:12 [opt] frame #35: 0x00000001001d7eb0 ep-engine_ep_unit_tests`EPBucket::compactInternal(this=0x00000001067e6000, vb=0x00007ff7bfefed90, config=<unavailable>) at ep_bucket.cc:1398:25 [opt] frame #36: 0x00000001001d83f6 ep-engine_ep_unit_tests`EPBucket::doCompact(this=0x00000001067e6000, vbid=(vbid = 0), config=0x00007ff7bfefedf0, cookies=size=0) at ep_bucket.cc:1476:14 [opt] 3) Key sorting issue Magma now checks for sorted keys - it turns out KV flushing is violating that ordering. Need to know if KV should fix or is the magma check required?? Example: CollectionsDcpEphemeralOrPersistent/CollectionsLegacyDcpTest.default_collection_is_not_vbucket_highseqno_with_pending/persistent_nexus_couchstore_magma_value_only CRITICAL [(SynchronousEPEngine:default) magma_0]Fatal error: Found: preceding key(d2) > current key( _collection). If history is enabled, all keys in the batch must be sorted lexicographicall The problem is that the test flushes a prepare(default collection, key=d2) and create-collection(fruit) together. The flusher orders these... \0d2 \1create_fruit This is correct. But \0d2 is marked as a prepare, when flushed to disk it goes into a special namespace. This occurs in KVStore after the sorting. \0d2 becomes \2\0d2 And magma actually sees \2\0d2 \1create_fruit and we have violated the expects Change-Id: Ica9ea1b52c51f125c9e8839a0fca412834fc25f7
jimwwalker
added a commit
to jimwwalker/kv_engine
that referenced
this pull request
Jan 19, 2023
Replace the todo markers with code that now utilises the magma history API - this now means scanAllVersions for example is hooked into the magma history scanning API. Add new tests that validate multiple versions can be stored and returned. Also required are changes to unit tests to respect new expectation checks that occur in magma - primarily that flushing writes ordered batches - this is only a problem for tests which bypass the flusher and call KVStore directly. **** ISSUES **** ep-engine_ep_unit_tests does not pass: 1) Exception from magma MagmaKVStoreRollbackTest.Rollback hits the following exception GSL: Precondition failure: 'levelSize >= compactionState[level].history.Size' at /Users/jimwalker/Code/couchbase/neo/magma/lsm/lsm_tree.cc:895 2) Seg-fault in magma Seen in a number of tests, 1 example: CollectionsDcpEphemeralOrPersistent/CollectionsDcpParameterizedTest.DefaultCollectionDropped/persistent_magma_value_only Process 78731 stopped * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] 72 } 73 74 Slice DocSequenceBuffer::GetKey() { -> 75 seqFmt.Set(sortedList[offset]->seqno); 76 return seqFmt.Encode(); 77 } 78 * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) * frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] frame couchbase#1: 0x0000000101361e2e ep-engine_ep_unit_tests`magma::mvccIteratorAdaptor::GetKey(this=0x0000000118536c00) at mvcc.h:249:25 [opt] frame couchbase#2: 0x000000010132b688 ep-engine_ep_unit_tests`magma::IteratorWithFilter::filterKeys(this=0x0000000118128350) at iterator.cc:214:32 [opt] frame couchbase#3: 0x000000010132de5b ep-engine_ep_unit_tests`magma::KVReader::ReadKVs(this=0x00007ff7bfefd550) at common.cc:70:19 [opt] frame couchbase#4: 0x0000000101378f63 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, w=0x00007ff7bfefd890, itr=0x0000000118128350, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefd860)>) at lsm_tree.cc:719:15 [opt] frame couchbase#5: 0x0000000101376ee8 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, appendMode=<unavailable>, itr=0x0000000118128350, sizeEstimate=<unavailable>, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefdb60)>) at lsm_tree.cc:682:17 [opt] frame couchbase#6: 0x00000001013761b2 ep-engine_ep_unit_tests`magma::LSMTree::writeMemtable(this=0x000000011855a820, memtable=0x000000011854c7a0) at lsm_tree.cc:449:21 [opt] frame #7: 0x000000010137753f ep-engine_ep_unit_tests`magma::LSMTree::doMemtableFlushWork(this=0x000000011855a820) at lsm_tree.cc:531:18 [opt] frame #8: 0x000000010139fe62 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] magma::LSMTree::newFlush(this=<unavailable>)::$_16::operator()() const at lsm_tree.cc:993:34 [opt] frame #9: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_16&>(fp)()) std::__1::__invoke<magma::LSMTree::newFlush()::$_16&>(magma::LSMTree::newFlush()::$_16&) at type_traits:3918:1 [opt] frame #10: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::tuple<magma::Status, magma::CheckpointTransaction> std::__1::__invoke_void_return_wrapper<std::__1::tuple<magma::Status, magma::CheckpointTransaction>, false>::__call<magma::LSMTree::newFlush(__args=<unavailable>)::$_16&>(magma::LSMTree::newFlush()::$_16&) at invoke.h:30:16 [opt] frame #11: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #12: 0x000000010139fe59 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #13: 0x00000001012f72af ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::__function::__value_func<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #14: 0x00000001012f7296 ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::function<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=0x0000000118131560)() const at function.h:1182:12 [opt] frame #15: 0x00000001012f7292 ep-engine_ep_unit_tests`magma::FlushWork::Execute(this=0x0000000118131560) at flush_work.cc:61:29 [opt] frame #16: 0x0000000101389d5e ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x00007ff7bfefe1c0)::$_38::operator()() at kvstore.cc:515:27 [opt] frame #17: 0x0000000101388fac ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x000000010442a420, wal=<unavailable>, offset=(SegID = 1, SegOffset = 4096), flushMode=<unavailable>, blockMode=Blocking) at kvstore.cc:582:16 [opt] frame #18: 0x0000000101389a5a ep-engine_ep_unit_tests`magma::KVStore::FlushMemTables(this=<unavailable>, wal=<unavailable>, flushMode=<unavailable>, blockMode=<unavailable>) at kvstore.cc:387:12 [opt] frame #19: 0x00000001012fd9ba ep-engine_ep_unit_tests`magma::Magma::Impl::syncKVStore(this=0x000000011814f000, kvID=<unavailable>, checkpoint=true) at db.cc:1352:21 [opt] frame #20: 0x000000010132678a ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=0x00007ff7bfefe400)>)::$_7::operator()() const at db.cc:880:23 [opt] frame #21: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=<unavailable>)>)::$_8::operator()() const at db.cc:891:21 [opt] frame #22: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)>)::$_8&>(fp)()) std::__1::__invoke<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at type_traits:3918:1 [opt] frame #23: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call<magma::Magma::Impl::CompactKVStore(__args=<unavailable>)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at invoke.h:61:9 [opt] frame #24: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #25: 0x0000000101326764 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #26: 0x0000000101303138 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::__function::__value_func<void ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #27: 0x000000010130312d ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::function<void ()>::operator(this=0x00007ff7bfefe4b0)() const at function.h:1182:12 [opt] frame #28: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:92:9 [opt] frame #29: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:91:14 [opt] frame #30: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(this=<unavailable>, kvID=<unavailable>, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefe550)>) at db.cc:895:1 [opt] frame #31: 0x000000010130336c ep-engine_ep_unit_tests`magma::Magma::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=<unavailable>, makeCallback=<unavailable>)>) at db.cc:901:18 [opt] frame #32: 0x000000010004fd3d ep-engine_ep_unit_tests`MagmaMemoryTrackingProxy::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefea00)>) at magma-memory-tracking-proxy.cc:190:19 [opt] frame #33: 0x00000001000a9eeb ep-engine_ep_unit_tests`MagmaKVStore::compactDBInternal(this=<unavailable>, vbLock=0x00007ff7bfefeda0, ctx=std::__1::shared_ptr<CompactionContext>::element_type @ 0x00000001184acc20 strong=3 weak=1) at magma-kvstore.cc:2590:29 [opt] frame #34: 0x00000001000a93ad ep-engine_ep_unit_tests`MagmaKVStore::compactDB(this=0x00000001067e6500, vbLock=0x00007ff7bfefeda0, ctx=nullptr) at magma-kvstore.cc:2445:12 [opt] frame #35: 0x00000001001d7eb0 ep-engine_ep_unit_tests`EPBucket::compactInternal(this=0x00000001067e6000, vb=0x00007ff7bfefed90, config=<unavailable>) at ep_bucket.cc:1398:25 [opt] frame #36: 0x00000001001d83f6 ep-engine_ep_unit_tests`EPBucket::doCompact(this=0x00000001067e6000, vbid=(vbid = 0), config=0x00007ff7bfefedf0, cookies=size=0) at ep_bucket.cc:1476:14 [opt] 3) Key sorting issue Magma now checks for sorted keys - it turns out KV flushing is violating that ordering. Need to know if KV should fix or is the magma check required?? Example: CollectionsDcpEphemeralOrPersistent/CollectionsLegacyDcpTest.default_collection_is_not_vbucket_highseqno_with_pending/persistent_nexus_couchstore_magma_value_only CRITICAL [(SynchronousEPEngine:default) magma_0]Fatal error: Found: preceding key(d2) > current key( _collection). If history is enabled, all keys in the batch must be sorted lexicographicall The problem is that the test flushes a prepare(default collection, key=d2) and create-collection(fruit) together. The flusher orders these... \0d2 \1create_fruit This is correct. But \0d2 is marked as a prepare, when flushed to disk it goes into a special namespace. This occurs in KVStore after the sorting. \0d2 becomes \2\0d2 And magma actually sees \2\0d2 \1create_fruit and we have violated the expects Change-Id: Ica9ea1b52c51f125c9e8839a0fca412834fc25f7
jimwwalker
added a commit
to jimwwalker/kv_engine
that referenced
this pull request
Jan 23, 2023
Replace the todo markers with code that now utilises the magma history API - this now means scanAllVersions for example is hooked into the magma history scanning API. Add new tests that validate multiple versions can be stored and returned. Also required are changes to unit tests to respect new expectation checks that occur in magma - primarily that flushing writes ordered batches - this is only a problem for tests which bypass the flusher and call KVStore directly. **** ISSUES **** ep-engine_ep_unit_tests does not pass: 1) Exception from magma MagmaKVStoreRollbackTest.Rollback hits the following exception GSL: Precondition failure: 'levelSize >= compactionState[level].history.Size' at /Users/jimwalker/Code/couchbase/neo/magma/lsm/lsm_tree.cc:895 2) Seg-fault in magma Seen in a number of tests, 1 example: CollectionsDcpEphemeralOrPersistent/CollectionsDcpParameterizedTest.DefaultCollectionDropped/persistent_magma_value_only Process 78731 stopped * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] 72 } 73 74 Slice DocSequenceBuffer::GetKey() { -> 75 seqFmt.Set(sortedList[offset]->seqno); 76 return seqFmt.Encode(); 77 } 78 * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) * frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] frame couchbase#1: 0x0000000101361e2e ep-engine_ep_unit_tests`magma::mvccIteratorAdaptor::GetKey(this=0x0000000118536c00) at mvcc.h:249:25 [opt] frame couchbase#2: 0x000000010132b688 ep-engine_ep_unit_tests`magma::IteratorWithFilter::filterKeys(this=0x0000000118128350) at iterator.cc:214:32 [opt] frame couchbase#3: 0x000000010132de5b ep-engine_ep_unit_tests`magma::KVReader::ReadKVs(this=0x00007ff7bfefd550) at common.cc:70:19 [opt] frame couchbase#4: 0x0000000101378f63 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, w=0x00007ff7bfefd890, itr=0x0000000118128350, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefd860)>) at lsm_tree.cc:719:15 [opt] frame couchbase#5: 0x0000000101376ee8 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, appendMode=<unavailable>, itr=0x0000000118128350, sizeEstimate=<unavailable>, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefdb60)>) at lsm_tree.cc:682:17 [opt] frame couchbase#6: 0x00000001013761b2 ep-engine_ep_unit_tests`magma::LSMTree::writeMemtable(this=0x000000011855a820, memtable=0x000000011854c7a0) at lsm_tree.cc:449:21 [opt] frame #7: 0x000000010137753f ep-engine_ep_unit_tests`magma::LSMTree::doMemtableFlushWork(this=0x000000011855a820) at lsm_tree.cc:531:18 [opt] frame #8: 0x000000010139fe62 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] magma::LSMTree::newFlush(this=<unavailable>)::$_16::operator()() const at lsm_tree.cc:993:34 [opt] frame #9: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_16&>(fp)()) std::__1::__invoke<magma::LSMTree::newFlush()::$_16&>(magma::LSMTree::newFlush()::$_16&) at type_traits:3918:1 [opt] frame #10: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::tuple<magma::Status, magma::CheckpointTransaction> std::__1::__invoke_void_return_wrapper<std::__1::tuple<magma::Status, magma::CheckpointTransaction>, false>::__call<magma::LSMTree::newFlush(__args=<unavailable>)::$_16&>(magma::LSMTree::newFlush()::$_16&) at invoke.h:30:16 [opt] frame #11: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #12: 0x000000010139fe59 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #13: 0x00000001012f72af ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::__function::__value_func<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #14: 0x00000001012f7296 ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::function<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=0x0000000118131560)() const at function.h:1182:12 [opt] frame #15: 0x00000001012f7292 ep-engine_ep_unit_tests`magma::FlushWork::Execute(this=0x0000000118131560) at flush_work.cc:61:29 [opt] frame #16: 0x0000000101389d5e ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x00007ff7bfefe1c0)::$_38::operator()() at kvstore.cc:515:27 [opt] frame #17: 0x0000000101388fac ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x000000010442a420, wal=<unavailable>, offset=(SegID = 1, SegOffset = 4096), flushMode=<unavailable>, blockMode=Blocking) at kvstore.cc:582:16 [opt] frame #18: 0x0000000101389a5a ep-engine_ep_unit_tests`magma::KVStore::FlushMemTables(this=<unavailable>, wal=<unavailable>, flushMode=<unavailable>, blockMode=<unavailable>) at kvstore.cc:387:12 [opt] frame #19: 0x00000001012fd9ba ep-engine_ep_unit_tests`magma::Magma::Impl::syncKVStore(this=0x000000011814f000, kvID=<unavailable>, checkpoint=true) at db.cc:1352:21 [opt] frame #20: 0x000000010132678a ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=0x00007ff7bfefe400)>)::$_7::operator()() const at db.cc:880:23 [opt] frame #21: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=<unavailable>)>)::$_8::operator()() const at db.cc:891:21 [opt] frame #22: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)>)::$_8&>(fp)()) std::__1::__invoke<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at type_traits:3918:1 [opt] frame #23: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call<magma::Magma::Impl::CompactKVStore(__args=<unavailable>)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at invoke.h:61:9 [opt] frame #24: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #25: 0x0000000101326764 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #26: 0x0000000101303138 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::__function::__value_func<void ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #27: 0x000000010130312d ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::function<void ()>::operator(this=0x00007ff7bfefe4b0)() const at function.h:1182:12 [opt] frame #28: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:92:9 [opt] frame #29: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:91:14 [opt] frame #30: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(this=<unavailable>, kvID=<unavailable>, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefe550)>) at db.cc:895:1 [opt] frame #31: 0x000000010130336c ep-engine_ep_unit_tests`magma::Magma::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=<unavailable>, makeCallback=<unavailable>)>) at db.cc:901:18 [opt] frame #32: 0x000000010004fd3d ep-engine_ep_unit_tests`MagmaMemoryTrackingProxy::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefea00)>) at magma-memory-tracking-proxy.cc:190:19 [opt] frame #33: 0x00000001000a9eeb ep-engine_ep_unit_tests`MagmaKVStore::compactDBInternal(this=<unavailable>, vbLock=0x00007ff7bfefeda0, ctx=std::__1::shared_ptr<CompactionContext>::element_type @ 0x00000001184acc20 strong=3 weak=1) at magma-kvstore.cc:2590:29 [opt] frame #34: 0x00000001000a93ad ep-engine_ep_unit_tests`MagmaKVStore::compactDB(this=0x00000001067e6500, vbLock=0x00007ff7bfefeda0, ctx=nullptr) at magma-kvstore.cc:2445:12 [opt] frame #35: 0x00000001001d7eb0 ep-engine_ep_unit_tests`EPBucket::compactInternal(this=0x00000001067e6000, vb=0x00007ff7bfefed90, config=<unavailable>) at ep_bucket.cc:1398:25 [opt] frame #36: 0x00000001001d83f6 ep-engine_ep_unit_tests`EPBucket::doCompact(this=0x00000001067e6000, vbid=(vbid = 0), config=0x00007ff7bfefedf0, cookies=size=0) at ep_bucket.cc:1476:14 [opt] 3) Key sorting issue Magma now checks for sorted keys - it turns out KV flushing is violating that ordering. Need to know if KV should fix or is the magma check required?? Example: CollectionsDcpEphemeralOrPersistent/CollectionsLegacyDcpTest.default_collection_is_not_vbucket_highseqno_with_pending/persistent_nexus_couchstore_magma_value_only CRITICAL [(SynchronousEPEngine:default) magma_0]Fatal error: Found: preceding key(d2) > current key( _collection). If history is enabled, all keys in the batch must be sorted lexicographicall The problem is that the test flushes a prepare(default collection, key=d2) and create-collection(fruit) together. The flusher orders these... \0d2 \1create_fruit This is correct. But \0d2 is marked as a prepare, when flushed to disk it goes into a special namespace. This occurs in KVStore after the sorting. \0d2 becomes \2\0d2 And magma actually sees \2\0d2 \1create_fruit and we have violated the expects Change-Id: Ica9ea1b52c51f125c9e8839a0fca412834fc25f7
jimwwalker
added a commit
to jimwwalker/kv_engine
that referenced
this pull request
Jan 23, 2023
Replace the todo markers with code that now utilises the magma history API - this now means scanAllVersions for example is hooked into the magma history scanning API. Add new tests that validate multiple versions can be stored and returned. Also required are changes to unit tests to respect new expectation checks that occur in magma - primarily that flushing writes ordered batches - this is only a problem for tests which bypass the flusher and call KVStore directly. **** ISSUES **** ep-engine_ep_unit_tests does not pass: 1) Exception from magma MagmaKVStoreRollbackTest.Rollback hits the following exception GSL: Precondition failure: 'levelSize >= compactionState[level].history.Size' at /Users/jimwalker/Code/couchbase/neo/magma/lsm/lsm_tree.cc:895 2) Seg-fault in magma Seen in a number of tests, 1 example: CollectionsDcpEphemeralOrPersistent/CollectionsDcpParameterizedTest.DefaultCollectionDropped/persistent_magma_value_only Process 78731 stopped * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] 72 } 73 74 Slice DocSequenceBuffer::GetKey() { -> 75 seqFmt.Set(sortedList[offset]->seqno); 76 return seqFmt.Encode(); 77 } 78 * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) * frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] frame couchbase#1: 0x0000000101361e2e ep-engine_ep_unit_tests`magma::mvccIteratorAdaptor::GetKey(this=0x0000000118536c00) at mvcc.h:249:25 [opt] frame couchbase#2: 0x000000010132b688 ep-engine_ep_unit_tests`magma::IteratorWithFilter::filterKeys(this=0x0000000118128350) at iterator.cc:214:32 [opt] frame couchbase#3: 0x000000010132de5b ep-engine_ep_unit_tests`magma::KVReader::ReadKVs(this=0x00007ff7bfefd550) at common.cc:70:19 [opt] frame couchbase#4: 0x0000000101378f63 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, w=0x00007ff7bfefd890, itr=0x0000000118128350, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefd860)>) at lsm_tree.cc:719:15 [opt] frame couchbase#5: 0x0000000101376ee8 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, appendMode=<unavailable>, itr=0x0000000118128350, sizeEstimate=<unavailable>, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefdb60)>) at lsm_tree.cc:682:17 [opt] frame couchbase#6: 0x00000001013761b2 ep-engine_ep_unit_tests`magma::LSMTree::writeMemtable(this=0x000000011855a820, memtable=0x000000011854c7a0) at lsm_tree.cc:449:21 [opt] frame #7: 0x000000010137753f ep-engine_ep_unit_tests`magma::LSMTree::doMemtableFlushWork(this=0x000000011855a820) at lsm_tree.cc:531:18 [opt] frame #8: 0x000000010139fe62 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] magma::LSMTree::newFlush(this=<unavailable>)::$_16::operator()() const at lsm_tree.cc:993:34 [opt] frame #9: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_16&>(fp)()) std::__1::__invoke<magma::LSMTree::newFlush()::$_16&>(magma::LSMTree::newFlush()::$_16&) at type_traits:3918:1 [opt] frame #10: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::tuple<magma::Status, magma::CheckpointTransaction> std::__1::__invoke_void_return_wrapper<std::__1::tuple<magma::Status, magma::CheckpointTransaction>, false>::__call<magma::LSMTree::newFlush(__args=<unavailable>)::$_16&>(magma::LSMTree::newFlush()::$_16&) at invoke.h:30:16 [opt] frame #11: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #12: 0x000000010139fe59 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #13: 0x00000001012f72af ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::__function::__value_func<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #14: 0x00000001012f7296 ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::function<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=0x0000000118131560)() const at function.h:1182:12 [opt] frame #15: 0x00000001012f7292 ep-engine_ep_unit_tests`magma::FlushWork::Execute(this=0x0000000118131560) at flush_work.cc:61:29 [opt] frame #16: 0x0000000101389d5e ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x00007ff7bfefe1c0)::$_38::operator()() at kvstore.cc:515:27 [opt] frame #17: 0x0000000101388fac ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x000000010442a420, wal=<unavailable>, offset=(SegID = 1, SegOffset = 4096), flushMode=<unavailable>, blockMode=Blocking) at kvstore.cc:582:16 [opt] frame #18: 0x0000000101389a5a ep-engine_ep_unit_tests`magma::KVStore::FlushMemTables(this=<unavailable>, wal=<unavailable>, flushMode=<unavailable>, blockMode=<unavailable>) at kvstore.cc:387:12 [opt] frame #19: 0x00000001012fd9ba ep-engine_ep_unit_tests`magma::Magma::Impl::syncKVStore(this=0x000000011814f000, kvID=<unavailable>, checkpoint=true) at db.cc:1352:21 [opt] frame #20: 0x000000010132678a ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=0x00007ff7bfefe400)>)::$_7::operator()() const at db.cc:880:23 [opt] frame #21: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=<unavailable>)>)::$_8::operator()() const at db.cc:891:21 [opt] frame #22: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)>)::$_8&>(fp)()) std::__1::__invoke<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at type_traits:3918:1 [opt] frame #23: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call<magma::Magma::Impl::CompactKVStore(__args=<unavailable>)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at invoke.h:61:9 [opt] frame #24: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #25: 0x0000000101326764 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #26: 0x0000000101303138 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::__function::__value_func<void ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #27: 0x000000010130312d ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::function<void ()>::operator(this=0x00007ff7bfefe4b0)() const at function.h:1182:12 [opt] frame #28: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:92:9 [opt] frame #29: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:91:14 [opt] frame #30: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(this=<unavailable>, kvID=<unavailable>, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefe550)>) at db.cc:895:1 [opt] frame #31: 0x000000010130336c ep-engine_ep_unit_tests`magma::Magma::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=<unavailable>, makeCallback=<unavailable>)>) at db.cc:901:18 [opt] frame #32: 0x000000010004fd3d ep-engine_ep_unit_tests`MagmaMemoryTrackingProxy::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefea00)>) at magma-memory-tracking-proxy.cc:190:19 [opt] frame #33: 0x00000001000a9eeb ep-engine_ep_unit_tests`MagmaKVStore::compactDBInternal(this=<unavailable>, vbLock=0x00007ff7bfefeda0, ctx=std::__1::shared_ptr<CompactionContext>::element_type @ 0x00000001184acc20 strong=3 weak=1) at magma-kvstore.cc:2590:29 [opt] frame #34: 0x00000001000a93ad ep-engine_ep_unit_tests`MagmaKVStore::compactDB(this=0x00000001067e6500, vbLock=0x00007ff7bfefeda0, ctx=nullptr) at magma-kvstore.cc:2445:12 [opt] frame #35: 0x00000001001d7eb0 ep-engine_ep_unit_tests`EPBucket::compactInternal(this=0x00000001067e6000, vb=0x00007ff7bfefed90, config=<unavailable>) at ep_bucket.cc:1398:25 [opt] frame #36: 0x00000001001d83f6 ep-engine_ep_unit_tests`EPBucket::doCompact(this=0x00000001067e6000, vbid=(vbid = 0), config=0x00007ff7bfefedf0, cookies=size=0) at ep_bucket.cc:1476:14 [opt] 3) Key sorting issue Magma now checks for sorted keys - it turns out KV flushing is violating that ordering. Need to know if KV should fix or is the magma check required?? Example: CollectionsDcpEphemeralOrPersistent/CollectionsLegacyDcpTest.default_collection_is_not_vbucket_highseqno_with_pending/persistent_nexus_couchstore_magma_value_only CRITICAL [(SynchronousEPEngine:default) magma_0]Fatal error: Found: preceding key(d2) > current key( _collection). If history is enabled, all keys in the batch must be sorted lexicographicall The problem is that the test flushes a prepare(default collection, key=d2) and create-collection(fruit) together. The flusher orders these... \0d2 \1create_fruit This is correct. But \0d2 is marked as a prepare, when flushed to disk it goes into a special namespace. This occurs in KVStore after the sorting. \0d2 becomes \2\0d2 And magma actually sees \2\0d2 \1create_fruit and we have violated the expects Change-Id: Ica9ea1b52c51f125c9e8839a0fca412834fc25f7
ns-codereview
pushed a commit
that referenced
this pull request
May 4, 2023
dcpdrain currently sets the DCP noop-interval to 1s, so the producer will send NOOP requests to dcpdrain every 1s and dcpdrain needs to correctly handle this request and send a response. When connecting to clusters with high latency between client and server nodes, it can take more than 1 second to complete setting up the DCP connection and endering the main event loop. This means the server node may start to send DCP noop requests before the DCP connection is setup - and crucially dcpdrain's event loop is ready to process the DCP noop request. This results in dcpdrain crashing as it gets a DCP noop request when it is expecting a control response: Process 43094 launched: '/Users/dave/repos/couchbase/server/source/build/kv_engine/dcpdrain' (arm64) Using DCP flow control with buffer size: 13421772 Set DCP control message: set_priority=high Set DCP control message: supports_cursor_dropping_vulcan=true Set DCP control message: supports_hifi_MFU=true Set DCP control message: send_stream_end_on_client_close_stream=true Set DCP control message: enable_expiry_opcode=true Set DCP control message: set_noop_interval=1 Set DCP control message: enable_noop=true Set DCP control message: enable_out_of_order_snapshots=true 2023-05-03T12:11:28.705431+01:00 CRITICAL *** Fatal error encountered during exception handling *** 2023-05-03T12:11:28.708626+01:00 CRITICAL Caught unhandled std::exception-derived exception. what(): Header::getResponse(): Header is not a response Target 0: (dcpdrain) stopped. (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT * frame #0: 0x00000001c24a2d98 libsystem_kernel.dylib` __pthread_kill + 8 frame #1: 0x00000001c24d7ee0 libsystem_pthread.dylib` pthread_kill + 288 frame #2: 0x00000001c2412340 libsystem_c.dylib` abort + 168 frame #3: 0x00000001c2492b08 libc++abi.dylib` abort_message + 132 frame #4: 0x00000001c2482938 libc++abi.dylib` demangling_terminate_handler() + 312 frame #5: 0x00000001c2378330 libobjc.A.dylib` _objc_terminate() + 160 frame #6: 0x000000010008ef30 dcpdrain` backtrace_terminate_handler() + 752 at terminate_handler.cc:88 frame #7: 0x00000001c2491ea4 libc++abi.dylib` std::__terminate(void (*)()) + 20 frame #8: 0x00000001c2494c1c libc++abi.dylib` __cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 36 frame #9: 0x00000001c2494bc8 libc++abi.dylib` __cxa_throw + 140 frame #10: 0x000000010002a77c dcpdrain` BinprotResponse::getTracingData() const [inlined] cb::mcbp::Header::getResponse(this=0x00006000002044a0) const + 48 at header.h:134 frame #11: 0x000000010002a74c dcpdrain` BinprotResponse::getTracingData() const [inlined] BinprotResponse::getResponse(this=<unavailable>) const at client_mcbp_commands.cc:487 frame #12: 0x000000010002a74c dcpdrain` BinprotResponse::getTracingData(this=0x000000016fdfef90) const + 188 at client_mcbp_commands.cc:373 frame #13: 0x000000010002a638 dcpdrain` MemcachedConnection::recvResponse(this=0x0000000101604080, response=0x000000016fdfef90, opcode=<unavailable>, readTimeout=<unavailable>) + 84 at client_connection.cc:1043 ... frame #21: 0x0000000100038f40 dcpdrain` MemcachedConnection::backoff_execute(..., context="DCP_CONTROL", ...) + 100 at client_connection.cc:2016 frame #22: 0x000000010002bab4 dcpdrain` MemcachedConnection::execute(this=0x0000000101604080, command=0x000000016fdfefb0, readTimeout=(__rep_ = 0)) + 168 at client_connection.cc:1998 frame #23: 0x000000010000d688 dcpdrain` main + 280 at dcpdrain.cc:451 frame #24: 0x000000010000d570 dcpdrain` main(argc=<unavailable>, argv=<unavailable>) + 8488 at dcpdrain.cc:929 frame #25: 0x00000001005d508c dyld` start + 520 Ideally dcpdrain should be robust to receiving dcp NOOP messages while setting up the control flags, but that's not simple as we use common code in MemcachedConnection which performs a request and expects a response (of type DCP_CONTROL) in-order. To workaround this problem simply increase the default DCP noop interval from 1 to 60 seconds - 60s /should/ be sufficient to complete the handshake... Change-Id: I0f846956d6499ea54d74f781cb14d7982387c9f4 Reviewed-on: https://review.couchbase.org/c/kv_engine/+/190418 Tested-by: Build Bot <[email protected]> Reviewed-by: Trond Norbye <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jan 12, 2024
The method did not take a queueLock and could mutate the CheckpointManager while it is being accessed, e.g. in CheckpointManager::getListOfCursorsToDrop. CheckpointMemRecoveryTask calls getListOfCursorsToDrop which iterates CM::cursors. A concurrent RollbackTask can result in resetting the vbucket and calling CM::takeAndResetCursors, which among others mutates CM::cursors. WARNING: ThreadSanitizer: data race (pid=47061) Write of size 8 at 0x00010d3b77a8 by main thread (mutexes: write M0, write M1, write M2): #0 CheckpointManager::takeAndResetCursors(CheckpointManager&) checkpoint_manager.cc:1754 (ep-engine_ep_unit_tests:arm64+0x1003c7dd8) #1 KVBucket::resetVBucket_UNLOCKED(LockedVBucketPtr&, std::__1::unique_lock<std::__1::mutex>&) kv_bucket.cc:1273 (ep-engine_ep_unit_tests:arm64+0x1001fc414) #2 KVBucket::rollback(Vbid, unsigned long long) kv_bucket.cc:2634 (ep-engine_ep_unit_tests:arm64+0x10020a910) #3 CheckpointRemoverTest_MB59601_Test::TestBody() checkpoint_remover_test.cc:518 (ep-engine_ep_unit_tests:arm64+0x1005d2224) #4 virtual thunk to CheckpointRemoverTest_MB59601_Test::TestBody() checkpoint_remover_test.cc (ep-engine_ep_unit_tests:arm64+0x1005d24e8) #5 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) gtest.cc:2648 (ep-engine_ep_unit_tests:arm64+0x101b8f6bc) #6 <null> <null> (0x000186e390e0) Previous read of size 8 at 0x00010d3b77a8 by thread T1 (mutexes: write M3): #0 CheckpointManager::getListOfCursorsToDrop() checkpoint_manager.cc:842 (ep-engine_ep_unit_tests:arm64+0x1003c1af0) #1 CheckpointMemRecoveryTask::attemptCursorDropping() checkpoint_remover.cc:183 (ep-engine_ep_unit_tests:arm64+0x1003caf8c) #2 CheckpointMemRecoveryTask::runInner(bool) checkpoint_remover.cc:245 (ep-engine_ep_unit_tests:arm64+0x1003cb77c) #3 EpNotifiableTask::run() ep_task.cc:56 (ep-engine_ep_unit_tests:arm64+0x10028763c) #4 void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, CheckpointRemoverTest_MB59601_Test::TestBody()::$_2::operator()() const::'lambda0'()>>(void*) thread:299 (ep-engine_ep_unit_tests:arm64+0x100600c30) Change-Id: I15c1e9ccc6f45f3251ebd7f78649c8a446d65b54 Reviewed-on: https://review.couchbase.org/c/kv_engine/+/203302 Reviewed-by: Vesko Karaganev <[email protected]> Tested-by: Build Bot <[email protected]> Reviewed-by: Paolo Cocchi <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jan 22, 2024
Otherwise at exit some log entries can be lost resulting in an intermittent test failure. 1/1 Test #4: memcached-breakpad-test-std_exception_via_std_thread ...***Failed 0.08 sec 2024-01-19T10:19:11 INFO Spawning memcached 2024-01-19T10:19:11 INFO Process exited with status -6 2024-01-19T10:19:11 ERROR FAIL - No exception what() message written to stderr on crash. === stderr begin === libc++abi: terminating due to uncaught exception of type std::runtime_error: crash_engine: This exception wasn't handled === stderr end === Change-Id: Iaa679d1bb7a4593715f5beab004fd612809cb7fa Reviewed-on: https://review.couchbase.org/c/kv_engine/+/204112 Tested-by: Pavlos Georgiou <[email protected]> Reviewed-by: Trond Norbye <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jan 24, 2024
The method did not take a queueLock and could mutate the CheckpointManager while it is being accessed, e.g. in CheckpointManager::getListOfCursorsToDrop. CheckpointMemRecoveryTask calls getListOfCursorsToDrop which iterates CM::cursors. A concurrent RollbackTask can result in resetting the vbucket and calling CM::takeAndResetCursors, which among others mutates CM::cursors. WARNING: ThreadSanitizer: data race (pid=60355) Write of size 8 at 0x00010d1a5e68 by main thread (mutexes: write M0, write M1, write M2): #0 CheckpointManager::takeAndResetCursors(CheckpointManager&) checkpoint_manager.cc:1856 (ep-engine_ep_unit_tests:arm64+0x1003795b4) #1 KVBucket::resetVBucket_UNLOCKED(LockedVBucketPtr&, std::__1::unique_lock<std::__1::mutex>&) kv_bucket.cc:1271 (ep-engine_ep_unit_tests:arm64+0x1001da918) #2 KVBucket::rollback(Vbid, unsigned long long) kv_bucket.cc:2671 (ep-engine_ep_unit_tests:arm64+0x1001e8404) #3 CheckpointRemoverTest_MB59601_Test::TestBody() checkpoint_remover_test.cc:513 (ep-engine_ep_unit_tests:arm64+0x10054117c) #4 virtual thunk to CheckpointRemoverTest_MB59601_Test::TestBody() checkpoint_remover_test.cc (ep-engine_ep_unit_tests:arm64+0x100541448) #5 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) gtest.cc:2643 (ep-engine_ep_unit_tests:arm64+0x10195a8e0) #6 <null> <null> (0x000186e390e0) Previous read of size 8 at 0x00010d1a5e68 by thread T1 (mutexes: write M3): #0 CheckpointManager::getListOfCursorsToDrop() checkpoint_manager.cc:802 (ep-engine_ep_unit_tests:arm64+0x100372bdc) #1 CheckpointMemRecoveryTask::attemptCursorDropping() checkpoint_remover.cc:174 (ep-engine_ep_unit_tests:arm64+0x10037c710) #2 CheckpointMemRecoveryTask::runInner() checkpoint_remover.cc:291 (ep-engine_ep_unit_tests:arm64+0x10037d068) #3 NotifiableTask::run() notifiable_task.cc:18 (ep-engine_ep_unit_tests:arm64+0x101934ed8) #4 void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, CheckpointRemoverTest_MB59601_Test::TestBody()::$_3::operator()() const::'lambda0'()>>(void*) thread:299 (ep-engine_ep_unit_tests:arm64+0x1005661f0) Change-Id: I7fe1ed1f6ebca811a5dfca6c2e69d04bfa91b2b8 Reviewed-on: https://review.couchbase.org/c/kv_engine/+/203991 Tested-by: Pavlos Georgiou <[email protected]> Reviewed-by: Vesko Karaganev <[email protected]> Reviewed-by: Paolo Cocchi <[email protected]> Well-Formed: Restriction Checker
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
typo in backfill spelling at below line -
"This is called as disk backill. "