-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Plasma: Introduce PlasmaKVStore #3
Open
scottlashley
wants to merge
1
commit into
couchbase:master
Choose a base branch
from
hisundar:plasma
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
PlasmaKVStore is an experimental KVStore implementation backed by Plasma. It is very incomplete, and is not intended for general usage. As such, it is hidden behind the `EP_USE_PLASMA` flag. To use it, one can build with (for example) make EXTRA_CMAKE_OPTIONS='-DEP_USE_PLASMA=ON' Change-Id: Id687170ca87eb9f25764576409fc18f508a0328f
jameseh96
pushed a commit
to jameseh96/kv_engine
that referenced
this pull request
Aug 15, 2018
Identified in Thread Sanitizer: WARNING: ThreadSanitizer: data race (pid=28916) Read of size 8 at 0x0000019842f8 by thread T5 (mutexes: write M2756): #0 mc_time_convert_to_real_time /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/mc_time.cc:117 (memcached+0x0000004ab4e2) couchbase#1 default_item_allocate_ex /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/engines/default_engine/default_engine.cc:364 (default_engine.so+0x00000001d49b) couchbase#2 bucket_allocate_ex(Cookie&, DocKey const&, unsigned long, unsigned long, int, unsigned int, unsigned char, unsigned short) /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/protocol/mcbp/engine_wrapper.cc:304 (memcached+0x000000458986) couchbase#3 MutationCommandContext::allocateNewItem() /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/protocol/mcbp/mutation_context.cc:216 (memcached+0x0000004e02cd) couchbase#4 MutationCommandContext::step() /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/protocol/mcbp/mutation_context.cc:57 (memcached+0x0000004e1057) couchbase#5 SteppableCommandContext::drive() /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/protocol/mcbp/steppable_command_context.cc:33 (memcached+0x0000004f1d04) couchbase#6 add_set_replace_executor /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/mcbp_executors.cc:169 (memcached+0x0000004b60b9) #7 add_executor /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/mcbp_executors.cc:173 (memcached+0x0000004b60b9) #8 std::_Function_handler<void (Cookie&), void (*)(Cookie&)>::_M_invoke(std::_Any_data const&, Cookie&) /usr/local/include/c++/7.2.0/bits/std_function.h:316 (memcached+0x0000004b9cd2) #9 std::function<void (Cookie&)>::operator()(Cookie&) const /usr/local/include/c++/7.2.0/bits/std_function.h:706 (memcached+0x0000004b7c3c) #10 execute_request_packet /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/mcbp_executors.cc:743 (memcached+0x0000004b7c3c) #11 mcbp_execute_packet(Cookie&) /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/mcbp_executors.cc:824 (memcached+0x0000004b7c3c) #12 conn_execute /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/statemachine_mcbp.cc:350 (memcached+0x0000004f84c7) #13 McbpStateMachine::execute() /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/statemachine_mcbp.cc:151 (memcached+0x0000004f8c26) #14 McbpConnection::runStateMachinery() /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/connection_mcbp.cc:792 (memcached+0x000000496b4d) #15 McbpConnection::runEventLoop(short) /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/connection_mcbp.cc:973 (memcached+0x000000496bef) #16 run_event_loop(Connection*, short) /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/connections.cc:159 (memcached+0x00000049b071) #17 event_handler(int, short, void*) /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/memcached.cc:1018 (memcached+0x000000438b02) #18 event_process_active_single_queue.isra.26 <null> (libevent_core.so.2.1.8+0x00000001aa23) #19 CouchbaseThread::run() /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/platform/src/cb_pthreads.cc:59 (libplatform_so.so.0.1.0+0x00000000974f) #20 platform_thread_wrap /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/platform/src/cb_pthreads.cc:72 (libplatform_so.so.0.1.0+0x00000000974f) #21 <null> <null> (libtsan.so.0+0x000000024feb) Previous write of size 8 at 0x0000019842f8 by main thread: #0 mc_time_clock_tick /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/mc_time.cc:255 (memcached+0x0000004ab9a3) couchbase#1 mc_time_clock_event_handler /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/mc_time.cc:210 (memcached+0x0000004abbde) couchbase#2 event_process_active_single_queue.isra.26 <null> (libevent_core.so.2.1.8+0x00000001aa8a) couchbase#3 main /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/main.cc:33 (memcached+0x00000041080e) Location is global 'memcached_epoch' of size 8 at 0x0000019842f8 (memcached+0x0000019842f8) Mutex M2756 (0x7b74000005c0) created at: #0 pthread_mutex_lock <null> (libtsan.so.0+0x00000003876f) couchbase#1 __gthread_mutex_lock /usr/local/include/c++/7.2.0/x86_64-pc-linux-gnu/bits/gthr-default.h:748 (memcached+0x00000046d413) couchbase#2 std::mutex::lock() /usr/local/include/c++/7.2.0/bits/std_mutex.h:103 (memcached+0x00000046d413) couchbase#3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) /usr/local/include/c++/7.2.0/bits/std_mutex.h:162 (memcached+0x00000046d413) couchbase#4 thread_libevent_process /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/thread.cc:293 (memcached+0x00000046d413) couchbase#5 event_process_active_single_queue.isra.26 <null> (libevent_core.so.2.1.8+0x00000001aa23) couchbase#6 CouchbaseThread::run() /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/platform/src/cb_pthreads.cc:59 (libplatform_so.so.0.1.0+0x00000000974f) #7 platform_thread_wrap /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/platform/src/cb_pthreads.cc:72 (libplatform_so.so.0.1.0+0x00000000974f) #8 <null> <null> (libtsan.so.0+0x000000024feb) Thread T5 'mc:worker_2' (tid=28923, running) created by main thread at: #0 pthread_create <null> (libtsan.so.0+0x0000000282a0) couchbase#1 cb_create_named_thread /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/platform/src/cb_pthreads.cc:110 (libplatform_so.so.0.1.0+0x000000009447) couchbase#2 create_worker /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/thread.cc:86 (memcached+0x00000046df49) couchbase#3 thread_init(unsigned long, event_base*, void (*)(int, short, void*)) /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/thread.cc:515 (memcached+0x00000046df49) couchbase#4 memcached_main /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/memcached.cc:2548 (memcached+0x00000043de8d) couchbase#5 main /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/main.cc:33 (memcached+0x00000041080e) SUMMARY: ThreadSanitizer: data race /home/couchbase/jenkins/workspace/kv_engine-threadsanitizer-master-gcc7/kv_engine/daemon/mc_time.cc:117 in mc_time_convert_to_real_time Change-Id: I70c3ef30cdf4f9282505f797772c5684c8cfab9d Reviewed-on: http://review.couchbase.org/90802 Tested-by: Build Bot <[email protected]> Reviewed-by: Dave Rigby <[email protected]>
jameseh96
pushed a commit
to jameseh96/kv_engine
that referenced
this pull request
Aug 15, 2018
As identified by UBSan, CouchKVStore::rollback() doesn't check the return value of resdVBState; which can result in it attempting to write a null local document to the rolled-back vBucket: [ RUN ] CouchKVStoreErrorInjectionTest.readVBState_open_local_document runtime error: null pointer passed as argument 2, which is declared to never be null #0 0x7ffff7b5f30a in encode_root couchstore/src/node_types.cc:75 couchbase#1 0x7ffff7b36033 in db_write_header couchstore/src/couch_db.cc:175 couchbase#2 0x7ffff7b3f487 in couchstore_commit couchstore/src/couch_db.cc:255 couchbase#3 0x12c0e6d in CouchKVStore::rollback(unsigned short, unsigned long, std::shared_ptr<RollbackCB>) kv_engine/engines/ep/src/couch-kvstore/couch-kvstore.cc:2674 couchbase#4 0xd15cc6 in CouchKVStoreErrorInjectionTest_readVBState_open_local_document_Test::TestBody() kv_engine/engines/ep/tests/module_tests/kvstore_test.cc:1030 The local document is essential to interpreting the vBucket file; so if we can't read it we need to fail the rollback. Change-Id: I83871de2d4a96197bce17cbc9f1147792795a783 Reviewed-on: http://review.couchbase.org/93761 Tested-by: Build Bot <[email protected]> Reviewed-by: Trond Norbye <[email protected]>
jameseh96
pushed a commit
to jameseh96/kv_engine
that referenced
this pull request
Aug 15, 2018
As identified by UBSan, if a sub-document operation results in a zero-length result (which is valid); the current implementation passes a null pointer to memcpy, which is undefined behaviour: [ RUN ] TransportProtocols/XattrTest.SetXattrAndDeleteBasic/Mcbp_XattrYes_JsonYes_SnappyYes runtime error: null pointer passed as argument 2, which is declared to never be null #0 0xd32951 in operate_single_doc kv_engine/daemon/subdocument.cc:776 couchbase#1 0xd3522d in do_body_phase kv_engine/daemon/subdocument.cc:1136 couchbase#2 0xd3522d in subdoc_operate kv_engine/daemon/subdocument.cc:1183 couchbase#3 0xd3522d in subdoc_executor kv_engine/daemon/subdocument.cc:431 Fix by using std::copy instead. Change-Id: Ia5e4d7f76fd57a81c62b930ded7b85dd31a1ae24 Reviewed-on: http://review.couchbase.org/93766 Tested-by: Build Bot <[email protected]> Reviewed-by: Trond Norbye <[email protected]>
jameseh96
pushed a commit
to jameseh96/kv_engine
that referenced
this pull request
Aug 15, 2018
As identified by UBSan, if we try to create a zero-length SerialisedDocKey (which is valid); the current implementation passes a null pointer to memcpy: [ RUN ] MutationLogTest.Logging runtime error: null pointer passed as argument 2, which is declared to never be null #0 0x11e9309 in SerialisedDocKey::SerialisedDocKey(DocKey) kv_engine/engines/ep/src/storeddockey.h:277 couchbase#1 0x11e9309 in MutationLogEntryV2::MutationLogEntryV2(MutationLogType, unsigned short, DocKey const&) kv_engine/engines/ep/src/mutation_log_entry.h:310 couchbase#2 0x11e9309 in MutationLogEntryV2::MutationLogEntryV2(MutationLogType, unsigned short) kv_engine/engines/ep/src/mutation_log_entry.h:321 couchbase#3 0x11e9309 in MutationLogEntryV2::newEntry(unsigned char*, MutationLogType, unsigned short) kv_engine/engines/ep/src/mutation_log_entry.h:223 couchbase#4 0x11e9309 in MutationLog::commit1() kv_engine/engines/ep/src/mutation_log.cc:322 Fix by using std::copy instead. Change-Id: I0994f1522efeb046c58da375053b6257fdc89a6a Reviewed-on: http://review.couchbase.org/93762 Tested-by: Build Bot <[email protected]> Reviewed-by: Trond Norbye <[email protected]>
jameseh96
pushed a commit
to jameseh96/kv_engine
that referenced
this pull request
Aug 15, 2018
TSan reports the follow use-after-free error: WARNING: ThreadSanitizer: heap-use-after-free (pid=5347) Read of size 1 at 0x7b7400040609 by main thread: #0 memcmp <null> (libtsan.so.0+0x000000043643) couchbase#1 check_key_value() kv_engine/engines/ep/tests/ep_testsuite_common.cc:468 (ep_testsuite.so+0x0000000975ad) couchbase#2 test_mem_stats kv_engine/engines/ep/tests/ep_testsuite.cc:2036 (ep_testsuite.so+0x00000002c5c6) couchbase#3 execute_test kv_engine/programs/engine_testapp/engine_testapp.cc:1102 (engine_testapp+0x00000041b0ba) couchbase#4 main kv_engine/programs/engine_testapp/engine_testapp.cc:1499 (engine_testapp+0x00000041c5b2) Previous write of size 8 at 0x7b7400040608 by thread T11 (mutexes: write M3231945710371016): #0 operator delete() <null> (libtsan.so.0+0x00000006a7b4) couchbase#1 Blob::operator delete() kv_engine/engines/ep/src/blob.h:124 (ep.so+0x0000001959c2) couchbase#2 Blob::Deleter::operator()(TaggedPtr<Blob>) kv_engine/engines/ep/src/blob.h:137 (ep.so+0x0000001959c2) couchbase#3 SingleThreadedRCPtr<Blob, TaggedPtr<Blob>, Blob::Deleter>::swap(TaggedPtr<Blob>) kv_engine/engines/ep/src/atomic.h:362 (ep.so+0x0000001959c2) couchbase#4 SingleThreadedRCPtr<Blob, TaggedPtr<Blob>, Blob::Deleter>::reset(TaggedPtr<Blob>) kv_engine/engines/ep/src/atomic.h:298 (ep.so+0x0000001959c2) couchbase#5 StoredValue::replaceValue(TaggedPtr<Blob>) kv_engine/engines/ep/src/stored-value.h:540 (ep.so+0x0000001959c2) couchbase#6 StoredValue::storeCompressedBuffer(cb::const_char_buffer) kv_engine/engines/ep/src/stored-value.cc:362 (ep.so+0x0000001959c2) #7 HashTable::storeCompressedBuffer(cb::const_char_buffer, StoredValue&) kv_engine/engines/ep/src/hash_table.cc:620 (ep.so+0x00000012be5c) #8 ItemCompressorVisitor::visit(HashTable::HashBucketLock const&, StoredValue&) kv_engine/engines/ep/src/item_compressor_visitor.cc:52 (ep.so+0x000000139780) #9 HashTable::pauseResumeVisit(HashTableVisitor&, HashTable::Position&) kv_engine/engines/ep/src/hash_table.cc:753 (ep.so+0x000000127b6c) #10 PauseResumeVBAdapter::visit(VBucket&) kv_engine/engines/ep/src/vb_visitors.cc:36 (ep.so+0x0000001a81b1) #11 KVBucket::pauseResumeVisit(PauseResumeVBVisitor&, KVBucketIface::Position&) kv_engine/engines/ep/src/kv_bucket.cc:2177 (ep.so+0x000000158ec9) #12 ItemCompressorTask::run() kv_engine/engines/ep/src/item_compressor.cc:70 (ep.so+0x000000138204) #13 ExecutorThread::run() kv_engine/engines/ep/src/executorthread.cc:146 (ep.so+0x00000011da74) #14 launch_executor_thread kv_engine/engines/ep/src/executorthread.cc:34 (ep.so+0x00000011e0ae) #15 CouchbaseThread::run() platform/src/cb_pthreads.cc:59 (libplatform_so.so.0.1.0+0x000000009cc9) #16 platform_thread_wrap platform/src/cb_pthreads.cc:72 (libplatform_so.so.0.1.0+0x000000009cc9) #17 <null> <null> (libtsan.so.0+0x000000024feb) Mutex M3231945710371016 is already destroyed. The issue is in In check_key_value(); which uses get_item_info() to retrieve the address and size of the value (Blob) of the given key before checking it. get_item_info() doesn't retain a reference on the underlying engine's Item and consequently Blob. As such it's not safe to access the underlying Blob. Fix by explicitly fetching the Item via get(), and retaining the return value for the duration of accessing the Blob. This ensures the ref-count is kept and so the Blob cannot be deleted while check_key_value() is accessing it. Change-Id: If6cefef4d988ca26c33e798ecc383e94478c474d Reviewed-on: http://review.couchbase.org/95743 Well-Formed: Build Bot <[email protected]> Reviewed-by: Tim Bradgate <[email protected]> Reviewed-by: Sriram Ganesan <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Oct 9, 2018
As part of the Vbid refactoring (MB-30552), VBucket::getId()'s return type was changed from uint16_t to the strong type Vbid. However, EventuallyPersistentEngine::addSeqnoVbStats() was not updated to extract the raw uint16_t from the Vbid when printing. When building in Debug mode, this results in a corrupt vbid - for example see the following test failure: [ RUN ] StatTest.vbucket_seqno_stats_test ../kv_engine/engines/ep/tests/module_tests/stats_test.cc:96: Failure Value of: vals Expected: has 7 elements and there exists some permutation of elements such that: - element #0 has a key that is equal to "vb_0:uuid", and - element #1 has a first field that is equal to "vb_0:high_seqno", and has a second field that is equal to "0", and - element #2 has a first field that is equal to "vb_0:abs_high_seqno", and has a second field that is equal to "0", and - element #3 has a first field that is equal to "vb_0:last_persisted_seqno", and has a second field that is equal to "0", and - element #4 has a first field that is equal to "vb_0:purge_seqno", and has a second field that is equal to "0", and - element #5 has a first field that is equal to "vb_0:last_persisted_snap_start", and has a second field that is equal to "0", and - element #6 has a first field that is equal to "vb_0:last_persisted_snap_end", and has a second field that is equal to "0" Actual: { ("vb_0:high_seqno", "0"), ("vb_0:last_persisted_snap_start", "0"), ("vb_258146304:abs_high_seqno", "0"), ("vb_258146304:last_persisted_seqno", "0"), ("vb_258146304:last_persisted_snap_end", "0"), ("vb_258146304:purge_seqno", "0"), ("vb_258146304:uuid", "87176786588641") } [ FAILED ] StatTest.vbucket_seqno_stats_test (4 ms) Note this doesn't manifest under a release build (hence CV passing) - mostly likely because the address of the raw uint16_t is the same as the Vbid object itself; and the optimizer effectively hides the bug. Fix by adding the missing .get() call. Change-Id: I2d90ddc2855874035d7d8877a678f89dfb0a0c9d Reviewed-on: http://review.couchbase.org/100403 Reviewed-by: Chris Farman <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jan 14, 2019
…stroy As reported by ASan (see below). Fix by changing the order of construction/destruction of Configuration. ================================================================= ==25686==ERROR: AddressSanitizer: heap-use-after-free on address 0x6070004182f8 at pc 0x7fc48fe80bfe bp 0x7fc4865d3050 sp 0x7fc4865d3048 READ of size 8 at 0x6070004182f8 thread T34 (mc:bucket_del) #0 0x7fc48fe80bfd in std::_Rb_tree<...>::find() const /usr/local/include/c++/7.3.0/bits/stl_tree.h:2533 #1 0x7fc48fe81ce6 in std::map<...>::find() const /usr/local/include/c++/7.3.0/bits/stl_map.h:1189 #2 0x7fc48fe81ce6 in unsigned long Configuration::getParameter<unsigned long>(...) const kv_engine/engines/ep/src/configuration.cc:171 #3 0x7fc48fead5f9 in Configuration::getDcpConnBufferSize() const build/kv_engine/engines/ep/src/generated_configuration.cc:591 #4 0x7fc48fa9d249 in DcpFlowControlManager::setBufSizeWithinBounds(DcpConsumer*, unsigned long&) kv_engine/engines/ep/src/dcp/flow-control-manager.cc:49 #5 0x7fc48fa9ef2d in DcpFlowControlManagerAggressive::handleDisconnect(DcpConsumer*) kv_engine/engines/ep/src/dcp/flow-control-manager.cc:202 #6 0x7fc48fa59cf2 in DcpConsumer::~DcpConsumer() kv_engine/engines/ep/src/dcp/consumer.cc:187 <cut shared_ptr details...> #19 0x7fc48f9c1cb7 in AtomicQueue<std::shared_ptr<ConnHandler> >::~AtomicQueue() kv_engine/engines/ep/src/atomicqueue.h:39 #20 0x7fc48f9c1cb7 in ConnMap::~ConnMap() kv_engine/engines/ep/src/connmap.cc:106 #21 0x7fc48fa8c4b7 in DcpConnMap::~DcpConnMap() kv_engine/engines/ep/src/dcp/dcpconnmap.cc:63 #22 0x7fc48fa8c4b7 in DcpConnMap::~DcpConnMap() kv_engine/engines/ep/src/dcp/dcpconnmap.cc:65 #23 0x7fc48fb9d41c in std::default_delete<DcpConnMap>::operator()(DcpConnMap*) const /usr/local/include/c++/7.3.0/bits/unique_ptr.h:78 #24 0x7fc48fb9d41c in std::unique_ptr<DcpConnMap, std::default_delete<DcpConnMap> >::~unique_ptr() /usr/local/include/c++/7.3.0/bits/unique_ptr.h:268 #25 0x7fc48fb9d41c in EventuallyPersistentEngine::~EventuallyPersistentEngine() kv_engine/engines/ep/src/ep_engine.cc:5772 #26 0x7fc48fb9da8e in EventuallyPersistentEngine::~EventuallyPersistentEngine() kv_engine/engines/ep/src/ep_engine.cc:5780 #27 0x7fc48fb9da8e in EventuallyPersistentEngine::destroy(bool) kv_engine/engines/ep/src/ep_engine.cc:154 #28 0x44299e in DestroyBucketThread::destroy() kv_engine/daemon/memcached.cc:1980 0x6070004182f8 is located 40 bytes inside of 80-byte region [0x6070004182d0,0x607000418320) freed by thread T34 (mc:bucket_del) here: #0 0x7fc49a5c16b0 in operator delete(void*) (install/bin/../lib/libasan.so.4+0xdb6b0) <cut shared_ptr details...> #8 0x7fc48fe77232 in Configuration::~Configuration() kv_engine/engines/ep/src/configuration.h:178 #9 0x7fc48fb9d38b in EventuallyPersistentEngine::~EventuallyPersistentEngine() kv_engine/engines/ep/src/ep_engine.cc:5772 #10 0x7fc48fb9da8e in EventuallyPersistentEngine::~EventuallyPersistentEngine() kv_engine/engines/ep/src/ep_engine.cc:5780 #11 0x7fc48fb9da8e in EventuallyPersistentEngine::destroy(bool) kv_engine/engines/ep/src/ep_engine.cc:154 #12 0x44299e in DestroyBucketThread::destroy() kv_engine/daemon/memcached.cc:1980 #13 0x4439ee in DestroyBucketThread::run() kv_engine/daemon/memcached.cc:2012 #14 0x7fc498801ff0 in Couchbase::Thread::thread_entry() platform/src/thread.cc:45 #15 0x7fc4987d57d8 in CouchbaseThread::run() platform/src/cb_pthreads.cc:59 #16 0x7fc4987d57d8 in platform_thread_wrap platform/src/cb_pthreads.cc:72 #17 0x7fc496bd06b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9) Change-Id: I5c4efec7a58ded4c29cca6cee73886b0a0e51172 Reviewed-on: http://review.couchbase.org/103584 Tested-by: Build Bot <[email protected]> Reviewed-by: Trond Norbye <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Feb 1, 2019
On macOS the following exception is thrown at the end of ep_engine_benchmarks: libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument The issue is an instance of the 'static (de)initialization order fiacso' - globalBucketLogger (a shared_ptr) is not destroyed before the process exits - it is only cleaned up by the C++ runtime. However, another singleton object - spdlog::registry which the BucketLogget dtor uses to unregister itself - has already been destroyed. As such an exception is thrown attempting to lock a mutex which has already been destroyed: (lldb) bt * frame #0: 0x00007fff6e84b1f6 libc++abi.dylib` __cxa_throw = frame #1: 0x00007fff6e81e7af libc++.1.dylib` std::__1::__throw_system_error(int, char const*) + 77 frame #2: 0x00007fff6e810c93 libc++.1.dylib` std::__1::mutex::lock() + 29 frame #3: 0x000000010068fd75 libmemcached_logger.1.0.0.dylib` spdlog::details::registry::drop(...) [inlined] std::__1::lock_guard<std::__1::mutex>::lock_guard(__m=0x00000001006a3c20) + 21 at __mutex_base:104 frame #4: 0x000000010068fd70 libmemcached_logger.1.0.0.dylib` spdlog::details::registry::drop(..&) [inlined] std::__1::lock_guard<std::__1::mutex>::lock_guard(__m=0x00000001006a3c20) at __mutex_base:104 frame #5: 0x000000010068fd70 libmemcached_logger.1.0.0.dylib` spdlog::details::registry::drop(this=0x00000001006a3c20, logger_name="globalBucketLogger") + 16 at registry.h:173 frame #6: 0x000000010004608f ep_engine_benchmarks` BucketLogger::~BucketLogger() [inlined] BucketLogger::unregister(this=<unavailable>) + 47 at bucket_logger.cc:81 frame #7: 0x000000010004607b ep_engine_benchmarks` BucketLogger::~BucketLogger(this=0x0000000105047080) + 27 at bucket_logger.cc:35 frame #8: 0x00000001000461ee ep_engine_benchmarks` BucketLogger::~BucketLogger() [inlined] BucketLogger::~BucketLogger(this=0x0000000105047080) + 14 at bucket_logger.cc:34 frame #9: 0x00000001000461e9 ep_engine_benchmarks` BucketLogger::~BucketLogger(this=0x0000000105047080) + 9 at bucket_logger.cc:34 frame #10: 0x0000000100046e21 ep_engine_benchmarks` std::__1::shared_ptr<BucketLogger>::~shared_ptr() [inlined] std::__1::__shared_count::__release_shared(this=0x00000001052f5540) + 49 at memory:3490 frame #11: 0x0000000100046e18 ep_engine_benchmarks` std::__1::shared_ptr<BucketLogger>::~shared_ptr() [inlined] std::__1::__shared_weak_count::__release_shared(this=0x00000001052f5540) at memory:3532 frame #12: 0x0000000100046e18 ep_engine_benchmarks` std::__1::shared_ptr<BucketLogger>::~shared_ptr() [inlined] std::__1::shared_ptr<BucketLogger>::~shared_ptr(this=<unavailable>) at memory:4468 frame #13: 0x0000000100046e18 ep_engine_benchmarks` std::__1::shared_ptr<BucketLogger>::~shared_ptr(this=<unavailable>) + 40 at memory:4466 frame #14: 0x00007fff70935eed libsystem_c.dylib` __cxa_finalize_ranges + 351 frame #15: 0x00007fff709361fe libsystem_c.dylib` exit + 55 frame #16: 0x00007fff7088901c libdyld.dylib` start + 8 Fix by manually resetting globalBucketLogger() before the end of main(). Change-Id: I8abeb71c77580e0a4ef1342612d3c7af084de122 Reviewed-on: http://review.couchbase.org/104331 Tested-by: Build Bot <[email protected]> Reviewed-by: Ben Huddleston <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Apr 25, 2019
Currently fails ~50% of the time under UBSan: null pointer passed as argument 2, which is declared to never be null #0 0x7fc16960c806 in magma::Buffer::CopyInto(char const*, unsigned long) magma/util/buffer.cc:194 #1 0x7fc1692b5727 in magma::WAL::AddRecord(magma::WAL::walRecType, std::vector<magma::Slice*, std::allocator<magma::Slice*> > const&, magma::WALOffset&) magma/wal/wal.cc:281 #2 0x7fc1692b89a2 in magma::WAL::endTxn(magma::WAL::walRecType, magma::Slice&) magma/wal/wal.cc:206 #3 0x7fc1692b95ab in magma::WAL::AbortTxn() magma/wal/wal.cc:233 #4 0x7fc16958941e in magma::Magma::Impl::recovery() magma/magma/recovery.cc:356 #5 0x7fc1695cf467 in magma::Magma::Open() magma/magma/magma.cc:143 #6 0x2e7a2f3 in MagmaKVStore::MagmaKVStore(MagmaKVStoreConfig&) kv_engine/engines/ep/src/magma-kvstore/magma-kvstore.cc:445 #7 0x29c9679 in std::_MakeUniq<MagmaKVStore>::__single_object std::make_unique<MagmaKVStore, MagmaKVStoreConfig&>(MagmaKVStoreConfig&) /usr/local/include/c++/7.3.0/bits/unique_ptr.h:825 #8 0x29c9679 in KVStoreFactory::create(KVStoreConfig&) kv_engine/engines/ep/src/kvstore.cc:97 #9 0x1cb0232 in setup_kv_store kv_engine/engines/ep/tests/module_tests/kvstore_test.cc:193 #10 0x1e572f7 in MagmaKVStoreTest::SetUp() kv_engine/engines/ep/tests/module_tests/kvstore_test.cc:2605 Change-Id: I02cd0a0d71e728296ab9a6027b06d17809fc3248 Reviewed-on: http://review.couchbase.org/108269 Reviewed-by: Paolo Cocchi <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jun 3, 2019
Before attempting to remove any queued ACKs, check the vbucket is valid - it may have already been deleted. As detected by ASan: ==30588==ERROR: AddressSanitizer: SEGV on unknown address 0x0000000002f8 (pc 0x7fc1c437a4b6 bp 0x7fc1bfb35730 sp 0x7fc1bfb35500 T7) ==30588==The signal is caused by a READ memory access. ==30588==Hint: address points to the zero page. #0 0x7fc1c437a4b5 in std::__atomic_base<unsigned int>::load(std::memory_order) const /usr/local/include/c++/7.3.0/bits/atomic_base.h:396 #1 0x7fc1c437a4b5 in bool folly::SharedMutexImpl<...>::lockSharedImpl<...::WaitForever&) build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:1239 #2 0x7fc1c437a4b5 in folly::SharedMutexImpl<...>::lock_shared() build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:375 #3 0x7fc1c437a4b5 in folly::SharedMutexImpl<...>::ReadHolder() build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:1315 #4 0x7fc1c437a4b5 in ActiveStream::setDead(end_stream_status_t) kv_engine/engines/ep/src/dcp/active_stream.cc:1135 #5 0x7fc1c44a3f3a in operator() kv_engine/engines/ep/src/dcp/producer.cc:1489 #6 0x7fc1c44a3f3a in for_each<...> /local/include/c++/7.3.0/bits/stl_algo.h:3884 #7 0x7fc1c44a3f3a in DcpProducer::setDisconnect() kv_engine/engines/ep/src/dcp/producer.cc:1491 #8 0x7fc1c4428f02 in DcpConnMap::disconnect(void const*) kv_engine/engines/ep/src/dcp/dcpconnmap.cc:316 #9 0x7fc1c45897a0 in EventuallyPersistentEngine::handleDisconnect(void const*) kv_engine/engines/ep/src/ep_engine.cc:5792 #10 0x7fc1c45897a0 in EvpHandleDisconnect kv_engine/engines/ep/src/ep_engine.cc:1682 #11 0x447119 in perform_callbacks(ENGINE_EVENT_TYPE, void const*, void const*) kv_engine/daemon/memcached.cc:301 #12 0x62cb94 in Connection::propagateDisconnect() const kv_engine/daemon/connection.cc:1506 #13 0x62cb94 in Connection::close() kv_engine/daemon/connection.cc:1487 #14 0x784fec in StateMachine::conn_pending_close() kv_engine/daemon/statemachine.cc:577 #15 0x784fec in StateMachine::execute() kv_engine/daemon/statemachine.cc:129 #16 0x63ccff in Connection::runStateMachinery() kv_engine/daemon/connection.cc:1312 #17 0x63cfcb in Connection::runEventLoop(short) kv_engine/daemon/connection.cc:1386 #18 0x681ab8 in run_event_loop(Connection*, short) kv_engine/daemon/connections.cc:147 #19 0x45df77 in event_handler(int, short, void*) kv_engine/daemon/memcached.cc:855 #20 0x7fc1ccc1f086 in event_persist_closure deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1580 #21 0x7fc1ccc1f086 in event_process_active_single_queue deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1639 #22 0x7fc1ccc1f5fe in event_process_active deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1738 #23 0x7fc1ccc1f5fe in event_base_loop deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1961 #24 0x5aa3df in worker_libevent kv_engine/daemon/thread.cc:218 #25 0x7fc1cd4b8868 in CouchbaseThread::run() platform/src/cb_pthreads.cc:58 #26 0x7fc1cd4b8868 in platform_thread_wrap platform/src/cb_pthreads.cc:71 #27 0x7fc1cb35b6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9) #28 0x7fc1cb09141c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c) Change-Id: Ic806fdfa6aca458a0dc1b82f046bca76dcb75d40 Reviewed-on: http://review.couchbase.org/110054 Reviewed-by: Daniel Owen <[email protected]> Tested-by: Dave Rigby <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jun 6, 2019
This causes deadlock due to recursive locking of tMutex in ExecutorPool if the task is the last thing that owns a vBucket and is attempting to schedule deferred deletion. Fix this by holding a weak pointer instead. If we promote the pointer then we are running normally and won't have previously acquired tMutex. If we are cancelling the task at destruction of the engine, we will not attempt to delete the vBucket. 15:05:37 #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 15:05:38 #1 0x00007f551159fdbd in __GI___pthread_mutex_lock (mutex=0x7f5510435b08) at ../nptl/pthread_mutex_lock.c:80 15:05:38 #2 0x00007f550af2ef9d in __gthread_mutex_lock (__mutex=0x7f5510435b08) at /usr/local/include/c++/7.3.0/x86_64-pc-linux-gnu/bits/gthr-default.h:748 15:05:38 #3 std::mutex::lock (this=0x7f5510435b08) at /usr/local/include/c++/7.3.0/bits/std_mutex.h:103 15:05:38 #4 std::lock_guard<std::mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /usr/local/include/c++/7.3.0/bits/std_mutex.h:162 15:05:38 #5 ExecutorPool::_schedule (this=this@entry=0x7f5510435a00, task=...) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/executorpool.cc:420 15:05:38 #6 0x00007f550af2f13d in ExecutorPool::schedule (this=0x7f5510435a00, task=...) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/executorpool.cc:440 15:05:40 #7 0x00007f550af2ad1d in EphemeralVBucket::scheduleDeferredDeletion (this=<optimized out>, engine=...) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/ephemeral_vb.cc:841 15:05:40 #8 0x00007f550af64dc1 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7f550a3a9060) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:154 15:05:40 #9 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:684 15:05:40 #10 std::__shared_ptr<VBucket, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:1123 15:05:40 #11 std::shared_ptr<VBucket>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/local/include/c++/7.3.0/bits/shared_ptr.h:93 15:05:40 #12 RespondAmbiguousNotification::~RespondAmbiguousNotification (this=<optimized out>, __in_chrg=<optimized out>) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/kv_bucket.cc:226 15:05:40 #13 0x00007f550af30849 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7f550a41c340) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:154 15:05:40 #14 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:684 15:05:40 #15 std::__shared_ptr<GlobalTask, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7ffe6ec46730, __in_chrg=<optimized out>) at /usr/local/include/c++/7.3.0/bits/shared_ptr_base.h:1123 15:05:40 #16 std::shared_ptr<GlobalTask>::~shared_ptr (this=0x7ffe6ec46730, __in_chrg=<optimized out>) at /usr/local/include/c++/7.3.0/bits/shared_ptr.h:93 15:05:40 #17 ExecutorPool::_stopTaskGroup (this=<optimized out>, taskGID=140003322306560, taskType=<optimized out>, force=<optimized out>) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/executorpool.cc:612 15:05:40 #18 0x00007f550af56cb8 in KVBucket::deinitialize (this=0x7f5510495000) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/kv_bucket.cc:466 15:05:40 #19 0x00007f550af09ee1 in EventuallyPersistentEngine::~EventuallyPersistentEngine (this=0x7f55104b2000, __in_chrg=<optimized out>) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/ep_engine.cc:6073 15:05:40 #20 0x00007f550af0a0e7 in EventuallyPersistentEngine::~EventuallyPersistentEngine (this=0x7f55104b2000, __in_chrg=<optimized out>) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/ep_engine.cc:6079 15:05:40 #21 EventuallyPersistentEngine::destroy (this=0x7f55104b2000, force=<optimized out>) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/engines/ep/src/ep_engine.cc:155 15:05:40 #22 0x000000000040ff0b in MockTestHarness::destroy_bucket (force=false, handle=0x7f55104f2aa0, this=0x64c740 <harness>) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/programs/engine_testapp/engine_testapp.cc:1178 15:05:40 #23 execute_test (default_cfg=<optimized out>, engine=<optimized out>, test=...) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/programs/engine_testapp/engine_testapp.cc:1333 15:05:40 #24 main (argc=<optimized out>, argv=<optimized out>) at /home/couchbase/jenkins/workspace/kv_engine-linux-master-CE/kv_engine/programs/engine_testapp/engine_testapp.cc:1581 Change-Id: I70298a8337967c648280b0d86a96c08bf3a4008a Reviewed-on: http://review.couchbase.org/110146 Reviewed-by: Dave Rigby <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jul 15, 2019
TSan found lock inversion as DcpProducer::setDisconnect holds `StreamContainer->rlock()` and then acquires `vb->getStateLock()` whereas `VBucket::set()` acquires them in the opposite order. Release the stream container lock before calling `Stream::setDead()` to avoid holding both in the `setDisconnect` path. TSan report: [ RUN ] DurabilityTest.MB34780 ================== WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=10424) Cycle in lock order graph: M4054 (0x7b68000308f8) => M201671426334441480 (0x000000000000) => M4054 Mutex M201671426334441480 acquired here while holding mutex M4054 in thread T7: #0 pthread_rwlock_rdlock <null> (libtsan.so.0+0x00000002953b) #1 cb_rw_reader_enter(pthread_rwlock_t*) .../platform/src/cb_pthreads.cc:195 (libplatform_so.so.0.1.0+0x000000009cfa) #2 cb::RWLock::readerLock() .../platform/include/platform/rwlock.h:87 (ep.so+0x0000000f4326) #3 cb::RWLock::lock_shared() .../platform/include/platform/rwlock.h:67 (ep.so+0x00000012f2ba) #4 std::shared_lock<cb::RWLock>::shared_lock(cb::RWLock&) /usr/local/include/c++/7.3.0/shared_mutex:553 (ep.so+0x00000012f2ba) #5 StreamContainer<std::shared_ptr<Stream> >::ReadLockedHandle::ReadLockedHandle(...) .../kv_engine/engines/ep/src/dcp/stream_container.h:213 (ep.so+0x00000012f2ba) #6 StreamContainer<std::shared_ptr<Stream> >::rlock() const .../kv_engine/engines/ep/src/dcp/stream_container.h:273 (ep.so+0x000000122ad5) #7 DcpProducer::notifySeqnoAvailable(Vbid, unsigned long) .../kv_engine/engines/ep/src/dcp/producer.cc:1312 (ep.so+0x000000122ad5) #8 DcpConnMap::notifyVBConnections(Vbid, unsigned long) .../kv_engine/engines/ep/src/dcp/dcpconnmap.cc:424 (ep.so+0x0000000fa0af) #9 KVBucket::notifyReplication(Vbid, long) .../kv_engine/engines/ep/src/kv_bucket.cc:2570 (ep.so+0x00000020ed03) #10 EPBucket::notifyNewSeqno(Vbid, VBNotifyCtx const&) .../kv_engine/engines/ep/src/ep_bucket.cc:1327 (ep.so+0x000000160a95) #11 NotifyNewSeqnoCB::callback(Vbid const&, VBNotifyCtx const&) .../kv_engine/engines/ep/src/kv_bucket.h:837 (ep.so+0x000000224dcb) #12 VBucket::notifyNewSeqno(VBNotifyCtx const&) .../kv_engine/engines/ep/src/vbucket.cc:3579 (ep.so+0x000000262f6b) #13 VBucket::set(...) .../kv_engine/engines/ep/src/vbucket.cc:1569 (ep.so+0x00000026af3f) #14 KVBucket::set(...) .../kv_engine/engines/ep/src/kv_bucket.cc:692 (ep.so+0x00000021ee48) #15 EventuallyPersistentEngine::storeIfInner(...) .../kv_engine/engines/ep/src/ep_engine.cc:2440 (ep.so+0x00000018071f) Mutex M4054 previously acquired by the same thread here: #0 AnnotateRWLockAcquired <null> (libtsan.so.0+0x00000005b63d) #1 folly::detail::annotate_rwlock_acquired_impl(...) .../follytsan/folly/synchronization/SanitizeThread.cpp:91 (memcached+0x0000006463de) #2 annotate_rwlock_acquired .../build/tlm/deps/folly.exploded/include/folly/synchronization/SanitizeThread.h:99 (ep.so+0x00000021e932) #3 folly::SharedMutexImpl<false, void, std::atomic, false, true>::annotateAcquired(folly::annotate_rwlock_level) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:696 (ep.so+0x00000021e932) #4 folly::SharedMutexImpl<false, void, std::atomic, false, true>::lock_shared(folly::SharedMutexToken&) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:376 (ep.so+0x00000021e932) #5 folly::SharedMutexImpl<false, void, std::atomic, false, true>::ReadHolder::ReadHolder(folly::SharedMutexImpl<false, void, std::atomic, false, true> const&) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:1315 (ep.so+0x00000021e932) #6 KVBucket::set(...) .../kv_engine/engines/ep/src/kv_bucket.cc:659 (ep.so+0x00000021e932) #7 EventuallyPersistentEngine::storeIfInner(...)> const&) .../kv_engine/engines/ep/src/ep_engine.cc:2440 (ep.so+0x00000018071f) Mutex M4054 acquired here while holding mutex M201671426334441480 in thread T5: #0 AnnotateRWLockAcquired <null> (libtsan.so.0+0x00000005b63d) #1 folly::detail::annotate_rwlock_acquired_impl(void const volatile*, folly::annotate_rwlock_level, char const*, int) .../follytsan/folly/synchronization/SanitizeThread.cpp:91 (memcached+0x0000006463de) #2 annotate_rwlock_acquired .../build/tlm/deps/folly.exploded/include/folly/synchronization/SanitizeThread.h:99 (ep.so+0x0000000bb5b6) #3 folly::SharedMutexImpl<false, void, std::atomic, false, true>::annotateAcquired(folly::annotate_rwlock_level) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:696 (ep.so+0x0000000bb5b6) #4 folly::SharedMutexImpl<false, void, std::atomic, false, true>::lock_shared(folly::SharedMutexToken&) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:376 (ep.so+0x0000000bb5b6) #5 folly::SharedMutexImpl<false, void, std::atomic, false, true>::ReadHolder::ReadHolder(...) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:1315 (ep.so+0x0000000bb5b6) #6 ActiveStream::setDead(end_stream_status_t) .../kv_engine/engines/ep/src/dcp/active_stream.cc:1181 (ep.so+0x0000000bb5b6) #7 operator() .../kv_engine/engines/ep/src/dcp/producer.cc:1491 (ep.so+0x0000001207df) #8 for_each<..., DcpProducer::setDisconnect()::<lambda(folly::AtomicHashMap<...>::value_type&)> > /usr/local/include/c++/7.3.0/bits/stl_algo.h:3884 (ep.so+0x0000001207df) #9 DcpProducer::setDisconnect() .../kv_engine/engines/ep/src/dcp/producer.cc:1493 (ep.so+0x000000120b08) Mutex M201671426334441480 previously acquired by the same thread here: #0 pthread_rwlock_rdlock <null> (libtsan.so.0+0x00000002953b) #1 cb_rw_reader_enter(pthread_rwlock_t*) .../platform/src/cb_pthreads.cc:195 (libplatform_so.so.0.1.0+0x000000009cfa) #2 cb::RWLock::readerLock() .../platform/include/platform/rwlock.h:87 (ep.so+0x0000001205e9) #3 cb::RWLock::lock_shared() .../platform/include/platform/rwlock.h:67 (ep.so+0x0000001205e9) #4 std::shared_lock<cb::RWLock>::shared_lock(cb::RWLock&) /usr/local/include/c++/7.3.0/shared_mutex:553 (ep.so+0x0000001205e9) #5 StreamContainer<std::shared_ptr<Stream> >::ReadLockedHandle::ReadLockedHandle(StreamContainer<std::shared_ptr<Stream> > const&) .../kv_engine/engines/ep/src/dcp/stream_container.h:213 (ep.so+0x0000001205e9) #6 StreamContainer<std::shared_ptr<Stream> >::rlock() const .../kv_engine/engines/ep/src/dcp/stream_container.h:273 (ep.so+0x0000001205e9) #7 operator() .../kv_engine/engines/ep/src/dcp/producer.cc:1490 (ep.so+0x0000001205e9) #8 for_each<..., DcpProducer::setDisconnect()::<lambda(folly::AtomicHashMap<...>::value_type&)> > /usr/local/include/c++/7.3.0/bits/stl_algo.h:3884 (ep.so+0x0000001207df) #9 DcpProducer::setDisconnect() .../kv_engine/engines/ep/src/dcp/producer.cc:1493 (ep.so+0x000000120b08) Change-Id: Ibb2ae11498c7226514bc18788778878bd6c87363 Reviewed-on: http://review.couchbase.org/111905 Reviewed-by: Dave Rigby <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jul 17, 2019
TSan found lock inversion as DcpProducer::closeAllStreams() holds `StreamContainer->wlock()` and then acquires `vb->getStateLock()` whereas `VBucket::set()` acquires them in the opposite order. Release the stream container lock before calling `Stream::setDead()` to avoid holding both in the `closeAllStreams()` path. Also, preemptively apply the same change to `setStreamDeadStatus` though TSan has not identified inversion in this case. TSan report: [ RUN ] DurabilityTest.MB34780 ================== WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=16422) Cycle in lock order graph: M3987 (0x7b68000308f8) => M225878274331574312 (0x000000000000) => M3987 Mutex M225878274331574312 acquired here while holding mutex M3987 in thread T7: #0 pthread_rwlock_rdlock <null> (libtsan.so.0+0x00000002953b) #1 cb_rw_reader_enter(pthread_rwlock_t*) .../platform/src/cb_pthreads.cc:195 (libplatform_so.so.0.1.0+0x000000009cfa) #2 cb::RWLock::readerLock() .../platform/include/platform/rwlock.h:87 (ep.so+0x0000000f423a) #3 cb::RWLock::lock_shared() .../platform/include/platform/rwlock.h:67 (ep.so+0x00000012f578) #4 std::shared_lock<cb::RWLock>::shared_lock(cb::RWLock&) /usr/local/include/c++/7.3.0/shared_mutex:553 (ep.so+0x00000012f578) #5 StreamContainer<std::shared_ptr<Stream> >::ReadLockedHandle::ReadLockedHandle(StreamContainer<std::shared_ptr<Stream> > const&) .../kv_engine/engines/ep/src/dcp/stream_container.h:213 (ep.so+0x00000012f578) #6 StreamContainer<std::shared_ptr<Stream> >::rlock() const .../kv_engine/engines/ep/src/dcp/stream_container.h:273 (ep.so+0x000000122ea7) #7 DcpProducer::notifySeqnoAvailable(Vbid, unsigned long) .../kv_engine/engines/ep/src/dcp/producer.cc:1312 (ep.so+0x000000122ea7) #8 DcpConnMap::notifyVBConnections(Vbid, unsigned long) .../kv_engine/engines/ep/src/dcp/dcpconnmap.cc:424 (ep.so+0x0000000fa071) #9 KVBucket::notifyReplication(Vbid, long) .../kv_engine/engines/ep/src/kv_bucket.cc:2570 (ep.so+0x000000210711) #10 EPBucket::notifyNewSeqno(Vbid, VBNotifyCtx const&) .../kv_engine/engines/ep/src/ep_bucket.cc:1327 (ep.so+0x00000016232b) #11 NotifyNewSeqnoCB::callback(Vbid const&, VBNotifyCtx const&) .../kv_engine/engines/ep/src/kv_bucket.h:837 (ep.so+0x0000002267d9) #12 VBucket::notifyNewSeqno(VBNotifyCtx const&) .../kv_engine/engines/ep/src/vbucket.cc:3631 (ep.so+0x000000264871) #13 VBucket::set() .../kv_engine/engines/ep/src/vbucket.cc:1568 (ep.so+0x00000026c768) #14 KVBucket::set() .../kv_engine/engines/ep/src/kv_bucket.cc:692 (ep.so+0x000000220856) #15 EventuallyPersistentEngine::storeIfInner() .../kv_engine/engines/ep/src/ep_engine.cc:2440 (ep.so+0x000000181fef) Mutex M3987 previously acquired by the same thread here: #0 AnnotateRWLockAcquired <null> (libtsan.so.0+0x00000005b63d) #1 folly::detail::annotate_rwlock_acquired_impl(void const volatile*, folly::annotate_rwlock_level, char const*, int) .../folly/follytsan-prefix/src/follytsan/folly/synchronization/SanitizeThread.cpp:91 (memcached+0x0000006463de) #2 annotate_rwlock_acquired .../build/tlm/deps/folly.exploded/include/folly/synchronization/SanitizeThread.h:99 (ep.so+0x000000220340) #3 folly::SharedMutexImpl<false, void, std::atomic, false, true>::annotateAcquired(folly::annotate_rwlock_level) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:696 (ep.so+0x000000220340) #4 folly::SharedMutexImpl<false, void, std::atomic, false, true>::lock_shared(folly::SharedMutexToken&) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:376 (ep.so+0x000000220340) #5 folly::SharedMutexImpl<false, void, std::atomic, false, true>::ReadHolder::ReadHolder(folly::SharedMutexImpl<false, void, std::atomic, false, true> const&) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:1315 (ep.so+0x000000220340) #6 KVBucket::set() .../kv_engine/engines/ep/src/kv_bucket.cc:659 (ep.so+0x000000220340) #7 EventuallyPersistentEngine::storeIfInner() .../kv_engine/engines/ep/src/ep_engine.cc:2440 (ep.so+0x000000181fef) Mutex M3987 acquired here while holding mutex M225878274331574312 in thread T5: #0 AnnotateRWLockAcquired <null> (libtsan.so.0+0x00000005b63d) #1 folly::detail::annotate_rwlock_acquired_impl(void const volatile*, folly::annotate_rwlock_level, char const*, int) .../folly/follytsan-prefix/src/follytsan/folly/synchronization/SanitizeThread.cpp:91 (memcached+0x0000006463de) #2 annotate_rwlock_acquired .../build/tlm/deps/folly.exploded/include/folly/synchronization/SanitizeThread.h:99 (ep.so+0x0000000bb626) #3 folly::SharedMutexImpl<false, void, std::atomic, false, true>::annotateAcquired(folly::annotate_rwlock_level) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:696 (ep.so+0x0000000bb626) #4 folly::SharedMutexImpl<false, void, std::atomic, false, true>::lock_shared(folly::SharedMutexToken&) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:376 (ep.so+0x0000000bb626) #5 folly::SharedMutexImpl<false, void, std::atomic, false, true>::ReadHolder::ReadHolder(folly::SharedMutexImpl<false, void, std::atomic, false, true> const&) .../build/tlm/deps/folly.exploded/include/folly/SharedMutex.h:1315 (ep.so+0x0000000bb626) #6 ActiveStream::setDead(end_stream_status_t) .../kv_engine/engines/ep/src/dcp/active_stream.cc:1181 (ep.so+0x0000000bb626) #7 operator() .../kv_engine/engines/ep/src/dcp/producer.cc:1383 (ep.so+0x0000001257d1) #8 for_each<...> /usr/local/include/c++/7.3.0/bits/stl_algo.h:3884 (ep.so+0x0000001257d1) #9 DcpProducer::closeAllStreams() .../kv_engine/engines/ep/src/dcp/producer.cc:1377 (ep.so+0x000000125c00) Mutex M225878274331574312 previously acquired by the same thread here: #0 pthread_rwlock_wrlock <null> (libtsan.so.0+0x0000000297eb) #1 cb_rw_writer_enter(pthread_rwlock_t*) .../platform/src/cb_pthreads.cc:217 (libplatform_so.so.0.1.0+0x000000009e80) #2 cb::RWLock::writerLock() .../platform/include/platform/rwlock.h:103 (ep.so+0x000000125597) #3 cb::RWLock::lock() .../platform/include/platform/rwlock.h:77 (ep.so+0x000000125597) #4 std::unique_lock<cb::RWLock>::lock() /usr/local/include/c++/7.3.0/bits/std_mutex.h:267 (ep.so+0x000000125597) #5 std::unique_lock<cb::RWLock>::unique_lock(cb::RWLock&) /usr/local/include/c++/7.3.0/bits/std_mutex.h:197 (ep.so+0x000000125597) #6 StreamContainer<std::shared_ptr<Stream> >::WriteLockedHandle::WriteLockedHandle(StreamContainer<std::shared_ptr<Stream> >&) .../kv_engine/engines/ep/src/dcp/stream_container.h:237 (ep.so+0x000000125597) #7 StreamContainer<std::shared_ptr<Stream> >::wlock() .../kv_engine/engines/ep/src/dcp/stream_container.h:277 (ep.so+0x000000125597) #8 operator() .../kv_engine/engines/ep/src/dcp/producer.cc:1381 (ep.so+0x000000125597) #9 for_each<...> /usr/local/include/c++/7.3.0/bits/stl_algo.h:3884 (ep.so+0x000000125597) #10 DcpProducer::closeAllStreams() .../kv_engine/engines/ep/src/dcp/producer.cc:1377 (ep.so+0x000000125c00) Change-Id: Icc15e74e80d7f1926ce6c75bbdd8aa1c43f5ca2c Reviewed-on: http://review.couchbase.org/111989 Reviewed-by: Dave Rigby <[email protected]> Tested-by: Dave Rigby <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Aug 8, 2019
Fix data race in Timings::get_or_create_timing_histogram() to prevent the data race were in one thread we might not observe correctly the pointer to a given Hdr1sfMicroSecHistogram stored by the timings array. Which could lead to the Hdr1sfMicroSecHistogram being allocated multiple times. WARNING: ThreadSanitizer: data race (pid=1085) Read of size 8 at 0x0000009bad18 by thread T7 (mutexes: write M2377): #0 std::__uniq_ptr_impl<Hdr1sfMicroSecHistogram, std::default_delete<Hdr1sfMicroSecHistogram> >::_M_ptr() const /usr/local/include/c++/7.3.0/bits/unique_ptr.h:147 (memcached+0x0000004bf317) #1 std::unique_ptr<Hdr1sfMicroSecHistogram, std::default_delete<Hdr1sfMicroSecHistogram> >::get() const /usr/local/include/c++/7.3.0/bits/unique_ptr.h:337 (memcached+0x0000004bf317) #2 std::unique_ptr<Hdr1sfMicroSecHistogram, std::default_delete<Hdr1sfMicroSecHistogram> >::operator bool() const /usr/local/include/c++/7.3.0/bits/unique_ptr.h:351 (memcached+0x0000004bf317) #3 bool std::operator==<Hdr1sfMicroSecHistogram, std::default_delete<Hdr1sfMicroSecHistogram> >(std::unique_ptr<Hdr1sfMicroSecHistogram, std::default_delete<Hdr1sfMicroSecHistogram> > const&, decltype(nullptr)) /usr/local/include/c++/7.3.0/bits/unique_ptr.h:690 (memcached+0x0000004bf317) #4 Timings::get_or_create_timing_histogram(unsigned char) /home/couchbase/couchbase/kv_engine/daemon/timings.cc:146 (memcached+0x0000004bf317) #5 Timings::collect(cb::mcbp::ClientOpcode, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) /home/couchbase/couchbase/kv_engine/daemon/timings.cc:42 (memcached+0x0000004bf4aa) ... Previous write of size 8 at 0x0000009bad18 by thread T8 (mutexes: write M2371, write M2392): #0 std::enable_if<std::__and_<std::__not_<std::__is_tuple_like<Hdr1sfMicroSecHistogram*> >, std::is_move_constructible<Hdr1sfMicroSecHistogram*>, std::is_move_assignable<Hdr1sfMicroSecHistogram*> >::value, void>::type std::swap<Hdr1sfMicroSecHistogram*>(Hdr1sfMicroSecHistogram*&, Hdr1sfMicroSecHistogram*&) /usr/local/include/c++/7.3.0/bits/move.h:199 (memcached+0x0000004bf3cf) #1 std::unique_ptr<Hdr1sfMicroSecHistogram, std::default_delete<Hdr1sfMicroSecHistogram> >::reset(Hdr1sfMicroSecHistogram*) /usr/local/include/c++/7.3.0/bits/unique_ptr.h:374 (memcached+0x0000004bf3cf) #2 std::unique_ptr<Hdr1sfMicroSecHistogram, std::default_delete<Hdr1sfMicroSecHistogram> >::operator=(std::unique_ptr<Hdr1sfMicroSecHistogram, std::default_delete<Hdr1sfMicroSecHistogram> >&&) /usr/local/include/c++/7.3.0/bits/unique_ptr.h:283 (memcached+0x0000004bf3cf) #3 Timings::get_or_create_timing_histogram(unsigned char) /home/couchbase/couchbase/kv_engine/daemon/timings.cc:149 (memcached+0x0000004bf3cf) #4 Timings::collect(cb::mcbp::ClientOpcode, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) /home/couchbase/couchbase/kv_engine/daemon/timings.cc:42 (memcached+0x0000004bf4aa) ... Change-Id: I34db63854a1909cbf43a9feb273a13dfa40f313d Reviewed-on: http://review.couchbase.org/113024 Reviewed-by: James Harrison <[email protected]> Tested-by: Build Bot <[email protected]> Reviewed-by: Dave Rigby <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Aug 19, 2019
Change how SyncWrites which are Resolved and awaiting Completion are handled, by moving the final VBucket::commit() / abort() into a background task - DurabilityCompletionTask. +Background+ There are two reasons for making this change: a) Performance - specifically latency of front-end worker threads. By moving completion into a background task, we reduce the amount of work done on the thread which actually detected the SyncWrite was resolved - typically the front-end DCP threads when a DCP_SEQNO_ACK is processed. Given that we SEQNO_ACK at the end of Snapshot, A single SEQNO_ACK could result in committing multiple SyncWrites. Committing one SyncWrite is similar to a normal front-end Set operation, so there is potentially a non-trivial amount of work needed to be done when completing SyncWrites, which could tie up the front-end thread (causing other Connections to have to wait) for a noticable amount of time. b) Simplification of lock management. Doing completion in a background task simplifies lock management, for example we avoid lock inversions with earlier locks acquired during dcpSeqnoAck when attemping to later call notifySeqnoAvailable when this was done on the original thread. +Problem+ While (a) was the first reason identified for making this change (see MB-33092), (b) is the reason this change is being made now. During testing the following lock-order-inversion was seen: WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) Cycle in lock order graph: Stream::streamMutex => StreamContainer::rwlock => Stream::streamMutex The crux of the issue is the processing of DCP_SEQNO_ACKNOWLEDGED messages by the DcpProducer - this acquires the Stream::streamMutex before calling VBucket::seqnoAcknowledged(), however that function currently results in VBucket::commit() being called to synchronously complete the SyncWrite; which in turn must nodify all connected replica that a new seqno is available, requiring StreamContainer::rwlock to be acquired: Mutex StreamContainer::rwlock acquired here while holding mutex Stream::streamMutex in thread T15: ... #6 StreamContainer<std::shared_ptr<Stream> >::rlock() #7 DcpProducer::notifySeqnoAvailable(Vbid, unsigned long) ... #13 VBucket::commit(...) #14 ActiveDurabilityMonitor::commit(...) #15 ActiveDurabilityMonitor::processCompletedSyncWriteQueue() #16 ActiveDurabilityMonitor::seqnoAckReceived(...) #17 VBucket::seqnoAcknowledged(...) #18 ActiveStream::seqnoAck(...) #19 DcpProducer::seqno_acknowledged(...) ... Mutex Stream::streamMutex previously acquired by the same thread here: ... #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) #4 ActiveStream::seqnoAck(...) #5 DcpProducer::seqno_acknowledged(...) ... This conflicts with the ordering seen when sending items out on the DCP connection - inside DcpProducer::step() where the StreamContainer::rwlock is acquired first, then ActiveStream::mutex acquired later: Mutex Stream::streamMutex acquired here while holding mutex StreamContainer::rwlock in thread T15: ... #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) #4 ActiveStream::next() #5 DcpProducer::getNextItem() #6 DcpProducer::step(dcp_message_producers*) ... Mutex StreamContainer::rwlock previously acquired by the same thread here: #0 pthread_rwlock_rdlock <null> (libtsan.so.0+0x00000002c98b) ... #4 std::shared_lock<cb::RWLock>::shared_lock(cb::RWLock&) #5 StreamContainer<>::ResumableIterationHandle::ResumableIterationHandle() #6 StreamContainer<>::startResumable() #7 DcpProducer::getNextItem() #8 DcpProducer::step(dcp_message_producers*) ... +Solution+ The processing of resolved SyncWrites moved into a new background task. Instead of immediately processing them within ActiveDM::seqnoAckReceived(), that function notifies the new NonIO DurabilityCompletionTask that there are SyncWrites waiting for completion. DurabilityCompletionTask maintains a bool per vBucket indicating if there are SyncWrites for that vBucket pending completion. When the task is run, for each flag which is true it calls VBucket::processResolvedSyncWrites() for the associated VBucket. +Implementaiton Notes+ Currently there is just a single DurabilityCompletionTask (per Bucket), this was chosen as 1 task per vBucket (i.e. 1024 per Bucket) would be inefficient for our current background task scheduler (both in terms of latency to schedule each task for only one vBucket's worth of work, and in terms of managing that many tasks in the future queue). However, that does _potentially_ mean there's fewer resources (threads) available to complete SyncWrites on - previously that work could be done concurrently on all frontend threads (~O(num_cpus). Now the same work only has 1 thread available to run on (there's only a single DurabilityCompletionTask). _If_ this becomes a bottleneck we could look at increasing the number of DurabilityCompletionTask - e.g. sharding all vBuckets across multiple tasks like flusher / bgfetcher. Change-Id: I87897af1e3fd0a57e5abc2cb1ba9f795a9d3c63e Reviewed-on: http://review.couchbase.org/113141 Tested-by: Build Bot <[email protected]> Reviewed-by: Ben Huddleston <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Aug 21, 2019
The TSAN output (below) is saying that we have unlocked reads of the Connection::datatype 'bitset' from various threads, when the bitset was written with the frontend thread lock held. It is now common for non front-end threads to query a connection's permitted data-types, e.g. TSAN shows the backfill task deciding what to do about snappy. Changing the datatype to be std::atomic allows for correct write read between threads. ThreadSanitizer: data race/usr/local/include/c++/7.3.0/bitset:433std::_Base_bitset<1ul>::_M_do_or(std::_Base_bitset<1ul> const&) Write of size 8 at 0x7b5c00040f60 by thread T6 (mutexes: write M35842): #0 std::_Base_bitset<1ul>::_M_do_or(std::_Base_bitset<1ul> const&) /usr/local/include/c++/7.3.0/bitset:433 (memcached+0x0000004f1ec2) #1 std::bitset<8ul>::operator|=(std::bitset<8ul> const&) /usr/local/include/c++/7.3.0/bitset:973 (memcached+0x0000004f1ec2) #2 Datatype::enable(cb::mcbp::Feature) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/datatype.cc:57 (memcached+0x0000004f1ec2) #3 Connection::enableDatatype(cb::mcbp::Feature) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/connection.h:653 (memcached+0x000000507ef6) #4 process_bin_dcp_response /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/mcbp_executors.cc:573 (memcached+0x000000507ef6) #5 std::_Function_handler::_M_invoke(std::_Any_data const&, Cookie&) /usr/local/include/c++/7.3.0/bits/std_function.h:316 (memcached+0x000000509c92) #6 std::function::operator()(Cookie&) const /usr/local/include/c++/7.3.0/bits/std_function.h:706 (memcached+0x000000508929) #7 execute_client_response_packet /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/mcbp_executors.cc:897 (memcached+0x000000508929) #8 execute_response_packet(Cookie&, cb::mcbp::Response const&) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/mcbp_executors.cc:946 (memcached+0x000000508929) #9 Cookie::execute() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/cookie.cc:120 (memcached+0x0000004eb86e) #10 StateMachine::conn_execute() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/statemachine.cc:410 (memcached+0x00000053d4fc) #11 StateMachine::execute() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/statemachine.cc:137 (memcached+0x00000053f837) #12 Connection::runStateMachinery() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/connection.cc:1340 (memcached+0x0000004d46d7) #13 Connection::runEventLoop(short) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/connection.cc:1414 (memcached+0x0000004d476e) #14 run_event_loop(Connection*, short) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/connections.cc:148 (memcached+0x0000004e9cbb) #15 event_handler(int, short, void*) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/memcached.cc:848 (memcached+0x00000043c50d) #16 event_persist_closure /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1580 (libevent_core.so.2.1.8+0x000000017086) #17 event_process_active_single_queue /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1639 (libevent_core.so.2.1.8+0x000000017086) #18 CouchbaseThread::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000009c5f) #19 platform_thread_wrap /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000009c5f) #20 (libtsan.so.0+0x000000024feb) Previous read of size 8 at 0x7b5c00040f60 by thread T14 (mutexes: write M1073680921370941928): #0 std::bitset<8ul> std::operator&<8ul>(std::bitset<8ul> const&, std::bitset<8ul> const&) /usr/local/include/c++/7.3.0/bitset:1427 (memcached+0x0000004f1c7e) #1 Datatype::isEnabled(unsigned char) const /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/datatype.cc:101 (memcached+0x0000004f1c7e) #2 Connection::isDatatypeEnabled(unsigned char) const /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/connection.h:679 (memcached+0x000000443b85) #3 ServerCookieApi::is_datatype_supported(gsl::not_null, unsigned char) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/memcached.cc:1543 (memcached+0x000000443b85) #4 EventuallyPersistentEngine::isDatatypeSupported(void const*, unsigned char) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/ep_engine.cc:1831 (ep.so+0x00000017861d) #5 DcpProducer::isCompressionEnabled() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/dcp/producer.h:175 (ep.so+0x0000000a8ab2) #6 ActiveStream::isCompressionEnabled() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/dcp/active_stream.cc:606 (ep.so+0x0000000a8ab2) #7 DCPBackfillDisk::create() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/dcp/backfill_disk.cc:208 (ep.so+0x0000000cb7e0) #8 DCPBackfillDisk::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/dcp/backfill_disk.cc:155 (ep.so+0x0000000ccb1f) #9 BackfillManager::backfill() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/dcp/backfill-manager.cc:313 (ep.so+0x0000000c9005) #10 BackfillManagerTask::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/dcp/backfill-manager.cc:74 (ep.so+0x0000000c9571) #11 ExecutorThread::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/executorthread.cc:153 (ep.so+0x0000001d5db8) #12 launch_executor_thread /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/executorthread.cc:34 (ep.so+0x0000001d6e95) #13 CouchbaseThread::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000009c5f) #14 platform_thread_wrap /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000009c5f) #15 (libtsan.so.0+0x000000024feb) Location is heap block of size 880 at 0x7b5c00040c00 allocated by thread T6: #0 operator new(unsigned long) (libtsan.so.0+0x00000006a4d6) #1 allocate_connection /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/connections.cc:256 (memcached+0x0000004ea61a) #2 conn_new(int, ListeningPort const&, event_base*, FrontEndThread&) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/connections.cc:165 (memcached+0x0000004ea61a) #3 dispatch_new_connections /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/thread.cc:251 (memcached+0x0000004a3cb5) #4 thread_libevent_process /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/thread.cc:292 (memcached+0x0000004a3cb5) #5 event_persist_closure /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1580 (libevent_core.so.2.1.8+0x000000017086) #6 event_process_active_single_queue /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1639 (libevent_core.so.2.1.8+0x000000017086) #7 CouchbaseThread::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000009c5f) #8 platform_thread_wrap /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000009c5f) #9 (libtsan.so.0+0x000000024feb) Mutex M35842 (0x7b780004f8d8) created at: #0 pthread_mutex_lock (libtsan.so.0+0x00000003876f) #1 __gthread_mutex_lock /usr/local/include/c++/7.3.0/x86_64-pc-linux-gnu/bits/gthr-default.h:748 (memcached+0x0000004a4371) #2 std::mutex::lock() /usr/local/include/c++/7.3.0/bits/std_mutex.h:103 (memcached+0x0000004a4371) #3 phosphor::MutexEventGuard::MutexEventGuard(phosphor::tracepoint_info const*, phosphor::tracepoint_info const*, bool, std::mutex&, std::chrono::duration >) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/phosphor/include/phosphor/scoped_event_guard.h:93 (memcached+0x0000004a4371) #4 thread_libevent_process /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/thread.cc:300 (memcached+0x0000004a4371) #5 event_persist_closure /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1580 (libevent_core.so.2.1.8+0x000000017086) #6 event_process_active_single_queue /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1639 (libevent_core.so.2.1.8+0x000000017086) #7 CouchbaseThread::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000009c5f) #8 platform_thread_wrap /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000009c5f) #9 (libtsan.so.0+0x000000024feb) Mutex M1073680921370941928 is already destroyed. Thread T6 'mc:worker_0' (tid=6945, running) created by main thread at: #0 pthread_create (libtsan.so.0+0x0000000282a0) #1 cb_create_named_thread(unsigned long*, void (*)(void*), void*, int, char const*) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:109 (libplatform_so.so.0.1.0+0x00000000995b) #2 create_worker /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/thread.cc:111 (memcached+0x0000004a510d) #3 thread_init(unsigned long, event_base*, void (*)(int, short, void*)) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/thread.cc:460 (memcached+0x0000004a510d) #4 memcached_main /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/memcached.cc:2457 (memcached+0x00000043ebdf) #5 main /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/main.cc:33 (memcached+0x00000042a1ee) Thread T14 'mc:auxIO_0' (tid=7362, running) created by thread T21 at: #0 pthread_create (libtsan.so.0+0x0000000282a0) #1 cb_create_named_thread(unsigned long*, void (*)(void*), void*, int, char const*) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:109 (libplatform_so.so.0.1.0+0x00000000995b) #2 ExecutorThread::start() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/executorthread.cc:51 (ep.so+0x0000001d54b2) #3 ExecutorPool::_adjustWorkers(task_type_t, unsigned long) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/executorpool.cc:527 (ep.so+0x0000001cb9a2) #4 ExecutorPool::_startWorkers() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/executorpool.cc:597 (ep.so+0x0000001cc535) #5 ExecutorPool::_registerTaskable(Taskable&) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/executorpool.cc:483 (ep.so+0x0000001cb1f8) #6 ExecutorPool::registerTaskable(Taskable&) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/executorpool.cc:488 (ep.so+0x0000001cb55f) #7 KVBucket::KVBucket(EventuallyPersistentEngine&) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/kv_bucket.cc:315 (ep.so+0x000000214248) #8 EPBucket::EPBucket(EventuallyPersistentEngine&) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/ep_bucket.cc:251 (ep.so+0x00000015c99b) #9 std::_MakeUniq::__single_object std::make_unique(EventuallyPersistentEngine&) /usr/local/include/c++/7.3.0/bits/unique_ptr.h:825 (ep.so+0x00000017bf75) #10 EventuallyPersistentEngine::makeBucket(Configuration&) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/ep_engine.cc:6100 (ep.so+0x00000017bf75) #11 EventuallyPersistentEngine::initialize(char const*) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/engines/ep/src/ep_engine.cc:2033 (ep.so+0x000000198dd0) #12 CreateBucketThread::create() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/memcached.cc:1867 (memcached+0x0000004361e2) #13 CreateBucketThread::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/kv_engine/daemon/memcached.cc:1910 (memcached+0x000000436a66) #14 Couchbase::Thread::thread_entry() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/thread.cc:45 (libplatform_so.so.0.1.0+0x00000001cb0a) #15 Couchbase::StartThreadDelegator::run(Couchbase::Thread&) /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/thread.cc:59 (libplatform_so.so.0.1.0+0x00000001cc05) #16 task_thread_main /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/thread.cc:65 (libplatform_so.so.0.1.0+0x00000001cc05) #17 CouchbaseThread::run() /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000009c5f) #18 platform_thread_wrap /home/couchbase/jenkins/workspace/kv_engine-master-post-commit-TSan/platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000009c5f) #19 (libtsan.so.0+0x000000024feb) Change-Id: I5dce6961174eaa55d092136b328f5252add0d073 Reviewed-on: http://review.couchbase.org/113508 Reviewed-by: Dave Rigby <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Aug 22, 2019
When building on macOS with clang 8 and ThreadSanitizer enabled, memcached_testapp crashes when destructing the Settings object: libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument With the following backtrace from where the system_error exception is thrown: * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 frame #0: 0x00007fff7b6411f6 libc++abi.dylib` __cxa_throw frame #1: 0x00007fff7b6147af libc++.1.dylib` std::__1::__throw_system_error(int, char const*) + 77 frame #2: 0x00007fff7b606c93 libc++.1.dylib` std::__1::mutex::lock() + 29 frame #3: 0x00000001009c27ff memcached_testapp` folly::SharedMutexImpl<...>::annotateLazyCreate() [inlined] std::__1::unique_lock<std::__1::mutex>::unique_lock(__m=<unavailable>) + 191 at __mutex_base:131 frame #4: 0x00000001009c27f7 memcached_testapp` folly::SharedMutexImpl<...>::annotateLazyCreate() [inlined] std::__1::unique_lock<std::__1::mutex>::unique_lock(__m=<unavailable>) at __mutex_base:131 * frame #5: 0x00000001009c27f7 memcached_testapp` folly::SharedMutexImpl<...>::annotateLazyCreate() + 140 at SharedMutex.cpp:33 frame #6: 0x00000001009c276b memcached_testapp` folly::SharedMutexImpl<...>::annotateLazyCreate(this=0x0000000100c7ddc8) + 43 at SharedMutex.h:705 frame #7: 0x00000001009c191a memcached_testapp` folly::SharedMutexImpl<...>::~SharedMutexImpl() [inlined] folly::SharedMutexImpl<...>::annotateDestroy(this=<unavailable>) + 8 at SharedMutex.h:718 frame #8: 0x00000001009c1912 memcached_testapp` folly::SharedMutexImpl<...>::~SharedMutexImpl() + 24 at SharedMutex.h:324 frame #9: 0x00000001009c18fa memcached_testapp` folly::SharedMutexImpl<...>::~SharedMutexImpl(this=0x0000000100c7ddc8) + 26 at SharedMutex.h:300 frame #10: 0x000000010076b837 memcached_testapp` folly::Synchronized<...>::~Synchronized(this=0x0000000100c7ddb0) + 55 at Synchronized.h:484 frame #11: 0x000000010075d1d9 memcached_testapp` folly::Synchronized<...>::~Synchronized(this=0x0000000100c7ddb0) + 41 at Synchronized.h:484 frame #12: 0x000000010075e8fc memcached_testapp` Settings::~Settings(this=0x0000000100c7db10) + 76 at settings.h:51 frame #13: 0x000000010075c8b9 memcached_testapp` Settings::~Settings(this=0x0000000100c7db10) + 41 at settings.h:51 frame #14: 0x0000000102b7d5c1 libclang_rt.tsan_osx_dynamic.dylib` cxa_at_exit_wrapper(void*) + 33 frame #15: 0x00007fff7d72beed libsystem_c.dylib` __cxa_finalize_ranges + 351 frame #16: 0x00007fff7d72c1fe libsystem_c.dylib` exit + 55 frame #17: 0x00007fff7d67f01c libdyld.dylib` start + 8 The problem is at frame 5, where as part of the destruction of SharedMutex (as used by a member variable of Settings) we are attempting to acquire a mutex which has already been destructed (as it is itself a function-local static) - i.e. we have encountered the static initialization order fiasco :( Address this by changing `settings` to no longer be a plain static (global) variable, and instead be created on first use via a new Settings::instance() static method. Change-Id: I40bd44ed0ae32eb55271ce739fdf990d9869c32f Reviewed-on: http://review.couchbase.org/113640 Tested-by: Build Bot <[email protected]> Reviewed-by: Jim Walker <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Aug 27, 2019
[[Re-apply after fixing error in DurabilityCompletionTask::run (skipping last vBucket).]] Change how SyncWrites which are Resolved and awaiting Completion are handled, by moving the final VBucket::commit() / abort() into a background task - DurabilityCompletionTask. +Background+ There are two reasons for making this change: a) Performance - specifically latency of front-end worker threads. By moving completion into a background task, we reduce the amount of work done on the thread which actually detected the SyncWrite was resolved - typically the front-end DCP threads when a DCP_SEQNO_ACK is processed. Given that we SEQNO_ACK at the end of Snapshot, A single SEQNO_ACK could result in committing multiple SyncWrites. Committing one SyncWrite is similar to a normal front-end Set operation, so there is potentially a non-trivial amount of work needed to be done when completing SyncWrites, which could tie up the front-end thread (causing other Connections to have to wait) for a noticable amount of time. b) Simplification of lock management. Doing completion in a background task simplifies lock management, for example we avoid lock inversions with earlier locks acquired during dcpSeqnoAck when attemping to later call notifySeqnoAvailable when this was done on the original thread. +Problem+ While (a) was the first reason identified for making this change (see MB-33092), (b) is the reason this change is being made now. During testing the following lock-order-inversion was seen: WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) Cycle in lock order graph: Stream::streamMutex => StreamContainer::rwlock => Stream::streamMutex The crux of the issue is the processing of DCP_SEQNO_ACKNOWLEDGED messages by the DcpProducer - this acquires the Stream::streamMutex before calling VBucket::seqnoAcknowledged(), however that function currently results in VBucket::commit() being called to synchronously complete the SyncWrite; which in turn must nodify all connected replica that a new seqno is available, requiring StreamContainer::rwlock to be acquired: Mutex StreamContainer::rwlock acquired here while holding mutex Stream::streamMutex in thread T15: ... #6 StreamContainer<std::shared_ptr<Stream> >::rlock() #7 DcpProducer::notifySeqnoAvailable(Vbid, unsigned long) ... #13 VBucket::commit(...) #14 ActiveDurabilityMonitor::commit(...) #15 ActiveDurabilityMonitor::processCompletedSyncWriteQueue() #16 ActiveDurabilityMonitor::seqnoAckReceived(...) #17 VBucket::seqnoAcknowledged(...) #18 ActiveStream::seqnoAck(...) #19 DcpProducer::seqno_acknowledged(...) ... Mutex Stream::streamMutex previously acquired by the same thread here: ... #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) #4 ActiveStream::seqnoAck(...) #5 DcpProducer::seqno_acknowledged(...) ... This conflicts with the ordering seen when sending items out on the DCP connection - inside DcpProducer::step() where the StreamContainer::rwlock is acquired first, then ActiveStream::mutex acquired later: Mutex Stream::streamMutex acquired here while holding mutex StreamContainer::rwlock in thread T15: ... #3 std::lock_guard<std::mutex>::lock_guard(std::mutex&) #4 ActiveStream::next() #5 DcpProducer::getNextItem() #6 DcpProducer::step(dcp_message_producers*) ... Mutex StreamContainer::rwlock previously acquired by the same thread here: #0 pthread_rwlock_rdlock <null> (libtsan.so.0+0x00000002c98b) ... #4 std::shared_lock<cb::RWLock>::shared_lock(cb::RWLock&) #5 StreamContainer<>::ResumableIterationHandle::ResumableIterationHandle() #6 StreamContainer<>::startResumable() #7 DcpProducer::getNextItem() #8 DcpProducer::step(dcp_message_producers*) ... +Solution+ The processing of resolved SyncWrites moved into a new background task. Instead of immediately processing them within ActiveDM::seqnoAckReceived(), that function notifies the new NonIO DurabilityCompletionTask that there are SyncWrites waiting for completion. DurabilityCompletionTask maintains a bool per vBucket indicating if there are SyncWrites for that vBucket pending completion. When the task is run, for each flag which is true it calls VBucket::processResolvedSyncWrites() for the associated VBucket. +Implementation Notes+ Currently there is just a single DurabilityCompletionTask (per Bucket), this was chosen as 1 task per vBucket (i.e. 1024 per Bucket) would be inefficient for our current background task scheduler (both in terms of latency to schedule each task for only one vBucket's worth of work, and in terms of managing that many tasks in the future queue). However, that does _potentially_ mean there's fewer resources (threads) available to complete SyncWrites on - previously that work could be done concurrently on all frontend threads (~O(num_cpus). Now the same work only has 1 thread available to run on (there's only a single DurabilityCompletionTask). _If_ this becomes a bottleneck we could look at increasing the number of DurabilityCompletionTask - e.g. sharding all vBuckets across multiple tasks like flusher / bgfetcher. Change-Id: I33ecfa78b03b4d2120b5d05f54984b24ce038fd8 Reviewed-on: http://review.couchbase.org/113749 Reviewed-by: Ben Huddleston <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Sep 18, 2019
The audit daemon handle (auditHandle) does not have any synchronization around it, resulting in TSan warnings during shutdown as the Audit object may have been deleted: ThreadSanitizer: data race Write of size 8 at 0x0000009b0130 by main thread: #0 std::swap(cb::audit::Audit*&, cb::audit::Audit*&) /usr/local/include/c++/7.3.0/bits/move.h:199 (memcached+0x0000004f335a) #1 std::unique_ptr >::reset(cb::audit::Audit*) /usr/local/include/c++/7.3.0/bits/unique_ptr.h:374 (memcached+0x0000004f335a) #2 shutdown_audit() kv_engine/daemon/mcaudit.cc:348 (memcached+0x0000004f335a) #3 memcached_main kv_engine/daemon/memcached.cc:2503 (memcached+0x000000435a46) #4 main kv_engine/daemon/main.cc:33 (memcached+0x00000042145e) Previous read of size 8 at 0x0000009b0130 by thread T8 (mutexes: write M1049192993527234104): #0 std::__uniq_ptr_impl >::_M_ptr() const /usr/local/include/c++/7.3.0/bits/unique_ptr.h:147 (memcached+0x0000004f3433) #1 std::unique_ptr >::get() const /usr/local/include/c++/7.3.0/bits/unique_ptr.h:337 (memcached+0x0000004f3433) #2 std::unique_ptr >::operator->() const /usr/local/include/c++/7.3.0/bits/unique_ptr.h:331 (memcached+0x0000004f3433) #3 stats_audit(std::function)> const&, Cookie&) kv_engine/daemon/mcaudit.cc:361 (memcached+0x0000004f3433) #4 stat_audit_executor kv_engine/daemon/protocol/mcbp/stats_context.cc:446 (memcached+0x000000528f72) Location is global 'auditHandle' of size 8 at 0x0000009b0130 (memcached+0x0000009b0130) Fix by using folly::Synchronized<> for the auditHandle. Note that exclusive access is only needed during initialization & shutdown, so there should be no additional contention for actually adding audit events (where shared access is sufficient). Change-Id: I34966f08674c6363fde4592b5c4bede48747fb2f Reviewed-on: http://review.couchbase.org/114945 Tested-by: Build Bot <[email protected]> Reviewed-by: Richard de Mellow <[email protected]> Reviewed-by: Trond Norbye <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Sep 18, 2019
As reported by TSan, Connection::priority is read and written without a lock: hreadSanitizer: data race third_party/gsl-lite/include/gsl/gsl-lite.h:472gsl::fail_fast_assert(bool, char const*) Read of size 1 at 0x7b5c00019738 by thread T23 (mutexes: write M2903806): #0 gsl::fail_fast_assert(bool, char const*) third_party/gsl-lite/include/gsl/gsl-lite.h:472 (memcached+0x000000465448) #1 gsl::not_null::get() const third_party/gsl-lite/include/gsl/gsl-lite.h:888 (memcached+0x000000465448) #2 ServerCookieApi::get_priority(gsl::not_null) kv_engine/daemon/memcached.cc:1640 (memcached+0x000000465448) #3 EventuallyPersistentEngine::getDCPPriority(void const*) kv_engine/engines/ep/src/ep_engine.cc:5828 (ep.so+0x000000171c57) #4 ConnHandler::addStats(std::function)> const&, void const*) kv_engine/engines/ep/src/connhandler.cc:327 (ep.so+0x000000092b33) #5 DcpProducer::addStats(std::function)> const&, void const*) kv_engine/engines/ep/src/dcp/producer.cc:1252 (ep.so+0x000000119b1e) #6 ConnStatBuilder::operator()(std::shared_ptr) kv_engine/engines/ep/src/ep_engine.cc:3571 (ep.so+0x000000191fbf) #7 void DcpConnMap::each(ConnStatBuilder) kv_engine/engines/ep/src/dcp/dcpconnmap.h:164 (ep.so+0x000000191fbf) #8 EventuallyPersistentEngine::doDcpStats(void const*, std::function)> const&, cb::const_char_buffer) kv_engine/engines/ep/src/ep_engine.cc:3728 (ep.so+0x000000191fbf) #9 EventuallyPersistentEngine::getStats(void const*, cb::const_char_buffer, cb::const_char_buffer, std::function)> const&) kv_engine/engines/ep/src/ep_engine.cc:4398 (ep.so+0x000000192bb8) #10 KVBucket::snapshotStats() kv_engine/engines/ep/src/kv_bucket.cc:1149 (ep.so+0x0000002116db) #11 StatSnap::run() kv_engine/engines/ep/src/tasks.cc:72 (ep.so+0x000000258ae2) #12 ExecutorThread::run() kv_engine/engines/ep/src/executorthread.cc:153 (ep.so+0x0000001d0405) ... Previous write of size 1 at 0x7b5c00019738 by thread T7 (mutexes: write M1036245144598543240): #0 Connection::setPriority(Connection::Priority) kv_engine/daemon/connection.cc:1593 (memcached+0x0000004be2f2) #1 ServerCookieApi::set_priority(gsl::not_null, CONN_PRIORITY) kv_engine/daemon/memcached.cc:1618 (memcached+0x0000004658b9) #2 EventuallyPersistentEngine::setDCPPriority(void const*, CONN_PRIORITY) kv_engine/engines/ep/src/ep_engine.cc:5835 (ep.so+0x000000171d5f) #3 DcpProducer::control(unsigned int, cb::const_char_buffer, cb::const_char_buffer) kv_engine/engines/ep/src/dcp/producer.cc:945 (ep.so+0x000000118cc3) #4 non-virtual thunk to EventuallyPersistentEngine::control(gsl::not_null, unsigned int, cb::const_char_buffer, cb::const_char_buffer) (ep.so+0x00000018f966) #5 dcpControl(Cookie&, unsigned int, cb::const_char_buffer, cb::const_char_buffer) kv_engine/daemon/protocol/mcbp/engine_wrapper.cc:408 (memcached+0x00000047c981) #6 dcp_control_executor(Cookie&) kv_engine/daemon/protocol/mcbp/dcp_control_executor.cc:38 (memcached+0x000000513421) #7 std::_Function_handler::_M_invoke(std::_Any_data const&, Cookie&) /usr/local/include/c++/7.3.0/bits/std_function.h:316 (memcached+0x000000503202) #8 std::function::operator()(Cookie&) const /usr/local/include/c++/7.3.0/bits/std_function.h:706 (memcached+0x0000005019fe) #9 execute_client_request_packet(Cookie&, cb::mcbp::Request const&) kv_engine/daemon/mcbp_executors.cc:857 (memcached+0x0000005019fe) #10 execute_request_packet(Cookie&, cb::mcbp::Request const&) kv_engine/daemon/mcbp_executors.cc:881 (memcached+0x000000501bd7) ... Fix by changing Connection::priority to be an atomic. Change-Id: Ia6bae64ec09e1963e55e7cdb614fb17a68ff1726 Reviewed-on: http://review.couchbase.org/114956 Reviewed-by: Trond Norbye <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Sep 18, 2019
As reported by TSan, VBucket::purge_seqno is read and written without a lock: Write of size 8 at 0x7b6800107728 by thread T41: #0 Monotonic::operator=(unsigned long const&) kv_engine/engines/ep/src/monotonic.h:134 (ep.so+0x0000001be09e) #1 VBucket::setPurgeSeqno(unsigned long) kv_engine/engines/ep/src/vbucket.h:218 (ep.so+0x0000001be09e) #2 EphemeralVBucket::purgeStaleItems(std::function) kv_engine/engines/ep/src/ephemeral_vb.cc:350 (ep.so+0x0000001be09e) #3 EphemeralVBucket::StaleItemDeleter::visit(VBucket&) kv_engine/engines/ep/src/ephemeral_tombstone_purger.cc:205 (ep.so+0x0000001b7571) #4 KVBucket::pauseResumeVisit(PauseResumeVBVisitor&, KVBucketIface::Position&) kv_engine/engines/ep/src/kv_bucket.cc:2278 (ep.so+0x000000209a09) #5 EphTombstoneStaleItemDeleter::run() kv_engine/engines/ep/src/ephemeral_tombstone_purger.cc:273 (ep.so+0x0000001b6d59) #6 ExecutorThread::run() kv_engine/engines/ep/src/executorthread.cc:153 (ep.so+0x0000001d0405) #7 launch_executor_thread kv_engine/engines/ep/src/executorthread.cc:34 (ep.so+0x0000001d1345) #8 CouchbaseThread::run() platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000009c5f) #9 platform_thread_wrap platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000009c5f) #10 (libtsan.so.0+0x000000024feb) Previous read of size 8 at 0x7b6800107728 by thread T6 (mutexes: write M1057918717805263064): #0 VBucket::_addStats(bool, std::function)> const&, void const*) kv_engine/engines/ep/src/vbucket.cc:3010 (ep.so+0x0000002796b6) #1 EphemeralVBucket::addStats(bool, std::function)> const&, void const*) kv_engine/engines/ep/src/ephemeral_vb.cc:166 (ep.so+0x0000001b8824) #2 addVBStats kv_engine/engines/ep/src/ep_engine.cc:3155 (ep.so+0x000000183e93) #3 visitBucket kv_engine/engines/ep/src/ep_engine.cc:3127 (ep.so+0x000000183e93) #4 KVBucket::visit(VBucketVisitor&) kv_engine/engines/ep/src/kv_bucket.cc:2254 (ep.so+0x000000209efd) #5 EventuallyPersistentEngine::doVBucketStats(void const*, std::function)> const&, char const*, int, bool, bool) kv_engine/engines/ep/src/ep_engine.cc:3193 (ep.so+0x000000183a7b) ... Fix by changing to an Atomic WeaklyMonotonic type. Change-Id: I014242268f913d31fdc0964c42f59aa952607ba4 Reviewed-on: http://review.couchbase.org/114950 Reviewed-by: James Harrison <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Aug 7, 2020
As identified by TSan when adding additional ExecutorPool tests, the two flags isLowPrioQSet & isHiPrioQSet are accessed without locks from different threads: WARNING: ThreadSanitizer: data race (pid=25602) Write of size 1 at 0x7b5000005ad8 by main thread (mutexes: write M1128287290183932680): #0 CB3ExecutorPool::_registerTaskable(Taskable&) cb3_executorpool.cc:442 (ep-engine_ep_unit_tests+0x00000048891c) #1 CB3ExecutorPool::registerTaskable(Taskable&) cb3_executorpool.cc:454 (ep-engine_ep_unit_tests+0x000000488a16) #2 ExecutorPoolTest_cancel_can_schedule_Test<TestExecutorPool>::TestBody() executorpool_test.cc:530 (ep-engine_ep_unit_tests+0x0000012e207c) ... Previous read of size 1 at 0x7b5000005ad8 by thread T38: #0 CB3ExecutorPool::getSleepQ(unsigned int) cb3_executorpool.h:112 (ep-engine_ep_unit_tests+0x000000485421) #1 CB3ExecutorPool::_nextTask(CB3ExecutorThread&, unsigned char) cb3_executorpool.cc:148 (ep-engine_ep_unit_tests+0x000000485421) #2 CB3ExecutorPool::nextTask(CB3ExecutorThread&, unsigned char) cb3_executorpool.cc:163 (ep-engine_ep_unit_tests+0x0000004854e7) #3 CB3ExecutorThread::run() cb3_executorthread.cc:129 (ep-engine_ep_unit_tests+0x00000049cf7d) #4 launch_executor_thread cb3_executorthread.cc:34 (ep-engine_ep_unit_tests+0x00000049d0ae) #5 CouchbaseThread::run() ../platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000010e1b) #6 platform_thread_wrap ../platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000010e1b) This appears to be a latent issue in the CB3ExecutorPool, only exposed when different priority Taskables (Buckets) are registered. Fix by making the flags atomic. Change-Id: Ibf7fda1443bad32a36914a6d1bc7b3424a3bfe75 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/133889 Tested-by: Build Bot <[email protected]> Reviewed-by: James Harrison <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Sep 4, 2020
The following data race was observed on Stream member variables: Running [0056/0099]: test full rollback on consumer...================== WARNING: ThreadSanitizer: data race (pid=43827) Write of size 8 at 0x7b540001ef60 by thread T23: #0 PassiveStream::reconnectStream(...) ../kv_engine/engines/ep/src/dcp/passive_stream.cc:240 (libep.so+0x00000012eed6) #1 DcpConsumer::doRollback(unsigned int, Vbid, unsigned long) ../kv_engine/engines/ep/src/dcp/consumer.cc:1107 (libep.so+0x0000001116d3) #2 RollbackTask::run() ../kv_engine/engines/ep/src/dcp/consumer.cc:938 (libep.so+0x000000111871) #3 GlobalTask::execute() ../kv_engine/engines/ep/src/globaltask.cc:73 (libep.so+0x0000002363cb) #4 CB3ExecutorThread::run() ../kv_engine/engines/ep/src/cb3_executorthread.cc:174 (libep.so+0x00000009f7a1) #5 launch_executor_thread ../kv_engine/engines/ep/src/cb3_executorthread.cc:34 (libep.so+0x00000009fded) #6 CouchbaseThread::run() ../platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000010e1b) #7 platform_thread_wrap ../platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000010e1b) #8 <null> <null> (libtsan.so.0+0x000000024feb) Previous read of size 8 at 0x7b540001ef60 by thread T22 (mutexes: write M639646524455782464): #0 void StatCollector::addStat<...>(...) ../kv_engine/include/statistics/collector.h:343 (libep.so+0x0000000e5e03) #1 void add_casted_stat<>(...) ../kv_engine/include/statistics/collector.h:404 (libep.so+0x0000000e5e03) #2 Stream::addStats(...) ../kv_engine/engines/ep/src/dcp/stream.cc:157 (libep.so+0x00000016b1b4) #3 PassiveStream::addStats(...) ../kv_engine/engines/ep/src/dcp/passive_stream.cc:1058 (libep.so+0x000000133a98) #4 DcpConsumer::addStats(...) ../kv_engine/engines/ep/src/dcp/consumer.cc:1140 (libep.so+0x000000115415) #5 ConnStatBuilder::operator()(std::shared_ptr<ConnHandler>) ../kv_engine/engines/ep/src/ep_engine.cc:3784 (libep.so+0x0000001f3f9a) #6 void DcpConnMap::each<ConnStatBuilder&>(ConnStatBuilder&) ../kv_engine/engines/ep/src/dcp/dcpconnmap_impl.h:33 (libep.so+0x0000001f3f9a) #7 EventuallyPersistentEngine::doDcpStats(...) ../kv_engine/engines/ep/src/ep_engine.cc:3943 (libep.so+0x0000001d9cd0) #8 EventuallyPersistentEngine::getStats(...) ../kv_engine/engines/ep/src/ep_engine.cc:4675 (libep.so+0x0000001dd67b) #9 KVBucket::snapshotStats() ../kv_engine/engines/ep/src/kv_bucket.cc:1168 (libep.so+0x000000261e32) #10 StatSnap::run() ../kv_engine/engines/ep/src/tasks.cc:72 (libep.so+0x0000002b76a9) #11 GlobalTask::execute() ../kv_engine/engines/ep/src/globaltask.cc:73 (libep.so+0x0000002363cb) #12 CB3ExecutorThread::run() ../kv_engine/engines/ep/src/cb3_executorthread.cc:174 (libep.so+0x00000009f7a1) #13 launch_executor_thread ../kv_engine/engines/ep/src/cb3_executorthread.cc:34 (libep.so+0x00000009fded) #14 CouchbaseThread::run() ../platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x000000010e1b) #15 platform_thread_wrap ../platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x000000010e1b) #16 <null> <null> (libtsan.so.0+0x000000024feb) Fix by acquiring the stream mutex before accessing these. Change-Id: Ie14eb02e8e98c0aa5c9432c6f385766a215eca8f Reviewed-on: http://review.couchbase.org/c/kv_engine/+/135531 Tested-by: Build Bot <[email protected]> Reviewed-by: James Harrison <[email protected]> Reviewed-by: Paolo Cocchi <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Oct 6, 2020
Move ThreadGate so it has a scope equal to the thread ==16055==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffceabf4548 at pc 0x00000520d834 bp 0x7fb4f0458520 sp 0x7fb4f0458518 READ of size 8 at 0x7ffceabf4548 thread T3 #0 0x520d833 in ThreadGate::isComplete(std::unique_lock<std::mutex> const&) ../kv_engine/engines/ep/tests/module_tests/thread_gate.h:69 #1 0x520d833 in ThreadGate::threadUp()::{lambda()#1}::operator()() const ../kv_engine/engines/ep/tests/module_tests/thread_gate.h:45 #2 0x520d833 in void std::condition_variable::wait<ThreadGate::threadUp()::{lambda()#1}>(std::unique_lock<std::mutex>&, ThreadGate::threadUp()::{lambda()#1}) /usr/local/include/c++/7.3.0/condition_variable:98 #3 0x520d833 in ThreadGate::threadUp() ../kv_engine/engines/ep/tests/module_tests/thread_gate.h:45 #4 0x520d833 in resetThread(HdrHistogram&, ThreadGate&) ../kv_engine/engines/ep/tests/module_tests/hdrhistogram_test.cc:482 #5 0x5228687 in void std::__invoke_impl<void, void (*)(HdrHistogram&, ThreadGate&), Hdr2sfMicroSecHistogram&, ThreadGate&>(std::__invoke_other, void (*&&)(HdrHistogram&, ThreadGate&), Hdr2sfMicroSecHistogram&, ThreadGate&) /usr/local/include/c++/7.3.0/bits/invoke.h:60 #6 0x5228687 in std::__invoke_result<void (*)(HdrHistogram&, ThreadGate&), Hdr2sfMicroSecHistogram&, ThreadGate&>::type std::__invoke<void (*)(HdrHistogram&, ThreadGate&), Hdr2sfMicroSecHistogram&, ThreadGate&>(void (*&&)(HdrHistogram&, ThreadGate&), Hdr2sfMicroSecHistogram&, ThreadGate&) /usr/local/include/c++/7.3.0/bits/invoke.h:95 #7 0x5228687 in decltype (__invoke((_S_declval<0ul>)(), (_S_declval<1ul>)(), (_S_declval<2ul>)())) std::thread::_Invoker<std::tuple<void (*)(HdrHistogram&, ThreadGate&), Hdr2sfMicroSecHistogram&, ThreadGate&> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) /usr/local/include/c++/7.3.0/thread:234 #8 0x5228687 in std::thread::_Invoker<std::tuple<void (*)(HdrHistogram&, ThreadGate&), Hdr2sfMicroSecHistogram&, ThreadGate&> >::operator()() /usr/local/include/c++/7.3.0/thread:243 #9 0x5228687 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(HdrHistogram&, ThreadGate&), Hdr2sfMicroSecHistogram&, ThreadGate&> > >::_M_run() /usr/local/include/c++/7.3.0/thread:186 #10 0x7fb4f66b295e in execute_native_thread_routine /tmp/deploy/objdir/../gcc-7.3.0/libstdc++-v3/src/c++11/thread.cc:83 #11 0x7fb4f51a76b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9) #12 0x7fb4f4edd41c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c) Change-Id: Idb2dbb73d1a2e0b1beca570b38401009d3408906 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/137540 Reviewed-by: Richard de Mellow <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Oct 7, 2020
Fix a TSAN failure due to a lock order inversion observed on the mad-hatter TSAN cv machine. This was due to a potential deadlock while acquiring locks on this and other histogram in HdrHistogram::operator+=(). To fix this use folly::acquireLocked() to safely take write lock and read lock on this->histogram and other.histogram respectively. We also remove the nullptr for this->histogram as we don't guard any other access to the histograms pointer, in case cb_calloc() fails. TSAN failure: WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=11208) Cycle in lock order graph: M296533874179548048 (0x000000000000) => M174936684240544688 (0x000000000000) => M296533874179548048 Mutex M174936684240544688 acquired here while holding mutex M296533874179548048 in main thread: #0 AnnotateRWLockAcquired <null> (libtsan.so.0+0x00000005b63d) #1 folly::detail::annotate_rwlock_acquired_impl(void const volatile*, folly::annotate_rwlock_level, char const*, int) /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/synchronization/SanitizeThread.cpp:99 (ep-engine_ep_unit_tests+0x00000139555e) #2 annotate_rwlock_acquired tlm/deps/folly.exploded/include/folly/synchronization/SanitizeThread.h:111 (ep-engine_ep_unit_tests+0x000001362a06) #3 folly::SharedMutexImpl<false, void, std::atomic, false, true>::annotateAcquired(folly::annotate_rwlock_level) tlm/deps/folly.exploded/include/folly/SharedMutex.h:726 (ep-engine_ep_unit_tests+0x000001362a06) #4 folly::SharedMutexImpl<false, void, std::atomic, false, true>::lock_shared() tlm/deps/folly.exploded/include/folly/SharedMutex.h:400 (ep-engine_ep_unit_tests+0x000001362a06) #5 folly::detail::LockTraitsImpl<folly::SharedMutexImpl<false, void, std::atomic, false, true>, (folly::detail::MutexLevel)1, false>::lock_shared(folly::SharedMutexImpl<false, void, std::atomic, false, true>&) tlm/deps/folly.exploded/include/folly/LockTraits.h:157 (ep-engine_ep_unit_tests+0x000001362a06) #6 std::integral_constant<bool, true> folly::LockPolicyShared::lock<folly::SharedMutexImpl<false, void, std::atomic, false, true> >(folly::SharedMutexImpl<false, void, std::atomic, false, true>&) tlm/deps/folly.exploded/include/folly/LockTraits.h:499 (ep-engine_ep_unit_tests+0x000001362a06) #7 folly::LockedPtrBase<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const, folly::SharedMutexImpl<false, void, std::atomic, false, true>, folly::LockPolicyShared>::LockedPtrBase(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const*) tlm/deps/folly.exploded/include/folly/Synchronized.h:1089 (ep-engine_ep_unit_tests+0x000001362a06) #8 folly::LockedPtr<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const, folly::LockPolicyShared>::LockedPtr(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const*) tlm/deps/folly.exploded/include/folly/Synchronized.h:1394 (ep-engine_ep_unit_tests+0x000001360063) #9 folly::SynchronizedBase<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >, (folly::detail::MutexLevel)1>::rlock() tlm/deps/folly.exploded/include/folly/Synchronized.h:140 (ep-engine_ep_unit_tests+0x000001360063) #10 HdrHistogram::operator+=(HdrHistogram const&) ../kv_engine/utilities/hdrhistogram.cc:83 (ep-engine_ep_unit_tests+0x000001360063) #11 HdrHistogramTest_aggregationTest_Test::TestBody() ../kv_engine/engines/ep/tests/module_tests/hdrhistogram_test.cc:317 (ep-engine_ep_unit_tests+0x00000106945c) #12 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../googletest/googletest/src/gtest.cc:2402 (ep-engine_ep_unit_tests+0x0000013276cb) #13 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../googletest/googletest/src/gtest.cc:2438 (ep-engine_ep_unit_tests+0x00000132e814) #14 testing::Test::Run() ../googletest/googletest/src/gtest.cc:2474 (ep-engine_ep_unit_tests+0x00000131d649) #15 testing::TestInfo::Run() ../googletest/googletest/src/gtest.cc:2656 (ep-engine_ep_unit_tests+0x00000131d91d) #16 testing::TestCase::Run() ../googletest/googletest/src/gtest.cc:2774 (ep-engine_ep_unit_tests+0x00000131dac5) #17 testing::internal::UnitTestImpl::RunAllTests() ../googletest/googletest/src/gtest.cc:4649 (ep-engine_ep_unit_tests+0x00000131fe3e) #18 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../googletest/googletest/src/gtest.cc:2402 (ep-engine_ep_unit_tests+0x000001327972) #19 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../googletest/googletest/src/gtest.cc:2438 (ep-engine_ep_unit_tests+0x00000132edc8) #20 testing::UnitTest::Run() ../googletest/googletest/src/gtest.cc:4257 (ep-engine_ep_unit_tests+0x00000131d741) #21 RUN_ALL_TESTS() ../googletest/googletest/include/gtest/gtest.h:2237 (ep-engine_ep_unit_tests+0x000000d30e6a) #22 main ../kv_engine/engines/ep/tests/module_tests/ep_unit_tests_main.cc:144 (ep-engine_ep_unit_tests+0x000000d30e6a) Change-Id: I7f7448627d20c753add8c92eaf1186fb350aaab0 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/137548 Reviewed-by: Dave Rigby <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Oct 8, 2020
Fix TSAN failures in module_tests/hdrhistogram_test.cc, due to theoretical deadlock that could occur when using two iterators that hold read locks on different HdrHistograms (this isn't the case now bug could be if the code was modified). To avoid this WARNING, ensure in our HdrHistogramTests that we never hold multiple HdrHistogram::Iterators in the same scope. Example TSAN Failure: WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=16448) Cycle in lock order graph: M732257135894671160 (0x000000000000) => M730849761011117904 (0x000000000000) => M732257135894671160 Mutex M730849761011117904 acquired here while holding mutex M732257135894671160 in main thread: #0 AnnotateRWLockAcquired <null> (ep-engine_ep_unit_tests+0x64007b) #1 folly::detail::annotate_rwlock_acquired_impl(void const volatile*, folly::annotate_rwlock_level, char const*, int) /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/synchronization/SanitizeThread.cpp:99 (ep-engine_ep_unit_tests+0x162f6be) #2 annotate_rwlock_acquired /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/synchronization/SanitizeThread.h:111 (ep-engine_ep_unit_tests+0x15a9881) #3 folly::SharedMutexImpl<false, void, std::atomic, false, true>::annotateAcquired(folly::annotate_rwlock_level) /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/SharedMutex.h:740 (ep-engine_ep_unit_tests+0x15a9881) #4 folly::SharedMutexImpl<false, void, std::atomic, false, true>::lock() /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/SharedMutex.h:372 (ep-engine_ep_unit_tests+0x15a9881) #5 folly::detail::LockTraitsImpl<folly::SharedMutexImpl<false, void, std::atomic, false, true>, (folly::detail::MutexLevel)0, false>::lock(folly::SharedMutexImpl<false, void, std::atomic, false, true>&) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/LockTraits.h:124:11 (ep-engine_ep_unit_tests+0x717b75) #6 std::integral_constant<bool, true> folly::LockPolicyExclusive::lock<folly::SharedMutexImpl<false, void, std::atomic, false, true> >(folly::SharedMutexImpl<false, void, std::atomic, false, true>&) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/LockTraits.h:479:5 (ep-engine_ep_unit_tests+0x717b45) #7 folly::LockedPtrBase<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >, folly::SharedMutexImpl<false, void, std::atomic, false, true>, folly::LockPolicyExclusive>::LockedPtrBase(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/Synchronized.h:1097:10 (ep-engine_ep_unit_tests+0x158cb02) #8 folly::LockedPtr<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >, folly::LockPolicyExclusive>::LockedPtr(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/Synchronized.h:1402:50 (ep-engine_ep_unit_tests+0x158caae) #9 folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >::contextualLock() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/Synchronized.h:653:12 (ep-engine_ep_unit_tests+0x158ceae) #10 std::tuple<std::conditional<std::is_const<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > >::value, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >::ConstLockedPtr, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >::LockedPtr>::type, std::conditional<std::is_const<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const>::value, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const::ConstLockedPtr, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const::LockedPtr>::type> folly::acquireLocked<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const>(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >&, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const&) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/Synchronized.h:1741:18 (ep-engine_ep_unit_tests+0x158ccda) #11 std::pair<std::conditional<std::is_const<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > >::value, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >::ConstLockedPtr, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >::LockedPtr>::type, std::conditional<std::is_const<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const>::value, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const::ConstLockedPtr, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const::LockedPtr>::type> folly::acquireLockedPair<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const>(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >&, folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const&) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/Synchronized.h:1753:21 (ep-engine_ep_unit_tests+0x158bbf9) #12 HdrHistogram::operator+=(HdrHistogram const&) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../kv_engine/utilities/hdrhistogram.cc:67:21 (ep-engine_ep_unit_tests+0x158a3bf) #13 HdrHistogramTest_aggregationTest_Test::TestBody() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../kv_engine/engines/ep/tests/module_tests/hdrhistogram_test.cc:317:18 (ep-engine_ep_unit_tests+0x13479b1) #14 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2433:10 (ep-engine_ep_unit_tests+0x1567683) #15 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2469:14 (ep-engine_ep_unit_tests+0x1553012) #16 testing::Test::Run() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2508:5 (ep-engine_ep_unit_tests+0x153b87c) #17 testing::TestInfo::Run() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2684:11 (ep-engine_ep_unit_tests+0x153c26a) #18 testing::TestSuite::Run() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2816:28 (ep-engine_ep_unit_tests+0x153c841) #19 testing::internal::UnitTestImpl::RunAllTests() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:5338:44 (ep-engine_ep_unit_tests+0x1545a21) #20 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2433:10 (ep-engine_ep_unit_tests+0x156aa43) #21 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2469:14 (ep-engine_ep_unit_tests+0x15551e2) #22 testing::UnitTest::Run() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:4925:10 (ep-engine_ep_unit_tests+0x15455d7) #23 RUN_ALL_TESTS() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/include/gtest/gtest.h:2473:46 (ep-engine_ep_unit_tests+0x1079927) #24 main /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../kv_engine/engines/ep/tests/module_tests/ep_unit_tests_main.cc:175:16 (ep-engine_ep_unit_tests+0x107978e) Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative warning message Mutex M732257135894671160 acquired here while holding mutex M730849761011117904 in main thread: #0 AnnotateRWLockAcquired <null> (ep-engine_ep_unit_tests+0x64007b) #1 folly::detail::annotate_rwlock_acquired_impl(void const volatile*, folly::annotate_rwlock_level, char const*, int) /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/synchronization/SanitizeThread.cpp:99 (ep-engine_ep_unit_tests+0x162f6be) #2 annotate_rwlock_acquired /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/synchronization/SanitizeThread.h:111 (ep-engine_ep_unit_tests+0x15ad433) #3 folly::SharedMutexImpl<false, void, std::atomic, false, true>::annotateAcquired(folly::annotate_rwlock_level) /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/SharedMutex.h:740 (ep-engine_ep_unit_tests+0x15ad433) #4 folly::SharedMutexImpl<false, void, std::atomic, false, true>::lock_shared() /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/follytsan-prefix/src/follytsan/folly/SharedMutex.h:414 (ep-engine_ep_unit_tests+0x15ad433) #5 folly::detail::LockTraitsImpl<folly::SharedMutexImpl<false, void, std::atomic, false, true>, (folly::detail::MutexLevel)1, false>::lock_shared(folly::SharedMutexImpl<false, void, std::atomic, false, true>&) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/LockTraits.h:157:11 (ep-engine_ep_unit_tests+0x703215) #6 std::integral_constant<bool, true> folly::LockPolicyShared::lock<folly::SharedMutexImpl<false, void, std::atomic, false, true> >(folly::SharedMutexImpl<false, void, std::atomic, false, true>&) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/LockTraits.h:499:5 (ep-engine_ep_unit_tests+0x7031c5) #7 folly::LockedPtrBase<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const, folly::SharedMutexImpl<false, void, std::atomic, false, true>, folly::LockPolicyShared>::LockedPtrBase(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/Synchronized.h:1097:10 (ep-engine_ep_unit_tests+0x915242) #8 folly::LockedPtr<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const, folly::LockPolicyShared>::LockedPtr(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/Synchronized.h:1402:50 (ep-engine_ep_unit_tests+0x9151ee) #9 folly::SynchronizedBase<folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> >, (folly::detail::MutexLevel)1>::rlock() const /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/tlm/deps/folly.exploded/include/folly/Synchronized.h:144:12 (ep-engine_ep_unit_tests+0x91502e) #10 HdrHistogram::Iterator::Iterator(folly::Synchronized<std::unique_ptr<hdr_histogram, HdrHistogram::HdrDeleter>, folly::SharedMutexImpl<false, void, std::atomic, false, true> > const&, HdrHistogram::Iterator::IterMode) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../kv_engine/utilities/hdrhistogram.h:89:66 (ep-engine_ep_unit_tests+0x158be71) #11 HdrHistogram::makeLinearIterator(long) const /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../kv_engine/utilities/hdrhistogram.cc:162:28 (ep-engine_ep_unit_tests+0x158aafb) #12 HdrHistogramTest_aggregationTest_Test::TestBody() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../kv_engine/engines/ep/tests/module_tests/hdrhistogram_test.cc:322:26 (ep-engine_ep_unit_tests+0x13479e2) #13 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2433:10 (ep-engine_ep_unit_tests+0x1567683) #14 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2469:14 (ep-engine_ep_unit_tests+0x1553012) #15 testing::Test::Run() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2508:5 (ep-engine_ep_unit_tests+0x153b87c) #16 testing::TestInfo::Run() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2684:11 (ep-engine_ep_unit_tests+0x153c26a) #17 testing::TestSuite::Run() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2816:28 (ep-engine_ep_unit_tests+0x153c841) #18 testing::internal::UnitTestImpl::RunAllTests() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:5338:44 (ep-engine_ep_unit_tests+0x1545a21) #19 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2433:10 (ep-engine_ep_unit_tests+0x156aa43) #20 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:2469:14 (ep-engine_ep_unit_tests+0x15551e2) #21 testing::UnitTest::Run() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/src/gtest.cc:4925:10 (ep-engine_ep_unit_tests+0x15455d7) #22 RUN_ALL_TESTS() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../third_party/googletest/googletest/include/gtest/gtest.h:2473:46 (ep-engine_ep_unit_tests+0x1079927) #23 main /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/build/../kv_engine/engines/ep/tests/module_tests/ep_unit_tests_main.cc:175:16 (ep-engine_ep_unit_tests+0x107978e) Change-Id: I7c3e9369065e5344333c410602267835f9bcc7e1 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/137745 Reviewed-by: Dave Rigby <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Oct 12, 2020
As seen in TSan testing with FollyExecutorPool when running ep_perfsuite: Running [0013/0017]: Stat latency with 100 vbuckets. Also sets & DCP traffic on separate thread...================== WARNING: ThreadSanitizer: data race (pid=16435) Read of size 8 at 0x7b24000053f8 by main thread (mutexes: write M386882167168307840): #0 folly::HHWheelTimerBase<>::Callback::isScheduled() const follytsan/folly/io/async/HHWheelTimer.h:121:14 (libep.so+0x5c63b6) #1 FollyExecutorPool::doTaskQStat(...) kv_engine/engines/ep/src/folly_executorpool.cc:839:30 (libep.so+0x34ff85) #2 EventuallyPersistentEngine::doWorkloadStats(...) kv_engine/engines/ep/src/ep_engine.cc:4115:17 (libep.so+0x2e67d6) #3 EventuallyPersistentEngine::getStats(...) kv_engine/engines/ep/src/ep_engine.cc:4670:16 (libep.so+0x2d689a) ... Previous write of size 8 at 0x7b24000053f8 by thread T1: #0 memset <null> (ep_perfsuite+0x4b3a42) #1 folly::HHWheelTimerBase<>::Callback::cancelTimeoutImpl() follytsan/folly/io/async/HHWheelTimer.cpp:86:15 (libep.so+0x5c6507) #2 folly::HHWheelTimerBase<>::Callback::cancelTimeout() follytsan/folly/io/async/HHWheelTimer.h:114:7 (libep.so+0x5c6507) #3 FollyExecutorPool::TaskProxy::wake() kv_engine/engines/ep/src/folly_executorpool.cc:239:9 (libep.so+0x35e3e9) #4 FollyExecutorPool::State::wakeTask(unsigned long) kv_engine/engines/ep/src/folly_executorpool.cc:352:29 (libep.so+0x3625cf) #5 FollyExecutorPool::wake(unsigned long)::$_9::operator()() const kv_engine/engines/ep/src/folly_executorpool.cc:731:32 (libep.so+0x3515d1) ... Issue is how FollyExecutorPool::doTaskQStat() is implemented. It takes a copy of the taskOwners map on the eventBase thread to attempt to avoid races; however this is a shallow copy - the actual TaskProxy objects which the map references via shared_ptr are not copied. As such, when the TaskProxy objects are read by the non-eventBase thread a race is seen. Solution is to instead of taking a (shallow) copy and then manipulating the copy, just move the entire work onto the eventBase thread. Change-Id: I19de6911944370e836c9905e99c860ee3d5ccb0b Reviewed-on: http://review.couchbase.org/c/kv_engine/+/137406 Reviewed-by: James Harrison <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Oct 16, 2020
When running memcached_testapp under TSan (macOS), an exception is thrown attempting to lock an invalid mutex: libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument With the following backtrace: (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 * frame #0: 0x00007fff6585e0f8 libc++abi.dylib` __cxa_throw frame #1: 0x00007fff6583855b libc++.1.dylib` std::__1::__throw_system_error(int, char const*) + 77 frame #2: 0x00007fff6582f54d libc++.1.dylib` std::__1::mutex::lock() + 29 frame #3: 0x00000001014139f1 memcached_testapp` std::__1::unique_lock<std::__1::mutex>::unique_lock(this=0x00007ffeefbff3e8, __m=<unavailable>) + 65 at __mutex_base:130 frame #4: 0x0000000101410811 memcached_testapp` std::__1::unique_lock<std::__1::mutex>::unique_lock(this=<unavailable>, __m=<unavailable>) + 33 at __mutex_base:130 frame #5: 0x000000010141053e memcached_testapp` folly::shared_mutex_detail::annotationGuard(ptr=<unavailable>) + 110 at SharedMutex.cpp:33 frame #6: 0x0000000101410445 memcached_testapp` folly::SharedMutexImpl<...>::annotateLazyCreate(this=0x0000000101984640) + 53 at SharedMutex.h:719 frame #7: 0x000000010140f6ca memcached_testapp` folly::SharedMutexImpl<...>::annotateDestroy(this=0x0000000101984640) + 26 at SharedMutex.h:732 frame #8: 0x000000010140f5af memcached_testapp` folly::SharedMutexImpl<...>::~SharedMutexImpl(this=0x0000000101984640) + 63 at SharedMutex.h:338 frame #9: 0x000000010140f71a memcached_testapp` folly::SharedMutexImpl<...>::~SharedMutexImpl(this=<unavailable>) + 26 at SharedMutex.h:312 frame #10: 0x0000000100d5549a memcached_testapp` folly::Synchronized<std::__1::unique_ptr<cb::audit::Audit, ...>, folly::SharedMutexImpl<...> >::~Synchronized(this=0x0000000101984638) + 58 at Synchronized.h:489 frame #11: 0x0000000100d511a9 memcached_testapp` folly::Synchronized<std::__1::unique_ptr<cb::audit::Audit, ...>, folly::SharedMutexImpl<...> >::~Synchronized(this=0x0000000101984638) + 41 at Synchronized.h:489 frame #12: 0x000000010ad4c721 libclang_rt.tsan_osx_dynamic.dylib` cxa_at_exit_wrapper(void*) + 33 frame #13: 0x00007fff685d813c libsystem_c.dylib` __cxa_finalize_ranges + 319 frame #14: 0x00007fff685d8412 libsystem_c.dylib` exit + 55 frame #15: 0x00007fff6852ecd0 libdyld.dylib` start + 8 The crash is caused when destructing the static variable `auditHandle`, the Synchronized dtor attempts to access another static variable (kAnnotationMutexes) which has already been destructed. The problem is that auditHandle is a static at file scope, so the order it it initiailised (and destructed) relative to other statics is undefined. Switch to using a C++11 "magic static" via a getAuditHandle() function, which defers construction until the type is used. The same problem exists for `connections`, fix in the same way. Change-Id: I58c925f2c71907ab22581bf462ce3c7d092cfefa Reviewed-on: http://review.couchbase.org/c/kv_engine/+/138051 Tested-by: Build Bot <[email protected]> Reviewed-by: Trond Norbye <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Dec 4, 2020
Merging http://review.couchbase.org/c/kv_engine/+/136495 into master uncovered santizer issues (mad-hatter CV runs an older Clang and did not identify these issues). This patch resolves one of these issues, before the above patch may be merged to master. WARNING: ThreadSanitizer: heap-use-after-free (pid=73551) Read of size 8 at 0x7b5800000310 by main thread: #0 operator() ../kv_engine/engines/ep/src/ephemeral_bucket.cc:301 (libep.so+0x00000034705f) #1 std::_Function_handler<void (long), EphemeralBucket::makeVBucket(...>::_M_invoke(std::_Any_data const&, long&&) #3 ~Checkpoint ../kv_engine/engines/ep/src/checkpoint.cc:228 (libep.so+0x00000019dc97) #10 ~CheckpointManager ../kv_engine/engines/ep/src/checkpoint_manager.h:73 (libep.so+0x00000042f782) ... #13 ~VBucket ../kv_engine/engines/ep/src/vbucket.cc:286 (libep.so+0x00000041689a) During VBucket destruction, the CheckpointManager member, and any remaining Checkpoints it holds are destroyed. This can trigger the memOverheadChangedCallback, as destroying Checkpoints reduces the memory overhead. This was unsafe, as the state member had already been destroyed as it is declared later in VBucket. Move the CheckpointManager to be declared after state, so it is destroyed first. Change-Id: I0a34d3a11cd053f18f3168d6fbbb9dc974bbd2fc Reviewed-on: http://review.couchbase.org/c/kv_engine/+/141492 Reviewed-by: Dave Rigby <[email protected]> Tested-by: Build Bot <[email protected]> Well-Formed: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Dec 4, 2020
Cherry-picks http://review.couchbase.org/c/kv_engine/+/141492 Note: This patch is cherry-picked as the issue it resolves prevents clean forward-merging of a change _earlier_ in mad-hatter history. Merging http://review.couchbase.org/c/kv_engine/+/136495 into master uncovered santizer issues (mad-hatter CV runs an older Clang and did not identify these issues). This patch resolves one of these issues, before the above patch may be merged to master. WARNING: ThreadSanitizer: heap-use-after-free (pid=73551) Read of size 8 at 0x7b5800000310 by main thread: #0 operator() ../kv_engine/engines/ep/src/ephemeral_bucket.cc:301 (libep.so+0x00000034705f) #1 std::_Function_handler<void (long), EphemeralBucket::makeVBucket(...>::_M_invoke(std::_Any_data const&, long&&) #3 ~Checkpoint ../kv_engine/engines/ep/src/checkpoint.cc:228 (libep.so+0x00000019dc97) #10 ~CheckpointManager ../kv_engine/engines/ep/src/checkpoint_manager.h:73 (libep.so+0x00000042f782) ... #13 ~VBucket ../kv_engine/engines/ep/src/vbucket.cc:286 (libep.so+0x00000041689a) During VBucket destruction, the CheckpointManager member, and any remaining Checkpoints it holds are destroyed. This can trigger the memOverheadChangedCallback, as destroying Checkpoints reduces the memory overhead. This was unsafe, as the state member had already been destroyed as it is declared later in VBucket. Move the CheckpointManager to be declared after state, so it is destroyed first. Change-Id: I0a34d3a11cd053f18f3168d6fbbb9dc974bbd2fc Reviewed-on: http://review.couchbase.org/c/kv_engine/+/141492 Reviewed-by: Dave Rigby <[email protected]> Tested-by: Build Bot <[email protected]> Well-Formed: Build Bot <[email protected]> Reviewed-on: http://review.couchbase.org/c/kv_engine/+/141498
ns-codereview
pushed a commit
that referenced
this pull request
Jan 19, 2021
The list write lock must be held when updating the pausedPurgePoint to avoid racing with updateListElem reading it. WARNING: ThreadSanitizer: data race (pid=17198) Read of size 8 at 0x7b4c00002d40 by thread T2 (mutexes: read M666668208119350224, write M838931151564315824, write M830768325700119856, write M825420198063385568): #0 boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false>::pointed_node() const <null> (libep.so+0x0000003c5dd5) #1 boost::intrusive::operator==(boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false> const&, boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false> const&) tlm/deps/boost.exploded/include/boost/intrusive/detail/list_iterator.hpp:119 (libep.so+0x0000003c230b) #2 BasicLinkedList::updateListElem(std::lock_guard<std::mutex>&, std::lock_guard<std::mutex>&, OrderedStoredValue&) ../kv_engine/engines/ep/src/linked_list.cc:82 (libep.so+0x0000003bf176) #3 EphemeralVBucket::modifySeqList(std::lock_guard<std::mutex>&, std::lock_guard<std::mutex>&, OrderedStoredValue&) ../kv_engine/engines/ep/src/ephemeral_vb.cc:932 (libep.so+0x000000327553) ... Previous write of size 8 at 0x7b4c00002d40 by thread T30: #0 boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false>::operator=(boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false> const&) tlm/deps/boost.exploded/include/boost/intrusive/detail/list_iterator.hpp:79 (libep.so+0x0000003c2436) #1 BasicLinkedList::purgeTombstones(long, std::function<bool (DocKey const&, long)>, std::function<bool ()>) ../kv_engine/engines/ep/src/linked_list.cc:360 (libep.so+0x0000003c03da) #2 EphemeralVBucket::purgeStaleItems(std::function<bool ()>) ../kv_engine/engines/ep/src/ephemeral_vb.cc:391 (libep.so+0x000000326f56) #3 EphemeralVBucket::StaleItemDeleter::visit(VBucket&) ../kv_engine/engines/ep/src/ephemeral_tombstone_purger.cc:211 (libep.so+0x000000324a5e) #4 KVBucket::pauseResumeVisit(PauseResumeVBVisitor&, KVBucketIface::Position&) ../kv_engine/engines/ep/src/kv_bucket.cc:2322 (libep.so+0x0000003831e8) #5 EphTombstoneStaleItemDeleter::run() ../kv_engine/engines/ep/src/ephemeral_tombstone_purger.cc:280 (libep.so+0x0000003211dc) Change-Id: I5fd6d4bfeef5831420f9f6a472ba7bc817753c23 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/143784 Tested-by: Build Bot <[email protected]> Reviewed-by: Dave Rigby <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jan 28, 2021
Backport of 56490d4 http://review.couchbase.org/c/kv_engine/+/143784 The list write lock must be held when updating the pausedPurgePoint to avoid racing with updateListElem reading it. WARNING: ThreadSanitizer: data race (pid=17198) Read of size 8 at 0x7b4c00002d40 by thread T2 (mutexes: read M666668208119350224, write M838931151564315824, write M830768325700119856, write M825420198063385568): #0 boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false>::pointed_node() const <null> (libep.so+0x0000003c5dd5) #1 boost::intrusive::operator==(boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false> const&, boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false> const&) tlm/deps/boost.exploded/include/boost/intrusive/detail/list_iterator.hpp:119 (libep.so+0x0000003c230b) #2 BasicLinkedList::updateListElem(std::lock_guard<std::mutex>&, std::lock_guard<std::mutex>&, OrderedStoredValue&) ../kv_engine/engines/ep/src/linked_list.cc:82 (libep.so+0x0000003bf176) #3 EphemeralVBucket::modifySeqList(std::lock_guard<std::mutex>&, std::lock_guard<std::mutex>&, OrderedStoredValue&) ../kv_engine/engines/ep/src/ephemeral_vb.cc:932 (libep.so+0x000000327553) ... Previous write of size 8 at 0x7b4c00002d40 by thread T30: #0 boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false>::operator=(boost::intrusive::list_iterator<boost::intrusive::mhtraits<OrderedStoredValue, boost::intrusive::list_member_hook<>, &OrderedStoredValue::seqno_hook>, false> const&) tlm/deps/boost.exploded/include/boost/intrusive/detail/list_iterator.hpp:79 (libep.so+0x0000003c2436) #1 BasicLinkedList::purgeTombstones(long, std::function<bool (DocKey const&, long)>, std::function<bool ()>) ../kv_engine/engines/ep/src/linked_list.cc:360 (libep.so+0x0000003c03da) #2 EphemeralVBucket::purgeStaleItems(std::function<bool ()>) ../kv_engine/engines/ep/src/ephemeral_vb.cc:391 (libep.so+0x000000326f56) #3 EphemeralVBucket::StaleItemDeleter::visit(VBucket&) ../kv_engine/engines/ep/src/ephemeral_tombstone_purger.cc:211 (libep.so+0x000000324a5e) #4 KVBucket::pauseResumeVisit(PauseResumeVBVisitor&, KVBucketIface::Position&) ../kv_engine/engines/ep/src/kv_bucket.cc:2322 (libep.so+0x0000003831e8) #5 EphTombstoneStaleItemDeleter::run() ../kv_engine/engines/ep/src/ephemeral_tombstone_purger.cc:280 (libep.so+0x0000003211dc) Change-Id: I5fd6d4bfeef5831420f9f6a472ba7bc817753c23 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/143877 Tested-by: Build Bot <[email protected]> Reviewed-by: Dave Rigby <[email protected]> Well-Formed: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Apr 29, 2021
When converting platform to be statically linked, a crash is seen during shutdown of ep-engine_ep_unit_tests.DcpConnMapTest tests on macOS: (lldb) bt * thread #22, name = 'NonIoPool2', stop reason = signal SIGABRT * frame #0: 0x00007fff693f733a libsystem_kernel.dylib` __pthread_kill + 10 frame #1: 0x00007fff694b3e60 libsystem_pthread.dylib` pthread_kill + 430 frame #2: 0x00007fff6937e808 libsystem_c.dylib` abort + 120 frame #3: 0x00007fff665dd458 libc++abi.dylib` abort_message + 231 frame #4: 0x00007fff665ce8a7 libc++abi.dylib` demangling_terminate_handler() + 238 frame #5: 0x00007fff681095b1 libobjc.A.dylib` _objc_terminate() + 104 frame #6: 0x00007fff665dc887 libc++abi.dylib` std::__terminate(void (*)()) + 8 frame #7: 0x00007fff665df1a2 libc++abi.dylib` __cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 27 frame #8: 0x00007fff665df169 libc++abi.dylib` __cxa_throw + 113 frame #9: 0x00007fff665b955b libc++.1.dylib` std::__1::__throw_system_error(int, char const*) + 77 frame #10: 0x00007fff665b054d libc++.1.dylib` std::__1::mutex::lock() + 29 frame #11: 0x000000010a78af00 libphosphor.dylib` phosphor::TraceLog::lock(this=0x000000010a797b30) + 16 at trace_log.h:250 frame #12: 0x000000010a78aedf libphosphor.dylib` std::__1::lock_guard<phosphor::TraceLog>::lock_guard(this=0x000070000b00be98, __m=0x000000010a797b30) + 15 at __mutex_base:91 frame #13: 0x000000010a787c49 libphosphor.dylib` std::__1::lock_guard<phosphor::TraceLog>::lock_guard(this=0x000070000b00be98, __m=0x000000010a797b30) + 9 at __mutex_base:91 frame #14: 0x000000010a788a2c libphosphor.dylib` phosphor::TraceLog::deregisterThread(this=0x000000010a797b30) + 28 at trace_log.cc:222 frame #15: 0x000000010023ec6d ep-engine_ep_unit_tests` CBRegisteredThreadFactory::newThread(this=0x000000010ddc3c00)>&&)::'lambda'()::operator()() + 93 at folly_executorpool.cc:49 The ExecutorPool is shutting down all its background threads, during which each thread calls phosphor::TraceLog::deregisterThread() to remove this thread from the set folly is tracking. However TraceLog is a singleton and it has already been destructed, so accessing it's mutex member variable results in an exception being thrown. This is due to a change in the static initialisation (and deinitialization) order between ExecutorPool and phosphor::TraceLog. Both are Mayer singletons, but phosphor::TraceLog is first accessed (and hence initialised) _after_ ExecutorPool - when the ExecutorPool threads first register their threads. As such, TraceLog will be destroyed before ExecutorPool (destruction is in reverse construction order). Solve by explicilty accessing (and hence initializing) TraceLog before ExecutorPool is created in ep_unit_tests_main.cc. (Note this problem doesn't occur in memcached as we explicilty initialise tracing before any buckets are created.) Change-Id: I1953129cce0d05a42f0790724c470e38b2dd0701 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/152326 Tested-by: Build Bot <[email protected]> Reviewed-by: Paolo Cocchi <[email protected]> Reviewed-by: Trond Norbye <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Apr 29, 2021
When converting platform to be statically linked, a crash is seen during shutdown of ep-engine_ep_unit_tests.DcpConnMapTest tests on MSVC. The ExecutorPool is consuming messages on the background threads (I believe to coordinate shutdown), and during that it attempts to log a warning message to Google Log. The cause of the crash is a change in the static initialisation (and deinitialization) order - the GoogleLog singleton instance as used internally by Folly is deinitialized before ExecutorPool singleton. As such, when the ExecutorPool singleton is shutting down, it attempts to log a message to a non-existant GLog instance and a nullptr is deferenced. Fix by changing ExecutorPool singleton to use C++11 magic static (Meyer singleton); which ensures it is destructed earlier, before GLog. Additionally, while the above is sufficient to fix this issue on macOS Catalina, on Mojave this introduces _another_ crash as some Folly hazard pointer singletons appear to already have been destructed and the following crash is seen: * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT * frame #0: 0x00007fff7412f2c6 libsystem_kernel.dylib`__pthread_kill + 10 frame #1: 0x00007fff741eabf1 libsystem_pthread.dylib`pthread_kill + 284 frame #2: 0x00007fff740996a6 libsystem_c.dylib`abort + 127 frame #3: 0x00007fff741a8077 libsystem_malloc.dylib`malloc_vreport + 545 frame #4: 0x00007fff741a7e38 libsystem_malloc.dylib`malloc_report + 151 frame #5: 0x00007fff73ff3cf9 libdyld.dylib`_tlv_atexit + 155 frame #6: 0x000000010143cb2d ep-engine_ep_unit_tests`folly::SingletonThreadLocal<folly::hazptr_tc<std::__1::atomic>, folly::hazptr_tc_tls_tag, folly::detail::DefaultMake<folly::hazptr_tc<std::__1::atomic> >, folly::hazptr_tc_tls_tag>::getSlow(cache=0x000000010b5606b8) at SingletonThreadLocal.h:157 [opt] frame #7: 0x0000000101437a19 ep-engine_ep_unit_tests`folly::UnboundedBlockingQueue<folly::CPUThreadPoolExecutor::CPUTask>::add(folly::CPUThreadPoolExecutor::CPUTask) [inlined] folly::SingletonThreadLocal<folly::hazptr_tc<std::__1::atomic>, folly::hazptr_tc_tls_tag, folly::detail::DefaultMake<folly::hazptr_tc<std::__1::atomic> >, folly::hazptr_tc_tls_tag>::get() at SingletonThreadLocal.h:167 [opt] frame #8: 0x0000000101437a08 ep-engine_ep_unit_tests`folly::UnboundedBlockingQueue<folly::CPUThreadPoolExecutor::CPUTask>::add(folly::CPUThreadPoolExecutor::CPUTask) [inlined] folly::hazptr_tc<std::__1::atomic>& folly::hazptr_tc_tls<std::__1::atomic>() at HazptrThrLocal.h:166 [opt] frame #9: 0x0000000101437a08 ep-engine_ep_unit_tests`folly::UnboundedBlockingQueue<folly::CPUThreadPoolExecutor::CPUTask>::add(folly::CPUThreadPoolExecutor::CPUTask) at HazptrHolder.h:64 [opt] frame #10: 0x0000000101437a08 ep-engine_ep_unit_tests`folly::UnboundedBlockingQueue<folly::CPUThreadPoolExecutor::CPUTask>::add(folly::CPUThreadPoolExecutor::CPUTask) [inlined] folly::hazptr_holder<std::__1::atomic>::hazptr_holder(this=<unavailable>, domain=<unavailable>) at HazptrHolder.h:61 [opt] frame #11: 0x0000000101437a08 ep-engine_ep_unit_tests`folly::UnboundedBlockingQueue<folly::CPUThreadPoolExecutor::CPUTask>::add(folly::CPUThreadPoolExecutor::CPUTask) at UnboundedQueue.h:374 [opt] frame #12: 0x00000001014379e7 ep-engine_ep_unit_tests`folly::UnboundedBlockingQueue<folly::CPUThreadPoolExecutor::CPUTask>::add(folly::CPUThreadPoolExecutor::CPUTask) [inlined] folly::UnboundedQueue<folly::CPUThreadPoolExecutor::CPUTask, false, false, false, 6ul, 7ul, std::__1::atomic>::enqueue(this=0x00007ffeefbff770, arg=0x00007ffeefbff690) at UnboundedQueue.h:271 [opt] frame #13: 0x00000001014379e7 ep-engine_ep_unit_tests`folly::UnboundedBlockingQueue<folly::CPUThreadPoolExecutor::CPUTask>::add(this=0x000000010ba00f80, item=CPUTask @ 0x00007ffeefbff690) at UnboundedBlockingQueue.h:31 [opt] frame #14: 0x0000000101437bfc ep-engine_ep_unit_tests`folly::BlockingQueue<folly::CPUThreadPoolExecutor::CPUTask>::addWithPriority(this=0x000000010ba00f80, item=CPUTask @ 0x00007ffeefbff770, (null)=<unavailable>) at BlockingQueue.h:57 [opt] frame #15: 0x0000000101436b00 ep-engine_ep_unit_tests`folly::CPUThreadPoolExecutor::stopThreads(this=0x000000010bf8de00, n=2) at CPUThreadPoolExecutor.cpp:281 [opt] frame #16: 0x000000010144bae3 ep-engine_ep_unit_tests`folly::ThreadPoolExecutor::stop() [inlined] folly::ThreadPoolExecutor::removeThreads(this=<unavailable>, n=<unavailable>) at ThreadPoolExecutor.cpp:233 [opt] frame #17: 0x000000010144bad0 ep-engine_ep_unit_tests`folly::ThreadPoolExecutor::stop(this=0x000000010bf8de00) at ThreadPoolExecutor.cpp:251 [opt] frame #18: 0x00000001014352d4 ep-engine_ep_unit_tests`folly::CPUThreadPoolExecutor::~CPUThreadPoolExecutor(this=0x000000010bf8de00, vtt=0x00000001019fd6c8) at CPUThreadPoolExecutor.cpp:126 [opt] frame #19: 0x0000000101435465 ep-engine_ep_unit_tests`folly::CPUThreadPoolExecutor::~CPUThreadPoolExecutor() [inlined] folly::CPUThreadPoolExecutor::~CPUThreadPoolExecutor(this=0x000000010bf8de00) at CPUThreadPoolExecutor.cpp:124 [opt] frame #20: 0x0000000101435459 ep-engine_ep_unit_tests`folly::CPUThreadPoolExecutor::~CPUThreadPoolExecutor(this=0x000000010bf8de00) at CPUThreadPoolExecutor.cpp:124 [opt] frame #21: 0x000000010023240a ep-engine_ep_unit_tests`FollyExecutorPool::~FollyExecutorPool(this=0x000000010b7eed40) at folly_executorpool.cc:757 [opt] frame #22: 0x00000001002325ee ep-engine_ep_unit_tests`FollyExecutorPool::~FollyExecutorPool(this=0x000000010b7eed40) at folly_executorpool.cc:751 [opt] frame #23: 0x00007fff7409a3cf libsystem_c.dylib`__cxa_finalize_ranges + 319 frame #24: 0x00007fff7409a6b3 libsystem_c.dylib`exit + 55 Address _this_ with a somewhat belt-and-braces approach - also manually shutdown the ExecutorPool in DcpConnMapTest::TearDown - as is done in other tests. Change-Id: I87f13bc3a7cdf616b52d18502dd724fcf630d3b9 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/152230 Tested-by: Build Bot <[email protected]> Reviewed-by: Richard de Mellow <[email protected]> Reviewed-by: Trond Norbye <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jul 1, 2021
Fix race of more than one thread trying to free a MockCookie. This is due to MockServerCookieApi::release() and destroy_mock_cookie() racing with each other. For instance, T1 could dec the ref count then pause. T2 gets run which decs the ref count, reads the ref count as 0 and frees the MockCookie. Then T1 gets run again, it tries to call getRefcount() which causes it to deref an invalid pointer as the MockCookie has been freed. To fix this make the read+write of the ref count atomic. RNING: ThreadSanitizer: data race on vptr (ctor/dtor vs virtual call) (pid=107468) Write of size 8 at 0x7b4800030a80 by thread T2 (mutexes: write M1121813589057925128): #0 cb::tracing::Traceable::~Traceable() ../kv_engine/include/memcached/tracer.h:132 (ep_testsuite_dcp+0x5aab16) #1 MockCookie::~MockCookie() ../kv_engine/programs/engine_testapp/mock_cookie.cc:22 (ep_testsuite_dcp+0x5a9ddd) #2 MockCookie::~MockCookie() ../kv_engine/programs/engine_testapp/mock_cookie.cc:18 (ep_testsuite_dcp+0x5a9e45) #3 MockServerCookieApi::release(CookieIface const&) ../kv_engine/programs/engine_testapp/mock_server.cc:235 (ep_testsuite_dcp+0x5b86c4) #4 EventuallyPersistentEngine::releaseCookie(CookieIface const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/engines/ep/src/ep_engine.cc:1793 (ep_testsuite_dcp+0x765c89) #5 ConnHandler::releaseReference() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/engines/ep/src/connhandler.cc:343 (ep_testsuite_dcp+0x86cb6d) #6 DcpConnMap::manageConnections() /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/engines/ep/src/dcp/dcpconnmap.cc:392 (ep_testsuite_dcp+0x7f415e) ... Previous read of size 8 at 0x7b4800030a80 by main thread: #0 destroy_mock_cookie(CookieIface*) ../kv_engine/programs/engine_testapp/mock_cookie.cc:48 (ep_testsuite_dcp+0x5a9fa6) #1 MockTestHarness::destroy_cookie(CookieIface*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/programs/engine_testapp/engine_testapp.cc:159 (ep_testsuite_dcp+0x490ae5) #2 test_dcp_producer_stream_req_backfill(EngineIface*) ../kv_engine/engines/ep/tests/ep_testsuite_dcp.cc:2306 (ep_testsuite_dcp+0x4d3e3f) ... Location is heap block of size 336 at 0x7b4800030a80 allocated by main thread: #0 operator new(unsigned long) <null> (libtsan.so.0+0x87c5c) #1 create_mock_cookie(EngineIface*) ../kv_engine/programs/engine_testapp/mock_cookie.cc:32 (ep_testsuite_dcp+0x5a9f0a) #2 MockTestHarness::create_cookie(EngineIface*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/programs/engine_testapp/engine_testapp.cc:155 (ep_testsuite_dcp+0x490aa5) #3 test_dcp_producer_stream_req_backfill(EngineIface*) ../kv_engine/engines/ep/tests/ep_testsuite_dcp.cc:2287 (ep_testsuite_dcp+0x4d3d31) #4 execute_test(test, char const*, char const*)::$_1::operator()() const /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/programs/engine_testapp/engine_testapp.cc:445 (ep_testsuite_dcp+0x48f988) #5 test_result std::__invoke_impl<test_result, execute_test(test, char const*, char const*)::$_1&>(std::__invoke_other, execute_test(test, char const*, char const*)::$_1&) /opt/gcc-10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/invoke.h:60 (ep_testsuite_dcp+0x48f90d) #6 std::enable_if<is_invocable_r_v<test_result, execute_test(test, char const*, char const*)::$_1&>, test_result>::type std::__invoke_r<test_result, execute_test(test, char const*, char const*)::$_1&>(execute_test(test, char const*, char const*)::$_1&) /opt/gcc-10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/invoke.h:113 (ep_testsuite_dcp+0x48f89d) #7 std::_Function_handler<test_result (), execute_test(test, char const*, char const*)::$_1>::_M_invoke(std::_Any_data const&) /opt/gcc-10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/std_function.h:291 (ep_testsuite_dcp+0x48f78d) #8 std::function<test_result ()>::operator()() const /opt/gcc-10.2.0/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/std_function.h:622 (ep_testsuite_dcp+0x4900d8) #9 try_run_test(std::function<test_result ()>) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/programs/engine_testapp/engine_testapp.cc:288 (ep_testsuite_dcp+0x48d368) #10 execute_test(test, char const*, char const*) /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/programs/engine_testapp/engine_testapp.cc:445 (ep_testsuite_dcp+0x48ea37) #11 main /home/couchbase/jenkins/workspace/kv_engine.threadsanitizer_master/kv_engine/programs/engine_testapp/engine_testapp.cc:698 (ep_testsuite_dcp+0x48dbfb) Mutex M1121813589057925128 is already destroyed. SUMMARY: ThreadSanitizer: data race on vptr (ctor/dtor vs virtual call) ../kv_engine/include/memcached/tracer.h:132 in cb::tracing::Traceable::~Traceable() Change-Id: I5cc6959ee9644c8c780b239cd63a6071c10c6c45 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/156420 Reviewed-by: Dave Rigby <[email protected]> Tested-by: Build Bot <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Oct 11, 2021
As seen on cluster_run built with TSan, the following race is seen aborting a timed-out SyncWrite: WARNING: ThreadSanitizer: data race (pid=46769) Write of size 8 at 0x7b5400291698 by thread T60 (mutexes: read M542537949550136824, write M360774, read M541974810616805208, write M1133353977207195360): #0 Cookie::setEngineStorage(void*) kv_engine/daemon/cookie.h:432 (memcached+0x66a6e1) #1 EventuallyPersistentEngine::storeEngineSpecific(CookieIface const*, void*) kv_engine/engines/ep/src/ep_engine.cc:1841 (memcached+0x7c4162) #2 operator() kv_engine/engines/ep/src/kv_bucket.cc:2756 (memcached+0xa65268) #3 __invoke_impl<void, KVBucket::makeSyncWriteCompleteCB()::<...> /usr/include/c++/10/bits/invoke.h:60 (memcached+0xa65268) #4 __invoke_r<void, KVBucket::makeSyncWriteCompleteCB()::<...> /usr/include/c++/10/bits/invoke.h:110 (memcached+0xa65268) #5 _M_invoke /usr/include/c++/10/bits/std_function.h:291 (memcached+0xa65268) #6 std::function<void (CookieIface const*, cb::engine_errc)>::operator()(...) const /usr/include/c++/10/bits/std_function.h:622 (memcached+0x9bf920) #7 VBucket::notifyClientOfSyncWriteComplete(CookieIface const*, cb::engine_errc) kv_engine/engines/ep/src/vbucket.cc:1041 (memcached+0x9bf920) ... Previous write of size 8 at 0x7b5400291698 by thread T18 (mutexes: write M3809): #0 Cookie::setEngineStorage(void*) /home/daver/repos/couchbase/server/kv_engine/daemon/cookie.h:432 (memcached+0x66a6e1) #1 EventuallyPersistentEngine::storeEngineSpecific(...) kv_engine/engines/ep/src/ep_engine.cc:1841 (memcached+0x7bb0c2) #2 EventuallyPersistentEngine::storeIfInner(...) kv_engine/engines/ep/src/ep_engine.cc:2539 (memcached+0x7de96a) #3 EventuallyPersistentEngine::store_if(...) kv_engine/engines/ep/src/ep_engine.cc:478 (memcached+0x7debed) #4 bucket_store_if(...) kv_engine/daemon/protocol/mcbp/engine_wrapper.cc:148 (memcached+0x7517bd) #5 MutationCommandContext::storeItem() kv_engine/daemon/protocol/mcbp/mutation_context.cc:288 (memcached+0x733f6d) #6 MutationCommandContext::step() kv_engine/daemon/protocol/mcbp/mutation_context.cc:54 (memcached+0x7385f7) ... When setEngineStorage is called from EventuallyPersistentEngine::store_if (when issueing a SyncWrite) from the frontend thread, the per-frontend thread mutex is held when modifying Cookie:engine_storage. However when the background thread later modifies Cookie:engine_storage the front-end thread mutex is not held. Address this by making Cookie::engine_storage atomic. We could have added a mutex around it, but that would add additional space requirements for each Cookie (of which there are potentially many), so atomic_ptr suffices for the time being. Change-Id: I62e25b6a74d47c2da6b500cb3dc20d7ad2b01e03 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/163278 Tested-by: Build Bot <[email protected]> Reviewed-by: Paolo Cocchi <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Oct 20, 2021
During bucket shutdown we intermittently see an exception thrown during task scheduling on a background NonIO thread, which crashes the memcached process. +Analysis+ Bug is as follows. Starting at the main thread which is deleting the Bucket (Thread 1): (gdb) bt ... #10 0x0000000000649bf3 in FollyExecutorPool::schedule(std::shared_ptr<GlobalTask>) () at /c++/10.2.0/new:175 #11 0x000000000084271b in EPVBucket::scheduleDeferredDeletion(EventuallyPersistentEngine&) () at /c++/10.2.0/ext/atomicity.h:100 #12 0x00000000006dfe7a in VBucket::DeferredDeleter::operator()(VBucket*) const () at kv_engine/engines/ep/src/vbucket.cc:3990 #13 0x000000000086f874 in std::_Sp_counted_deleter<EPVBucket*, VBucket::DeferredDeleter, ...>::_M_dispose () at /c++/10.2.0/bits/shared_ptr_base.h:453 ... #18 std::shared_ptr<VBucket>::~shared_ptr (this=0x7b44000515d0, __in_chrg=<optimized out>) at /c++/10.2.0/bits/shared_ptr.h:121 #19 PagingVisitor::~PagingVisitor (this=0x7b4400051540, __in_chrg=<optimized out>) at kv_engine/engines/ep/src/paging_visitor.h:39 ... #31 std::__shared_ptr<GlobalTask, (__gnu_cxx::_Lock_policy)2>::reset () at /c++/10.2.0/bits/shared_ptr_base.h:1301 #32 EventuallyPersistentEngine::waitForTasks(std::vector<std::shared_ptr<GlobalTask>, std::allocator<std::shared_ptr<GlobalTask> > >&) () at kv_engine/engines/ep/src/ep_engine.cc:6752 #33 0x000000000082396f in EventuallyPersistentEngine::destroyInner(bool) () at kv_engine/engines/ep/src/ep_engine.cc:2135 1. PagingVisitor is still in existence running after `EventuallyPersistentEngine::destroyInner` - see frame #19. This is because all tasks belonging to bucket were returned from unregisterTaskable() just before. 2. PagingVisitor (via VBCBAdaptor) is destroyed, it decrements the refcount on the shared_ptr<VBucket> it owns - see frame #18. 3. That is the last reference to the VBucket, which results in VBucket::DeferredDeleter being invoked which in turn schedules a task to delete the VBucket (disk and memory) in the background - see frame #11. We see the schedule's lambda happen on the SchedulerPool0 thread (T:35): Thread 35 "SchedulerPool0" hit Catchpoint 1 (exception thrown), __cxxabiv1::__cxa_throw (..., tinfo=0x10c4ec8 <typeinfo for std::out_of_range@@GLIBCXX_3.4>, ...) at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:80 (gdb) bt #1 0x00007ffff4cad7d2 in std::__throw_out_of_range (__s=__s@entry=0xcc68e6 "_Map_base::at") at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/src/c++11/functexcept.cc:82 ... #3 0x00000000005504ee in std::unordered_map<...>::at (__k=@0x7fffe83a8f88: 0x7b7400000848, this=0x7b1000005580) at /c++/10.2.0/bits/unordered_map.h:1000 #4 FollyExecutorPool::State::scheduleTask (this=..., executor=..., pool=..., task=...) at kv_engine/executor/folly_executorpool.cc:415 ... #8 folly::EventBase::runInEventBaseThreadAndWait(...) at folly/io/async/EventBase.cpp:671 ... In FollyExecutorPool::State::scheduleTask (frame #3) we attempt to lookup the Taskable (Bucket) in the ExecutorPool's map, however given its already been unregistered, the taskable is not found an the std::out_of_range exception is thrown. This is a lifetime issue. We have VBucket objects potentially being kept alive longer than their expected lifetime by virtue of background tasks having shared ownership of them - and those background tasks outlive the lifetime of their parent object (KVBucket), and crucially past when the owning Bucket is unregistered with the ExecutorPool and can no longer schedule tasks. When it then _does+ attempt to schedule a task against an unregistered (and deleted) Taskable; we see the crash. +Solution+ There's arguably two problems which should be addressed (although technically only one of the two is required to encounter this crash): 1. Background tasks owning VBuckets when they are not executing. 2. Background tasks outliving their associated Taskable (aka Bucket). This patch addresses the critical issue of (1) - we remove the (shared) ownership of VBucket from the background tasks which previoulsy had it - both PagingVisitor which is the problematic class in this scenario, but also in the other background Tasks which potentially have the same problem. The 2nd patch will tighten up the API for visiting VBuckets, so visitors are not passed a VBucketPtr, but instead VBucket& which reduces the chance of similar problems happening in future. The 3rd patch will adddress Background Taks outliving their Taskable. Change-Id: I340a3e4dc3d9234c4a34866b410fb8295a1c98d1 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/163783 Tested-by: Dave Rigby <[email protected]> Reviewed-by: Richard de Mellow <[email protected]> Reviewed-by: James H <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Nov 2, 2021
Previously, StatCheckpointTask and StatDCPTask immediately wrote responses when collecting stats while on a background thread. TSAN reported this as unsafe; no locks prevent potential racing with a frontend thread manipulating the cookie. Change both tasks to accumulate task values, but leave the frontend thread to actually write the responses when it resumes the ewouldblock'ed operation. TSAN Report: WARNING: ThreadSanitizer: data race (pid=24371) Read of size 8 at 0x7b54000a2df0 by thread T62: #0 Cookie::getHeader() const kv_engine/daemon/cookie.cc:201 (memcached+0x6508ac) #1 append_stats kv_engine/daemon/protocol/mcbp/stats_context.cc:95 (memcached+0x71fd6c) .... #19 void StatCollector::addStat<cb::stats::Key, unsigned long const&>(cb::stats::Key&&, unsigned long const&) const ../kv_engine/include/statistics/collector.h:336 (memcached+0x7e50e5) #20 EventuallyPersistentEngine::addAggregatedProducerStats(BucketStatCollector const&, ConnCounter const&) kv_engine/engines/ep/src/ep_engine.cc:4038 (memcached+0x7e50e5) #21 EventuallyPersistentEngine::doDcpStatsInner(CookieIface const*, std::function<void (std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, void const*)> const&, std::basic_string_view<char, std::char_traits<char> >) kv_engine/engines/ep/src/ep_engine.cc:4030 (memcached+0x81bd05) Previous write of size 8 at 0x7b54000a2df0 by thread T21 (mutexes: write M3843): #0 Cookie::setPacket(cb::mcbp::Header const&, bool) kv_engine/daemon/cookie.cc:186 (memcached+0x65080e) #1 Cookie::preserveRequest() kv_engine/daemon/cookie.h:225 (memcached+0x696aa7) #2 Connection::executeCommandPipeline() kv_engine/daemon/connection.cc:581 (memcached+0x696aa7) #3 Connection::executeCommandsCallback() kv_engine/daemon/connection.cc:793 (memcached+0x696be8) #4 Connection::rw_callback(bufferevent*, void*) kv_engine/daemon/connection.cc:942 (memcached+0x697851) #5 bufferevent_run_deferred_callbacks_unlocked /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/libevent/libevent-prefix/src/libevent/bufferevent.c:208 (libevent_core-2.1.so.7+0xf71d) #6 folly::EventBase::loopBody(int, bool) folly/io/async/EventBase.cpp:397 (memcached+0xfc9b52) #7 folly::EventBase::loop() folly/io/async/EventBase.cpp:315 (memcached+0xfcb06b) #8 folly::EventBase::loopForever() folly/io/async/EventBase.cpp:538 (memcached+0xfcb06b) #9 worker_libevent kv_engine/daemon/thread.cc:115 (memcached+0x6c16af) #10 CouchbaseThread::run() platform/src/cb_pthreads.cc:51 (memcached+0xf217d5) #11 platform_thread_wrap platform/src/cb_pthreads.cc:64 (memcached+0xf217d5) Change-Id: I3fbd8d51e174a7d19c5cb608a969795e445b8e86 Reviewed-on: http://review.couchbase.org/c/kv_engine/+/163709 Tested-by: Build Bot <[email protected]> Reviewed-by: Dave Rigby <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jul 18, 2022
As observed in tests in patch to fix MB-47267 ("MB-47267 / MB-52383: Make backfill during warmup a PauseResume task"), ObjectRegister getAllocSize can be read and written by different threads without synchronisation when EP engine instances are destroyed and re-created: WARNING: ThreadSanitizer: data race (pid=128791) Read of size 8 at 0x7f584d8d48c0 by thread T41 (mutexes: write M333120309177634496, write M279640201042175720): #0 ObjectRegistry::onCreateBlob(Blob const*) ../kv_engine/engines/ep/src/objectregistry.cc:85 (ep.so+0x0000002d60aa) #1 Blob::Blob(char const*, unsigned long) ../kv_engine/engines/ep/src/blob.cc:51 (ep.so+0x00000006ba08) #2 Blob::New(char const*, unsigned long) ../kv_engine/engines/ep/src/blob.cc:26 (ep.so+0x00000006ba56) #3 vbucket_transition_state::toItem(Item&) const ../kv_engine/engines/ep/src/vbucket_state.cc:31 (ep.so+0x0000002b1c39) #4 CheckpointManager::queueSetVBState(VBucket&) ../kv_engine/engines/ep/src/checkpoint_manager.cc:953 (ep.so+0x00000008030a) #5 Warmup::populateVBucketMap(unsigned short) ../kv_engine/engines/ep/src/warmup.cc:1508 (ep.so+0x0000002c55fd) #6 WarmupPopulateVBucketMap::run() ../kv_engine/engines/ep/src/warmup.cc:350 (ep.so+0x0000002d47dd) #7 ExecutorThread::run() ../kv_engine/engines/ep/src/executorthread.cc:190 (ep.so+0x0000001ec57b) #8 launch_executor_thread ../kv_engine/engines/ep/src/executorthread.cc:36 (ep.so+0x0000001ecb69) #9 CouchbaseThread::run() ../platform/src/cb_pthreads.cc:58 (libplatform_so.so.0.1.0+0x00000000a237) #10 platform_thread_wrap ../platform/src/cb_pthreads.cc:71 (libplatform_so.so.0.1.0+0x00000000a237) #11 <null> <null> (libtsan.so.0+0x00000002843b) Previous write of size 8 at 0x7f584d8d48c0 by main thread: #0 ObjectRegistry::initialize(unsigned long (*)(void const*)) ../kv_engine/engines/ep/src/objectregistry.cc:72 (ep.so+0x0000002d5ea7) #1 create_instance ../kv_engine/engines/ep/src/ep_engine.cc:1777 (ep.so+0x000000191c06) #2 create_engine_instance(engine_reference*, server_handle_v1_t* (*)(), EngineIface**) ../kv_engine/utilities/engine_loader.cc:95 (engine_testapp+0x0000004614b9) #3 MockTestHarness::create_bucket(bool, char const*) <null> (engine_testapp+0x00000041f295) #4 test_reader_thread_starvation_warmup ../kv_engine/engines/ep/tests/ep_testsuite.cc:8246 (ep_testsuite.so+0x000000071909) #5 execute_test ../kv_engine/programs/engine_testapp/engine_testapp.cc:1402 (engine_testapp+0x00000041ac82) #6 main ../kv_engine/programs/engine_testapp/engine_testapp.cc:1675 (engine_testapp+0x00000041be5c) Location is global 'getAllocSize' of size 8 at 0x7f584d8d48c0 (ep.so+0x0000007708c0) In practice this race is most likely benign, as ObjectRegistry::initialize() is always passed the same argument to store to getAllocSize. However to silence TSan, change to an atomic variable accessed with memory_order_acquire / memory_order_release. Note this is the default ordering on x86-64, so this doesn't actually add any additional cost. Change-Id: I65d566270ae5fa0602fe0a907e78c2b6ae227353 Reviewed-on: https://review.couchbase.org/c/kv_engine/+/177600 Tested-by: Build Bot <[email protected]> Well-Formed: Restriction Checker Reviewed-by: Paolo Cocchi <[email protected]> Reviewed-by: Ben Huddleston <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Nov 3, 2022
As part of EPBucket::prepareForPause(), all of the vb_mutexes are lock()ed - and left locked: a) To ensure that any in-flight VBucket writes have completed and b) To inhibit and new writes from occurring. Then, when the EPBucket is resumed all the vb_mutexes are unlock()ed which allows VBucket writes to resume. However, EPBucket::prepareForResume() is not called on the same thread which called prepareForPause() - prepareForPause() runs on a background NonIO thread whereas resume runs synchronously in the front-end thread. As such, we are incorrectly unlocking a mutex from a different thread than the one which locked it - which is Undefined Behaviour - from cppreference.com[1]: void unlock(); Unlocks the mutex. The mutex must be locked by the current thread of execution, otherwise, the behavior is undefined. This is helpfully reported by ThreadSanitizer: WARNING: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong thread) (pid=58528) #0 pthread_mutex_unlock <null> (libtsan.so.0+0x3bf9a) #1 __gthread_mutex_unlock(pthread_mutex_t*) c++/10.2.0/x86_64-pc-linux-gnu/bits/gthr-default.h:779 (memcached+0x5b594f) #2 std::mutex::unlock() c++/10.2.0/bits/std_mutex.h:118 (memcached+0x602555) #3 EPBucket::prepareForResume() kv_engine/engines/ep/src/ep_bucket.cc:2575 (memcached+0x84b94f) #4 EventuallyPersistentEngine::resume() kv_engine/engines/ep/src/ep_engine.cc:7002 (memcached+0x7d52d3) ... Fix by changing how we achieve inhibition of future VBucket writes: - Introduce a EPBucket::paused flag which is set in prepareForPause after all vb_mutexes have been acquired (and hence all in-flight VBucket writes have finished), but then unlock all vb_mutexes before returning from prepareForPause(). - When attempting to acquire a locked VBucket, check new paused flag before attempting to acquire the vb_mutex - if paused is set then block / return early (for try() variant). This keeps the required pause behaviour but avoids keeping vb_mutexes locked and having to later unlock (on a different thread). [1]: https://en.cppreference.com/w/cpp/thread/mutex Change-Id: I062583951a101a866866b79dfd6329672bb4ff42 Reviewed-on: https://review.couchbase.org/c/kv_engine/+/182099 Tested-by: Build Bot <[email protected]> Reviewed-by: Jim Walker <[email protected]>
jimwwalker
added a commit
to jimwwalker/kv_engine
that referenced
this pull request
Jan 18, 2023
Replace the todo markers with code that now utilises the magma history API - this now means scanAllVersions for example is hooked into the magma history scanning API. Add new tests that validate multiple versions can be stored and returned. Also required are changes to unit tests to respect new expectation checks that occur in magma - primarily that flushing writes ordered batches - this is only a problem for tests which bypass the flusher and call KVStore directly. **** ISSUES **** ep-engine_ep_unit_tests does not pass: 1) Exception from magma MagmaKVStoreRollbackTest.Rollback hits the following exception GSL: Precondition failure: 'levelSize >= compactionState[level].history.Size' at /Users/jimwalker/Code/couchbase/neo/magma/lsm/lsm_tree.cc:895 2) Seg-fault in magma Seen in a number of tests, 1 example: CollectionsDcpEphemeralOrPersistent/CollectionsDcpParameterizedTest.DefaultCollectionDropped/persistent_magma_value_only Process 78731 stopped * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] 72 } 73 74 Slice DocSequenceBuffer::GetKey() { -> 75 seqFmt.Set(sortedList[offset]->seqno); 76 return seqFmt.Encode(); 77 } 78 * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) * frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] frame couchbase#1: 0x0000000101361e2e ep-engine_ep_unit_tests`magma::mvccIteratorAdaptor::GetKey(this=0x0000000118536c00) at mvcc.h:249:25 [opt] frame couchbase#2: 0x000000010132b688 ep-engine_ep_unit_tests`magma::IteratorWithFilter::filterKeys(this=0x0000000118128350) at iterator.cc:214:32 [opt] frame couchbase#3: 0x000000010132de5b ep-engine_ep_unit_tests`magma::KVReader::ReadKVs(this=0x00007ff7bfefd550) at common.cc:70:19 [opt] frame couchbase#4: 0x0000000101378f63 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, w=0x00007ff7bfefd890, itr=0x0000000118128350, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefd860)>) at lsm_tree.cc:719:15 [opt] frame couchbase#5: 0x0000000101376ee8 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, appendMode=<unavailable>, itr=0x0000000118128350, sizeEstimate=<unavailable>, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefdb60)>) at lsm_tree.cc:682:17 [opt] frame couchbase#6: 0x00000001013761b2 ep-engine_ep_unit_tests`magma::LSMTree::writeMemtable(this=0x000000011855a820, memtable=0x000000011854c7a0) at lsm_tree.cc:449:21 [opt] frame #7: 0x000000010137753f ep-engine_ep_unit_tests`magma::LSMTree::doMemtableFlushWork(this=0x000000011855a820) at lsm_tree.cc:531:18 [opt] frame #8: 0x000000010139fe62 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] magma::LSMTree::newFlush(this=<unavailable>)::$_16::operator()() const at lsm_tree.cc:993:34 [opt] frame #9: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_16&>(fp)()) std::__1::__invoke<magma::LSMTree::newFlush()::$_16&>(magma::LSMTree::newFlush()::$_16&) at type_traits:3918:1 [opt] frame #10: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::tuple<magma::Status, magma::CheckpointTransaction> std::__1::__invoke_void_return_wrapper<std::__1::tuple<magma::Status, magma::CheckpointTransaction>, false>::__call<magma::LSMTree::newFlush(__args=<unavailable>)::$_16&>(magma::LSMTree::newFlush()::$_16&) at invoke.h:30:16 [opt] frame #11: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #12: 0x000000010139fe59 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #13: 0x00000001012f72af ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::__function::__value_func<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #14: 0x00000001012f7296 ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::function<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=0x0000000118131560)() const at function.h:1182:12 [opt] frame #15: 0x00000001012f7292 ep-engine_ep_unit_tests`magma::FlushWork::Execute(this=0x0000000118131560) at flush_work.cc:61:29 [opt] frame #16: 0x0000000101389d5e ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x00007ff7bfefe1c0)::$_38::operator()() at kvstore.cc:515:27 [opt] frame #17: 0x0000000101388fac ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x000000010442a420, wal=<unavailable>, offset=(SegID = 1, SegOffset = 4096), flushMode=<unavailable>, blockMode=Blocking) at kvstore.cc:582:16 [opt] frame #18: 0x0000000101389a5a ep-engine_ep_unit_tests`magma::KVStore::FlushMemTables(this=<unavailable>, wal=<unavailable>, flushMode=<unavailable>, blockMode=<unavailable>) at kvstore.cc:387:12 [opt] frame #19: 0x00000001012fd9ba ep-engine_ep_unit_tests`magma::Magma::Impl::syncKVStore(this=0x000000011814f000, kvID=<unavailable>, checkpoint=true) at db.cc:1352:21 [opt] frame #20: 0x000000010132678a ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=0x00007ff7bfefe400)>)::$_7::operator()() const at db.cc:880:23 [opt] frame #21: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=<unavailable>)>)::$_8::operator()() const at db.cc:891:21 [opt] frame #22: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)>)::$_8&>(fp)()) std::__1::__invoke<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at type_traits:3918:1 [opt] frame #23: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call<magma::Magma::Impl::CompactKVStore(__args=<unavailable>)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at invoke.h:61:9 [opt] frame #24: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #25: 0x0000000101326764 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #26: 0x0000000101303138 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::__function::__value_func<void ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #27: 0x000000010130312d ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::function<void ()>::operator(this=0x00007ff7bfefe4b0)() const at function.h:1182:12 [opt] frame #28: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:92:9 [opt] frame #29: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:91:14 [opt] frame #30: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(this=<unavailable>, kvID=<unavailable>, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefe550)>) at db.cc:895:1 [opt] frame #31: 0x000000010130336c ep-engine_ep_unit_tests`magma::Magma::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=<unavailable>, makeCallback=<unavailable>)>) at db.cc:901:18 [opt] frame #32: 0x000000010004fd3d ep-engine_ep_unit_tests`MagmaMemoryTrackingProxy::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefea00)>) at magma-memory-tracking-proxy.cc:190:19 [opt] frame #33: 0x00000001000a9eeb ep-engine_ep_unit_tests`MagmaKVStore::compactDBInternal(this=<unavailable>, vbLock=0x00007ff7bfefeda0, ctx=std::__1::shared_ptr<CompactionContext>::element_type @ 0x00000001184acc20 strong=3 weak=1) at magma-kvstore.cc:2590:29 [opt] frame #34: 0x00000001000a93ad ep-engine_ep_unit_tests`MagmaKVStore::compactDB(this=0x00000001067e6500, vbLock=0x00007ff7bfefeda0, ctx=nullptr) at magma-kvstore.cc:2445:12 [opt] frame #35: 0x00000001001d7eb0 ep-engine_ep_unit_tests`EPBucket::compactInternal(this=0x00000001067e6000, vb=0x00007ff7bfefed90, config=<unavailable>) at ep_bucket.cc:1398:25 [opt] frame #36: 0x00000001001d83f6 ep-engine_ep_unit_tests`EPBucket::doCompact(this=0x00000001067e6000, vbid=(vbid = 0), config=0x00007ff7bfefedf0, cookies=size=0) at ep_bucket.cc:1476:14 [opt] 3) Key sorting issue Magma now checks for sorted keys - it turns out KV flushing is violating that ordering. Need to know if KV should fix or is the magma check required?? Example: CollectionsDcpEphemeralOrPersistent/CollectionsLegacyDcpTest.default_collection_is_not_vbucket_highseqno_with_pending/persistent_nexus_couchstore_magma_value_only CRITICAL [(SynchronousEPEngine:default) magma_0]Fatal error: Found: preceding key(d2) > current key( _collection). If history is enabled, all keys in the batch must be sorted lexicographicall The problem is that the test flushes a prepare(default collection, key=d2) and create-collection(fruit) together. The flusher orders these... \0d2 \1create_fruit This is correct. But \0d2 is marked as a prepare, when flushed to disk it goes into a special namespace. This occurs in KVStore after the sorting. \0d2 becomes \2\0d2 And magma actually sees \2\0d2 \1create_fruit and we have violated the expects Change-Id: Ica9ea1b52c51f125c9e8839a0fca412834fc25f7
jimwwalker
added a commit
to jimwwalker/kv_engine
that referenced
this pull request
Jan 18, 2023
Replace the todo markers with code that now utilises the magma history API - this now means scanAllVersions for example is hooked into the magma history scanning API. Add new tests that validate multiple versions can be stored and returned. Also required are changes to unit tests to respect new expectation checks that occur in magma - primarily that flushing writes ordered batches - this is only a problem for tests which bypass the flusher and call KVStore directly. **** ISSUES **** ep-engine_ep_unit_tests does not pass: 1) Exception from magma MagmaKVStoreRollbackTest.Rollback hits the following exception GSL: Precondition failure: 'levelSize >= compactionState[level].history.Size' at /Users/jimwalker/Code/couchbase/neo/magma/lsm/lsm_tree.cc:895 2) Seg-fault in magma Seen in a number of tests, 1 example: CollectionsDcpEphemeralOrPersistent/CollectionsDcpParameterizedTest.DefaultCollectionDropped/persistent_magma_value_only Process 78731 stopped * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] 72 } 73 74 Slice DocSequenceBuffer::GetKey() { -> 75 seqFmt.Set(sortedList[offset]->seqno); 76 return seqFmt.Encode(); 77 } 78 * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) * frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] frame couchbase#1: 0x0000000101361e2e ep-engine_ep_unit_tests`magma::mvccIteratorAdaptor::GetKey(this=0x0000000118536c00) at mvcc.h:249:25 [opt] frame couchbase#2: 0x000000010132b688 ep-engine_ep_unit_tests`magma::IteratorWithFilter::filterKeys(this=0x0000000118128350) at iterator.cc:214:32 [opt] frame couchbase#3: 0x000000010132de5b ep-engine_ep_unit_tests`magma::KVReader::ReadKVs(this=0x00007ff7bfefd550) at common.cc:70:19 [opt] frame couchbase#4: 0x0000000101378f63 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, w=0x00007ff7bfefd890, itr=0x0000000118128350, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefd860)>) at lsm_tree.cc:719:15 [opt] frame couchbase#5: 0x0000000101376ee8 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, appendMode=<unavailable>, itr=0x0000000118128350, sizeEstimate=<unavailable>, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefdb60)>) at lsm_tree.cc:682:17 [opt] frame couchbase#6: 0x00000001013761b2 ep-engine_ep_unit_tests`magma::LSMTree::writeMemtable(this=0x000000011855a820, memtable=0x000000011854c7a0) at lsm_tree.cc:449:21 [opt] frame #7: 0x000000010137753f ep-engine_ep_unit_tests`magma::LSMTree::doMemtableFlushWork(this=0x000000011855a820) at lsm_tree.cc:531:18 [opt] frame #8: 0x000000010139fe62 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] magma::LSMTree::newFlush(this=<unavailable>)::$_16::operator()() const at lsm_tree.cc:993:34 [opt] frame #9: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_16&>(fp)()) std::__1::__invoke<magma::LSMTree::newFlush()::$_16&>(magma::LSMTree::newFlush()::$_16&) at type_traits:3918:1 [opt] frame #10: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::tuple<magma::Status, magma::CheckpointTransaction> std::__1::__invoke_void_return_wrapper<std::__1::tuple<magma::Status, magma::CheckpointTransaction>, false>::__call<magma::LSMTree::newFlush(__args=<unavailable>)::$_16&>(magma::LSMTree::newFlush()::$_16&) at invoke.h:30:16 [opt] frame #11: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #12: 0x000000010139fe59 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #13: 0x00000001012f72af ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::__function::__value_func<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #14: 0x00000001012f7296 ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::function<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=0x0000000118131560)() const at function.h:1182:12 [opt] frame #15: 0x00000001012f7292 ep-engine_ep_unit_tests`magma::FlushWork::Execute(this=0x0000000118131560) at flush_work.cc:61:29 [opt] frame #16: 0x0000000101389d5e ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x00007ff7bfefe1c0)::$_38::operator()() at kvstore.cc:515:27 [opt] frame #17: 0x0000000101388fac ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x000000010442a420, wal=<unavailable>, offset=(SegID = 1, SegOffset = 4096), flushMode=<unavailable>, blockMode=Blocking) at kvstore.cc:582:16 [opt] frame #18: 0x0000000101389a5a ep-engine_ep_unit_tests`magma::KVStore::FlushMemTables(this=<unavailable>, wal=<unavailable>, flushMode=<unavailable>, blockMode=<unavailable>) at kvstore.cc:387:12 [opt] frame #19: 0x00000001012fd9ba ep-engine_ep_unit_tests`magma::Magma::Impl::syncKVStore(this=0x000000011814f000, kvID=<unavailable>, checkpoint=true) at db.cc:1352:21 [opt] frame #20: 0x000000010132678a ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=0x00007ff7bfefe400)>)::$_7::operator()() const at db.cc:880:23 [opt] frame #21: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=<unavailable>)>)::$_8::operator()() const at db.cc:891:21 [opt] frame #22: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)>)::$_8&>(fp)()) std::__1::__invoke<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at type_traits:3918:1 [opt] frame #23: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call<magma::Magma::Impl::CompactKVStore(__args=<unavailable>)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at invoke.h:61:9 [opt] frame #24: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #25: 0x0000000101326764 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #26: 0x0000000101303138 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::__function::__value_func<void ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #27: 0x000000010130312d ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::function<void ()>::operator(this=0x00007ff7bfefe4b0)() const at function.h:1182:12 [opt] frame #28: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:92:9 [opt] frame #29: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:91:14 [opt] frame #30: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(this=<unavailable>, kvID=<unavailable>, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefe550)>) at db.cc:895:1 [opt] frame #31: 0x000000010130336c ep-engine_ep_unit_tests`magma::Magma::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=<unavailable>, makeCallback=<unavailable>)>) at db.cc:901:18 [opt] frame #32: 0x000000010004fd3d ep-engine_ep_unit_tests`MagmaMemoryTrackingProxy::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefea00)>) at magma-memory-tracking-proxy.cc:190:19 [opt] frame #33: 0x00000001000a9eeb ep-engine_ep_unit_tests`MagmaKVStore::compactDBInternal(this=<unavailable>, vbLock=0x00007ff7bfefeda0, ctx=std::__1::shared_ptr<CompactionContext>::element_type @ 0x00000001184acc20 strong=3 weak=1) at magma-kvstore.cc:2590:29 [opt] frame #34: 0x00000001000a93ad ep-engine_ep_unit_tests`MagmaKVStore::compactDB(this=0x00000001067e6500, vbLock=0x00007ff7bfefeda0, ctx=nullptr) at magma-kvstore.cc:2445:12 [opt] frame #35: 0x00000001001d7eb0 ep-engine_ep_unit_tests`EPBucket::compactInternal(this=0x00000001067e6000, vb=0x00007ff7bfefed90, config=<unavailable>) at ep_bucket.cc:1398:25 [opt] frame #36: 0x00000001001d83f6 ep-engine_ep_unit_tests`EPBucket::doCompact(this=0x00000001067e6000, vbid=(vbid = 0), config=0x00007ff7bfefedf0, cookies=size=0) at ep_bucket.cc:1476:14 [opt] 3) Key sorting issue Magma now checks for sorted keys - it turns out KV flushing is violating that ordering. Need to know if KV should fix or is the magma check required?? Example: CollectionsDcpEphemeralOrPersistent/CollectionsLegacyDcpTest.default_collection_is_not_vbucket_highseqno_with_pending/persistent_nexus_couchstore_magma_value_only CRITICAL [(SynchronousEPEngine:default) magma_0]Fatal error: Found: preceding key(d2) > current key( _collection). If history is enabled, all keys in the batch must be sorted lexicographicall The problem is that the test flushes a prepare(default collection, key=d2) and create-collection(fruit) together. The flusher orders these... \0d2 \1create_fruit This is correct. But \0d2 is marked as a prepare, when flushed to disk it goes into a special namespace. This occurs in KVStore after the sorting. \0d2 becomes \2\0d2 And magma actually sees \2\0d2 \1create_fruit and we have violated the expects Change-Id: Ica9ea1b52c51f125c9e8839a0fca412834fc25f7
jimwwalker
added a commit
to jimwwalker/kv_engine
that referenced
this pull request
Jan 19, 2023
Replace the todo markers with code that now utilises the magma history API - this now means scanAllVersions for example is hooked into the magma history scanning API. Add new tests that validate multiple versions can be stored and returned. Also required are changes to unit tests to respect new expectation checks that occur in magma - primarily that flushing writes ordered batches - this is only a problem for tests which bypass the flusher and call KVStore directly. **** ISSUES **** ep-engine_ep_unit_tests does not pass: 1) Exception from magma MagmaKVStoreRollbackTest.Rollback hits the following exception GSL: Precondition failure: 'levelSize >= compactionState[level].history.Size' at /Users/jimwalker/Code/couchbase/neo/magma/lsm/lsm_tree.cc:895 2) Seg-fault in magma Seen in a number of tests, 1 example: CollectionsDcpEphemeralOrPersistent/CollectionsDcpParameterizedTest.DefaultCollectionDropped/persistent_magma_value_only Process 78731 stopped * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] 72 } 73 74 Slice DocSequenceBuffer::GetKey() { -> 75 seqFmt.Set(sortedList[offset]->seqno); 76 return seqFmt.Encode(); 77 } 78 * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) * frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] frame couchbase#1: 0x0000000101361e2e ep-engine_ep_unit_tests`magma::mvccIteratorAdaptor::GetKey(this=0x0000000118536c00) at mvcc.h:249:25 [opt] frame couchbase#2: 0x000000010132b688 ep-engine_ep_unit_tests`magma::IteratorWithFilter::filterKeys(this=0x0000000118128350) at iterator.cc:214:32 [opt] frame couchbase#3: 0x000000010132de5b ep-engine_ep_unit_tests`magma::KVReader::ReadKVs(this=0x00007ff7bfefd550) at common.cc:70:19 [opt] frame couchbase#4: 0x0000000101378f63 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, w=0x00007ff7bfefd890, itr=0x0000000118128350, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefd860)>) at lsm_tree.cc:719:15 [opt] frame couchbase#5: 0x0000000101376ee8 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, appendMode=<unavailable>, itr=0x0000000118128350, sizeEstimate=<unavailable>, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefdb60)>) at lsm_tree.cc:682:17 [opt] frame couchbase#6: 0x00000001013761b2 ep-engine_ep_unit_tests`magma::LSMTree::writeMemtable(this=0x000000011855a820, memtable=0x000000011854c7a0) at lsm_tree.cc:449:21 [opt] frame #7: 0x000000010137753f ep-engine_ep_unit_tests`magma::LSMTree::doMemtableFlushWork(this=0x000000011855a820) at lsm_tree.cc:531:18 [opt] frame #8: 0x000000010139fe62 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] magma::LSMTree::newFlush(this=<unavailable>)::$_16::operator()() const at lsm_tree.cc:993:34 [opt] frame #9: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_16&>(fp)()) std::__1::__invoke<magma::LSMTree::newFlush()::$_16&>(magma::LSMTree::newFlush()::$_16&) at type_traits:3918:1 [opt] frame #10: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::tuple<magma::Status, magma::CheckpointTransaction> std::__1::__invoke_void_return_wrapper<std::__1::tuple<magma::Status, magma::CheckpointTransaction>, false>::__call<magma::LSMTree::newFlush(__args=<unavailable>)::$_16&>(magma::LSMTree::newFlush()::$_16&) at invoke.h:30:16 [opt] frame #11: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #12: 0x000000010139fe59 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #13: 0x00000001012f72af ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::__function::__value_func<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #14: 0x00000001012f7296 ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::function<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=0x0000000118131560)() const at function.h:1182:12 [opt] frame #15: 0x00000001012f7292 ep-engine_ep_unit_tests`magma::FlushWork::Execute(this=0x0000000118131560) at flush_work.cc:61:29 [opt] frame #16: 0x0000000101389d5e ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x00007ff7bfefe1c0)::$_38::operator()() at kvstore.cc:515:27 [opt] frame #17: 0x0000000101388fac ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x000000010442a420, wal=<unavailable>, offset=(SegID = 1, SegOffset = 4096), flushMode=<unavailable>, blockMode=Blocking) at kvstore.cc:582:16 [opt] frame #18: 0x0000000101389a5a ep-engine_ep_unit_tests`magma::KVStore::FlushMemTables(this=<unavailable>, wal=<unavailable>, flushMode=<unavailable>, blockMode=<unavailable>) at kvstore.cc:387:12 [opt] frame #19: 0x00000001012fd9ba ep-engine_ep_unit_tests`magma::Magma::Impl::syncKVStore(this=0x000000011814f000, kvID=<unavailable>, checkpoint=true) at db.cc:1352:21 [opt] frame #20: 0x000000010132678a ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=0x00007ff7bfefe400)>)::$_7::operator()() const at db.cc:880:23 [opt] frame #21: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=<unavailable>)>)::$_8::operator()() const at db.cc:891:21 [opt] frame #22: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)>)::$_8&>(fp)()) std::__1::__invoke<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at type_traits:3918:1 [opt] frame #23: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call<magma::Magma::Impl::CompactKVStore(__args=<unavailable>)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at invoke.h:61:9 [opt] frame #24: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #25: 0x0000000101326764 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #26: 0x0000000101303138 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::__function::__value_func<void ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #27: 0x000000010130312d ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::function<void ()>::operator(this=0x00007ff7bfefe4b0)() const at function.h:1182:12 [opt] frame #28: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:92:9 [opt] frame #29: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:91:14 [opt] frame #30: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(this=<unavailable>, kvID=<unavailable>, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefe550)>) at db.cc:895:1 [opt] frame #31: 0x000000010130336c ep-engine_ep_unit_tests`magma::Magma::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=<unavailable>, makeCallback=<unavailable>)>) at db.cc:901:18 [opt] frame #32: 0x000000010004fd3d ep-engine_ep_unit_tests`MagmaMemoryTrackingProxy::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefea00)>) at magma-memory-tracking-proxy.cc:190:19 [opt] frame #33: 0x00000001000a9eeb ep-engine_ep_unit_tests`MagmaKVStore::compactDBInternal(this=<unavailable>, vbLock=0x00007ff7bfefeda0, ctx=std::__1::shared_ptr<CompactionContext>::element_type @ 0x00000001184acc20 strong=3 weak=1) at magma-kvstore.cc:2590:29 [opt] frame #34: 0x00000001000a93ad ep-engine_ep_unit_tests`MagmaKVStore::compactDB(this=0x00000001067e6500, vbLock=0x00007ff7bfefeda0, ctx=nullptr) at magma-kvstore.cc:2445:12 [opt] frame #35: 0x00000001001d7eb0 ep-engine_ep_unit_tests`EPBucket::compactInternal(this=0x00000001067e6000, vb=0x00007ff7bfefed90, config=<unavailable>) at ep_bucket.cc:1398:25 [opt] frame #36: 0x00000001001d83f6 ep-engine_ep_unit_tests`EPBucket::doCompact(this=0x00000001067e6000, vbid=(vbid = 0), config=0x00007ff7bfefedf0, cookies=size=0) at ep_bucket.cc:1476:14 [opt] 3) Key sorting issue Magma now checks for sorted keys - it turns out KV flushing is violating that ordering. Need to know if KV should fix or is the magma check required?? Example: CollectionsDcpEphemeralOrPersistent/CollectionsLegacyDcpTest.default_collection_is_not_vbucket_highseqno_with_pending/persistent_nexus_couchstore_magma_value_only CRITICAL [(SynchronousEPEngine:default) magma_0]Fatal error: Found: preceding key(d2) > current key( _collection). If history is enabled, all keys in the batch must be sorted lexicographicall The problem is that the test flushes a prepare(default collection, key=d2) and create-collection(fruit) together. The flusher orders these... \0d2 \1create_fruit This is correct. But \0d2 is marked as a prepare, when flushed to disk it goes into a special namespace. This occurs in KVStore after the sorting. \0d2 becomes \2\0d2 And magma actually sees \2\0d2 \1create_fruit and we have violated the expects Change-Id: Ica9ea1b52c51f125c9e8839a0fca412834fc25f7
jimwwalker
added a commit
to jimwwalker/kv_engine
that referenced
this pull request
Jan 19, 2023
Replace the todo markers with code that now utilises the magma history API - this now means scanAllVersions for example is hooked into the magma history scanning API. Add new tests that validate multiple versions can be stored and returned. Also required are changes to unit tests to respect new expectation checks that occur in magma - primarily that flushing writes ordered batches - this is only a problem for tests which bypass the flusher and call KVStore directly. **** ISSUES **** ep-engine_ep_unit_tests does not pass: 1) Exception from magma MagmaKVStoreRollbackTest.Rollback hits the following exception GSL: Precondition failure: 'levelSize >= compactionState[level].history.Size' at /Users/jimwalker/Code/couchbase/neo/magma/lsm/lsm_tree.cc:895 2) Seg-fault in magma Seen in a number of tests, 1 example: CollectionsDcpEphemeralOrPersistent/CollectionsDcpParameterizedTest.DefaultCollectionDropped/persistent_magma_value_only Process 78731 stopped * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] 72 } 73 74 Slice DocSequenceBuffer::GetKey() { -> 75 seqFmt.Set(sortedList[offset]->seqno); 76 return seqFmt.Encode(); 77 } 78 * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) * frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] frame couchbase#1: 0x0000000101361e2e ep-engine_ep_unit_tests`magma::mvccIteratorAdaptor::GetKey(this=0x0000000118536c00) at mvcc.h:249:25 [opt] frame couchbase#2: 0x000000010132b688 ep-engine_ep_unit_tests`magma::IteratorWithFilter::filterKeys(this=0x0000000118128350) at iterator.cc:214:32 [opt] frame couchbase#3: 0x000000010132de5b ep-engine_ep_unit_tests`magma::KVReader::ReadKVs(this=0x00007ff7bfefd550) at common.cc:70:19 [opt] frame couchbase#4: 0x0000000101378f63 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, w=0x00007ff7bfefd890, itr=0x0000000118128350, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefd860)>) at lsm_tree.cc:719:15 [opt] frame couchbase#5: 0x0000000101376ee8 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, appendMode=<unavailable>, itr=0x0000000118128350, sizeEstimate=<unavailable>, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefdb60)>) at lsm_tree.cc:682:17 [opt] frame couchbase#6: 0x00000001013761b2 ep-engine_ep_unit_tests`magma::LSMTree::writeMemtable(this=0x000000011855a820, memtable=0x000000011854c7a0) at lsm_tree.cc:449:21 [opt] frame #7: 0x000000010137753f ep-engine_ep_unit_tests`magma::LSMTree::doMemtableFlushWork(this=0x000000011855a820) at lsm_tree.cc:531:18 [opt] frame #8: 0x000000010139fe62 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] magma::LSMTree::newFlush(this=<unavailable>)::$_16::operator()() const at lsm_tree.cc:993:34 [opt] frame #9: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_16&>(fp)()) std::__1::__invoke<magma::LSMTree::newFlush()::$_16&>(magma::LSMTree::newFlush()::$_16&) at type_traits:3918:1 [opt] frame #10: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::tuple<magma::Status, magma::CheckpointTransaction> std::__1::__invoke_void_return_wrapper<std::__1::tuple<magma::Status, magma::CheckpointTransaction>, false>::__call<magma::LSMTree::newFlush(__args=<unavailable>)::$_16&>(magma::LSMTree::newFlush()::$_16&) at invoke.h:30:16 [opt] frame #11: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #12: 0x000000010139fe59 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #13: 0x00000001012f72af ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::__function::__value_func<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #14: 0x00000001012f7296 ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::function<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=0x0000000118131560)() const at function.h:1182:12 [opt] frame #15: 0x00000001012f7292 ep-engine_ep_unit_tests`magma::FlushWork::Execute(this=0x0000000118131560) at flush_work.cc:61:29 [opt] frame #16: 0x0000000101389d5e ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x00007ff7bfefe1c0)::$_38::operator()() at kvstore.cc:515:27 [opt] frame #17: 0x0000000101388fac ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x000000010442a420, wal=<unavailable>, offset=(SegID = 1, SegOffset = 4096), flushMode=<unavailable>, blockMode=Blocking) at kvstore.cc:582:16 [opt] frame #18: 0x0000000101389a5a ep-engine_ep_unit_tests`magma::KVStore::FlushMemTables(this=<unavailable>, wal=<unavailable>, flushMode=<unavailable>, blockMode=<unavailable>) at kvstore.cc:387:12 [opt] frame #19: 0x00000001012fd9ba ep-engine_ep_unit_tests`magma::Magma::Impl::syncKVStore(this=0x000000011814f000, kvID=<unavailable>, checkpoint=true) at db.cc:1352:21 [opt] frame #20: 0x000000010132678a ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=0x00007ff7bfefe400)>)::$_7::operator()() const at db.cc:880:23 [opt] frame #21: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=<unavailable>)>)::$_8::operator()() const at db.cc:891:21 [opt] frame #22: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)>)::$_8&>(fp)()) std::__1::__invoke<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at type_traits:3918:1 [opt] frame #23: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call<magma::Magma::Impl::CompactKVStore(__args=<unavailable>)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at invoke.h:61:9 [opt] frame #24: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #25: 0x0000000101326764 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #26: 0x0000000101303138 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::__function::__value_func<void ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #27: 0x000000010130312d ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::function<void ()>::operator(this=0x00007ff7bfefe4b0)() const at function.h:1182:12 [opt] frame #28: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:92:9 [opt] frame #29: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:91:14 [opt] frame #30: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(this=<unavailable>, kvID=<unavailable>, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefe550)>) at db.cc:895:1 [opt] frame #31: 0x000000010130336c ep-engine_ep_unit_tests`magma::Magma::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=<unavailable>, makeCallback=<unavailable>)>) at db.cc:901:18 [opt] frame #32: 0x000000010004fd3d ep-engine_ep_unit_tests`MagmaMemoryTrackingProxy::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefea00)>) at magma-memory-tracking-proxy.cc:190:19 [opt] frame #33: 0x00000001000a9eeb ep-engine_ep_unit_tests`MagmaKVStore::compactDBInternal(this=<unavailable>, vbLock=0x00007ff7bfefeda0, ctx=std::__1::shared_ptr<CompactionContext>::element_type @ 0x00000001184acc20 strong=3 weak=1) at magma-kvstore.cc:2590:29 [opt] frame #34: 0x00000001000a93ad ep-engine_ep_unit_tests`MagmaKVStore::compactDB(this=0x00000001067e6500, vbLock=0x00007ff7bfefeda0, ctx=nullptr) at magma-kvstore.cc:2445:12 [opt] frame #35: 0x00000001001d7eb0 ep-engine_ep_unit_tests`EPBucket::compactInternal(this=0x00000001067e6000, vb=0x00007ff7bfefed90, config=<unavailable>) at ep_bucket.cc:1398:25 [opt] frame #36: 0x00000001001d83f6 ep-engine_ep_unit_tests`EPBucket::doCompact(this=0x00000001067e6000, vbid=(vbid = 0), config=0x00007ff7bfefedf0, cookies=size=0) at ep_bucket.cc:1476:14 [opt] 3) Key sorting issue Magma now checks for sorted keys - it turns out KV flushing is violating that ordering. Need to know if KV should fix or is the magma check required?? Example: CollectionsDcpEphemeralOrPersistent/CollectionsLegacyDcpTest.default_collection_is_not_vbucket_highseqno_with_pending/persistent_nexus_couchstore_magma_value_only CRITICAL [(SynchronousEPEngine:default) magma_0]Fatal error: Found: preceding key(d2) > current key( _collection). If history is enabled, all keys in the batch must be sorted lexicographicall The problem is that the test flushes a prepare(default collection, key=d2) and create-collection(fruit) together. The flusher orders these... \0d2 \1create_fruit This is correct. But \0d2 is marked as a prepare, when flushed to disk it goes into a special namespace. This occurs in KVStore after the sorting. \0d2 becomes \2\0d2 And magma actually sees \2\0d2 \1create_fruit and we have violated the expects Change-Id: Ica9ea1b52c51f125c9e8839a0fca412834fc25f7
jimwwalker
added a commit
to jimwwalker/kv_engine
that referenced
this pull request
Jan 23, 2023
Replace the todo markers with code that now utilises the magma history API - this now means scanAllVersions for example is hooked into the magma history scanning API. Add new tests that validate multiple versions can be stored and returned. Also required are changes to unit tests to respect new expectation checks that occur in magma - primarily that flushing writes ordered batches - this is only a problem for tests which bypass the flusher and call KVStore directly. **** ISSUES **** ep-engine_ep_unit_tests does not pass: 1) Exception from magma MagmaKVStoreRollbackTest.Rollback hits the following exception GSL: Precondition failure: 'levelSize >= compactionState[level].history.Size' at /Users/jimwalker/Code/couchbase/neo/magma/lsm/lsm_tree.cc:895 2) Seg-fault in magma Seen in a number of tests, 1 example: CollectionsDcpEphemeralOrPersistent/CollectionsDcpParameterizedTest.DefaultCollectionDropped/persistent_magma_value_only Process 78731 stopped * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] 72 } 73 74 Slice DocSequenceBuffer::GetKey() { -> 75 seqFmt.Set(sortedList[offset]->seqno); 76 return seqFmt.Encode(); 77 } 78 * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) * frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] frame couchbase#1: 0x0000000101361e2e ep-engine_ep_unit_tests`magma::mvccIteratorAdaptor::GetKey(this=0x0000000118536c00) at mvcc.h:249:25 [opt] frame couchbase#2: 0x000000010132b688 ep-engine_ep_unit_tests`magma::IteratorWithFilter::filterKeys(this=0x0000000118128350) at iterator.cc:214:32 [opt] frame couchbase#3: 0x000000010132de5b ep-engine_ep_unit_tests`magma::KVReader::ReadKVs(this=0x00007ff7bfefd550) at common.cc:70:19 [opt] frame couchbase#4: 0x0000000101378f63 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, w=0x00007ff7bfefd890, itr=0x0000000118128350, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefd860)>) at lsm_tree.cc:719:15 [opt] frame couchbase#5: 0x0000000101376ee8 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, appendMode=<unavailable>, itr=0x0000000118128350, sizeEstimate=<unavailable>, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefdb60)>) at lsm_tree.cc:682:17 [opt] frame couchbase#6: 0x00000001013761b2 ep-engine_ep_unit_tests`magma::LSMTree::writeMemtable(this=0x000000011855a820, memtable=0x000000011854c7a0) at lsm_tree.cc:449:21 [opt] frame #7: 0x000000010137753f ep-engine_ep_unit_tests`magma::LSMTree::doMemtableFlushWork(this=0x000000011855a820) at lsm_tree.cc:531:18 [opt] frame #8: 0x000000010139fe62 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] magma::LSMTree::newFlush(this=<unavailable>)::$_16::operator()() const at lsm_tree.cc:993:34 [opt] frame #9: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_16&>(fp)()) std::__1::__invoke<magma::LSMTree::newFlush()::$_16&>(magma::LSMTree::newFlush()::$_16&) at type_traits:3918:1 [opt] frame #10: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::tuple<magma::Status, magma::CheckpointTransaction> std::__1::__invoke_void_return_wrapper<std::__1::tuple<magma::Status, magma::CheckpointTransaction>, false>::__call<magma::LSMTree::newFlush(__args=<unavailable>)::$_16&>(magma::LSMTree::newFlush()::$_16&) at invoke.h:30:16 [opt] frame #11: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #12: 0x000000010139fe59 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #13: 0x00000001012f72af ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::__function::__value_func<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #14: 0x00000001012f7296 ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::function<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=0x0000000118131560)() const at function.h:1182:12 [opt] frame #15: 0x00000001012f7292 ep-engine_ep_unit_tests`magma::FlushWork::Execute(this=0x0000000118131560) at flush_work.cc:61:29 [opt] frame #16: 0x0000000101389d5e ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x00007ff7bfefe1c0)::$_38::operator()() at kvstore.cc:515:27 [opt] frame #17: 0x0000000101388fac ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x000000010442a420, wal=<unavailable>, offset=(SegID = 1, SegOffset = 4096), flushMode=<unavailable>, blockMode=Blocking) at kvstore.cc:582:16 [opt] frame #18: 0x0000000101389a5a ep-engine_ep_unit_tests`magma::KVStore::FlushMemTables(this=<unavailable>, wal=<unavailable>, flushMode=<unavailable>, blockMode=<unavailable>) at kvstore.cc:387:12 [opt] frame #19: 0x00000001012fd9ba ep-engine_ep_unit_tests`magma::Magma::Impl::syncKVStore(this=0x000000011814f000, kvID=<unavailable>, checkpoint=true) at db.cc:1352:21 [opt] frame #20: 0x000000010132678a ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=0x00007ff7bfefe400)>)::$_7::operator()() const at db.cc:880:23 [opt] frame #21: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=<unavailable>)>)::$_8::operator()() const at db.cc:891:21 [opt] frame #22: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)>)::$_8&>(fp)()) std::__1::__invoke<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at type_traits:3918:1 [opt] frame #23: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call<magma::Magma::Impl::CompactKVStore(__args=<unavailable>)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at invoke.h:61:9 [opt] frame #24: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #25: 0x0000000101326764 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #26: 0x0000000101303138 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::__function::__value_func<void ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #27: 0x000000010130312d ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::function<void ()>::operator(this=0x00007ff7bfefe4b0)() const at function.h:1182:12 [opt] frame #28: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:92:9 [opt] frame #29: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:91:14 [opt] frame #30: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(this=<unavailable>, kvID=<unavailable>, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefe550)>) at db.cc:895:1 [opt] frame #31: 0x000000010130336c ep-engine_ep_unit_tests`magma::Magma::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=<unavailable>, makeCallback=<unavailable>)>) at db.cc:901:18 [opt] frame #32: 0x000000010004fd3d ep-engine_ep_unit_tests`MagmaMemoryTrackingProxy::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefea00)>) at magma-memory-tracking-proxy.cc:190:19 [opt] frame #33: 0x00000001000a9eeb ep-engine_ep_unit_tests`MagmaKVStore::compactDBInternal(this=<unavailable>, vbLock=0x00007ff7bfefeda0, ctx=std::__1::shared_ptr<CompactionContext>::element_type @ 0x00000001184acc20 strong=3 weak=1) at magma-kvstore.cc:2590:29 [opt] frame #34: 0x00000001000a93ad ep-engine_ep_unit_tests`MagmaKVStore::compactDB(this=0x00000001067e6500, vbLock=0x00007ff7bfefeda0, ctx=nullptr) at magma-kvstore.cc:2445:12 [opt] frame #35: 0x00000001001d7eb0 ep-engine_ep_unit_tests`EPBucket::compactInternal(this=0x00000001067e6000, vb=0x00007ff7bfefed90, config=<unavailable>) at ep_bucket.cc:1398:25 [opt] frame #36: 0x00000001001d83f6 ep-engine_ep_unit_tests`EPBucket::doCompact(this=0x00000001067e6000, vbid=(vbid = 0), config=0x00007ff7bfefedf0, cookies=size=0) at ep_bucket.cc:1476:14 [opt] 3) Key sorting issue Magma now checks for sorted keys - it turns out KV flushing is violating that ordering. Need to know if KV should fix or is the magma check required?? Example: CollectionsDcpEphemeralOrPersistent/CollectionsLegacyDcpTest.default_collection_is_not_vbucket_highseqno_with_pending/persistent_nexus_couchstore_magma_value_only CRITICAL [(SynchronousEPEngine:default) magma_0]Fatal error: Found: preceding key(d2) > current key( _collection). If history is enabled, all keys in the batch must be sorted lexicographicall The problem is that the test flushes a prepare(default collection, key=d2) and create-collection(fruit) together. The flusher orders these... \0d2 \1create_fruit This is correct. But \0d2 is marked as a prepare, when flushed to disk it goes into a special namespace. This occurs in KVStore after the sorting. \0d2 becomes \2\0d2 And magma actually sees \2\0d2 \1create_fruit and we have violated the expects Change-Id: Ica9ea1b52c51f125c9e8839a0fca412834fc25f7
jimwwalker
added a commit
to jimwwalker/kv_engine
that referenced
this pull request
Jan 23, 2023
Replace the todo markers with code that now utilises the magma history API - this now means scanAllVersions for example is hooked into the magma history scanning API. Add new tests that validate multiple versions can be stored and returned. Also required are changes to unit tests to respect new expectation checks that occur in magma - primarily that flushing writes ordered batches - this is only a problem for tests which bypass the flusher and call KVStore directly. **** ISSUES **** ep-engine_ep_unit_tests does not pass: 1) Exception from magma MagmaKVStoreRollbackTest.Rollback hits the following exception GSL: Precondition failure: 'levelSize >= compactionState[level].history.Size' at /Users/jimwalker/Code/couchbase/neo/magma/lsm/lsm_tree.cc:895 2) Seg-fault in magma Seen in a number of tests, 1 example: CollectionsDcpEphemeralOrPersistent/CollectionsDcpParameterizedTest.DefaultCollectionDropped/persistent_magma_value_only Process 78731 stopped * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] 72 } 73 74 Slice DocSequenceBuffer::GetKey() { -> 75 seqFmt.Set(sortedList[offset]->seqno); 76 return seqFmt.Encode(); 77 } 78 * thread couchbase#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT) * frame #0: 0x00000001012eb7b0 ep-engine_ep_unit_tests`magma::DocSequenceBuffer::GetKey(this=0x0000000118131700) at lsd.cc:75:36 [opt] frame couchbase#1: 0x0000000101361e2e ep-engine_ep_unit_tests`magma::mvccIteratorAdaptor::GetKey(this=0x0000000118536c00) at mvcc.h:249:25 [opt] frame couchbase#2: 0x000000010132b688 ep-engine_ep_unit_tests`magma::IteratorWithFilter::filterKeys(this=0x0000000118128350) at iterator.cc:214:32 [opt] frame couchbase#3: 0x000000010132de5b ep-engine_ep_unit_tests`magma::KVReader::ReadKVs(this=0x00007ff7bfefd550) at common.cc:70:19 [opt] frame couchbase#4: 0x0000000101378f63 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, w=0x00007ff7bfefd890, itr=0x0000000118128350, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefd860)>) at lsm_tree.cc:719:15 [opt] frame couchbase#5: 0x0000000101376ee8 ep-engine_ep_unit_tests`magma::LSMTree::writeSSTable(this=0x000000011855a820, appendMode=<unavailable>, itr=0x0000000118128350, sizeEstimate=<unavailable>, maxSn=10, stopFn=function<bool (const magma::Slice &)> @ 0x00007ff7bfefdb60)>) at lsm_tree.cc:682:17 [opt] frame couchbase#6: 0x00000001013761b2 ep-engine_ep_unit_tests`magma::LSMTree::writeMemtable(this=0x000000011855a820, memtable=0x000000011854c7a0) at lsm_tree.cc:449:21 [opt] frame #7: 0x000000010137753f ep-engine_ep_unit_tests`magma::LSMTree::doMemtableFlushWork(this=0x000000011855a820) at lsm_tree.cc:531:18 [opt] frame #8: 0x000000010139fe62 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] magma::LSMTree::newFlush(this=<unavailable>)::$_16::operator()() const at lsm_tree.cc:993:34 [opt] frame #9: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_16&>(fp)()) std::__1::__invoke<magma::LSMTree::newFlush()::$_16&>(magma::LSMTree::newFlush()::$_16&) at type_traits:3918:1 [opt] frame #10: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::tuple<magma::Status, magma::CheckpointTransaction> std::__1::__invoke_void_return_wrapper<std::__1::tuple<magma::Status, magma::CheckpointTransaction>, false>::__call<magma::LSMTree::newFlush(__args=<unavailable>)::$_16&>(magma::LSMTree::newFlush()::$_16&) at invoke.h:30:16 [opt] frame #11: 0x000000010139fe5d ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #12: 0x000000010139fe59 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::LSMTree::newFlush()::$_16, std::__1::allocator<magma::LSMTree::newFlush()::$_16>, std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #13: 0x00000001012f72af ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::__function::__value_func<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #14: 0x00000001012f7296 ep-engine_ep_unit_tests`magma::FlushWork::Execute() [inlined] std::__1::function<std::__1::tuple<magma::Status, magma::CheckpointTransaction> ()>::operator(this=0x0000000118131560)() const at function.h:1182:12 [opt] frame #15: 0x00000001012f7292 ep-engine_ep_unit_tests`magma::FlushWork::Execute(this=0x0000000118131560) at flush_work.cc:61:29 [opt] frame #16: 0x0000000101389d5e ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x00007ff7bfefe1c0)::$_38::operator()() at kvstore.cc:515:27 [opt] frame #17: 0x0000000101388fac ep-engine_ep_unit_tests`magma::KVStore::flushMemTables(this=0x000000010442a420, wal=<unavailable>, offset=(SegID = 1, SegOffset = 4096), flushMode=<unavailable>, blockMode=Blocking) at kvstore.cc:582:16 [opt] frame #18: 0x0000000101389a5a ep-engine_ep_unit_tests`magma::KVStore::FlushMemTables(this=<unavailable>, wal=<unavailable>, flushMode=<unavailable>, blockMode=<unavailable>) at kvstore.cc:387:12 [opt] frame #19: 0x00000001012fd9ba ep-engine_ep_unit_tests`magma::Magma::Impl::syncKVStore(this=0x000000011814f000, kvID=<unavailable>, checkpoint=true) at db.cc:1352:21 [opt] frame #20: 0x000000010132678a ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=0x00007ff7bfefe400)>)::$_7::operator()() const at db.cc:880:23 [opt] frame #21: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] magma::Magma::Impl::CompactKVStore(this=<unavailable>)>)::$_8::operator()() const at db.cc:891:21 [opt] frame #22: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)>)::$_8&>(fp)()) std::__1::__invoke<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at type_traits:3918:1 [opt] frame #23: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call<magma::Magma::Impl::CompactKVStore(__args=<unavailable>)>)::$_8&>(magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8&) at invoke.h:61:9 [opt] frame #24: 0x0000000101326772 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:178:16 [opt] frame #25: 0x0000000101326764 ep-engine_ep_unit_tests`std::__1::__function::__func<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8, std::__1::allocator<magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::$_8>, void ()>::operator(this=<unavailable>)() at function.h:352:12 [opt] frame #26: 0x0000000101303138 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::__function::__value_func<void ()>::operator(this=<unavailable>)() const at function.h:505:16 [opt] frame #27: 0x000000010130312d ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] std::__1::function<void ()>::operator(this=0x00007ff7bfefe4b0)() const at function.h:1182:12 [opt] frame #28: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:92:9 [opt] frame #29: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::__1::function<std::__1::unique_ptr<magma::Magma::CompactionCallback, std::__1::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) [inlined] magma::defer::~defer(this=0x00007ff7bfefe4b0) at common.h:91:14 [opt] frame #30: 0x0000000101303129 ep-engine_ep_unit_tests`magma::Magma::Impl::CompactKVStore(this=<unavailable>, kvID=<unavailable>, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefe550)>) at db.cc:895:1 [opt] frame #31: 0x000000010130336c ep-engine_ep_unit_tests`magma::Magma::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=<unavailable>, makeCallback=<unavailable>)>) at db.cc:901:18 [opt] frame #32: 0x000000010004fd3d ep-engine_ep_unit_tests`MagmaMemoryTrackingProxy::CompactKVStore(this=<unavailable>, kvID=0, lowKey=0x00007ff7bfefe780, highKey=0x00007ff7bfefe780, makeCallback=magma::Magma::CompactionCallbackBuilder @ 0x00007ff7bfefea00)>) at magma-memory-tracking-proxy.cc:190:19 [opt] frame #33: 0x00000001000a9eeb ep-engine_ep_unit_tests`MagmaKVStore::compactDBInternal(this=<unavailable>, vbLock=0x00007ff7bfefeda0, ctx=std::__1::shared_ptr<CompactionContext>::element_type @ 0x00000001184acc20 strong=3 weak=1) at magma-kvstore.cc:2590:29 [opt] frame #34: 0x00000001000a93ad ep-engine_ep_unit_tests`MagmaKVStore::compactDB(this=0x00000001067e6500, vbLock=0x00007ff7bfefeda0, ctx=nullptr) at magma-kvstore.cc:2445:12 [opt] frame #35: 0x00000001001d7eb0 ep-engine_ep_unit_tests`EPBucket::compactInternal(this=0x00000001067e6000, vb=0x00007ff7bfefed90, config=<unavailable>) at ep_bucket.cc:1398:25 [opt] frame #36: 0x00000001001d83f6 ep-engine_ep_unit_tests`EPBucket::doCompact(this=0x00000001067e6000, vbid=(vbid = 0), config=0x00007ff7bfefedf0, cookies=size=0) at ep_bucket.cc:1476:14 [opt] 3) Key sorting issue Magma now checks for sorted keys - it turns out KV flushing is violating that ordering. Need to know if KV should fix or is the magma check required?? Example: CollectionsDcpEphemeralOrPersistent/CollectionsLegacyDcpTest.default_collection_is_not_vbucket_highseqno_with_pending/persistent_nexus_couchstore_magma_value_only CRITICAL [(SynchronousEPEngine:default) magma_0]Fatal error: Found: preceding key(d2) > current key( _collection). If history is enabled, all keys in the batch must be sorted lexicographicall The problem is that the test flushes a prepare(default collection, key=d2) and create-collection(fruit) together. The flusher orders these... \0d2 \1create_fruit This is correct. But \0d2 is marked as a prepare, when flushed to disk it goes into a special namespace. This occurs in KVStore after the sorting. \0d2 becomes \2\0d2 And magma actually sees \2\0d2 \1create_fruit and we have violated the expects Change-Id: Ica9ea1b52c51f125c9e8839a0fca412834fc25f7
ns-codereview
pushed a commit
that referenced
this pull request
May 4, 2023
dcpdrain currently sets the DCP noop-interval to 1s, so the producer will send NOOP requests to dcpdrain every 1s and dcpdrain needs to correctly handle this request and send a response. When connecting to clusters with high latency between client and server nodes, it can take more than 1 second to complete setting up the DCP connection and endering the main event loop. This means the server node may start to send DCP noop requests before the DCP connection is setup - and crucially dcpdrain's event loop is ready to process the DCP noop request. This results in dcpdrain crashing as it gets a DCP noop request when it is expecting a control response: Process 43094 launched: '/Users/dave/repos/couchbase/server/source/build/kv_engine/dcpdrain' (arm64) Using DCP flow control with buffer size: 13421772 Set DCP control message: set_priority=high Set DCP control message: supports_cursor_dropping_vulcan=true Set DCP control message: supports_hifi_MFU=true Set DCP control message: send_stream_end_on_client_close_stream=true Set DCP control message: enable_expiry_opcode=true Set DCP control message: set_noop_interval=1 Set DCP control message: enable_noop=true Set DCP control message: enable_out_of_order_snapshots=true 2023-05-03T12:11:28.705431+01:00 CRITICAL *** Fatal error encountered during exception handling *** 2023-05-03T12:11:28.708626+01:00 CRITICAL Caught unhandled std::exception-derived exception. what(): Header::getResponse(): Header is not a response Target 0: (dcpdrain) stopped. (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT * frame #0: 0x00000001c24a2d98 libsystem_kernel.dylib` __pthread_kill + 8 frame #1: 0x00000001c24d7ee0 libsystem_pthread.dylib` pthread_kill + 288 frame #2: 0x00000001c2412340 libsystem_c.dylib` abort + 168 frame #3: 0x00000001c2492b08 libc++abi.dylib` abort_message + 132 frame #4: 0x00000001c2482938 libc++abi.dylib` demangling_terminate_handler() + 312 frame #5: 0x00000001c2378330 libobjc.A.dylib` _objc_terminate() + 160 frame #6: 0x000000010008ef30 dcpdrain` backtrace_terminate_handler() + 752 at terminate_handler.cc:88 frame #7: 0x00000001c2491ea4 libc++abi.dylib` std::__terminate(void (*)()) + 20 frame #8: 0x00000001c2494c1c libc++abi.dylib` __cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 36 frame #9: 0x00000001c2494bc8 libc++abi.dylib` __cxa_throw + 140 frame #10: 0x000000010002a77c dcpdrain` BinprotResponse::getTracingData() const [inlined] cb::mcbp::Header::getResponse(this=0x00006000002044a0) const + 48 at header.h:134 frame #11: 0x000000010002a74c dcpdrain` BinprotResponse::getTracingData() const [inlined] BinprotResponse::getResponse(this=<unavailable>) const at client_mcbp_commands.cc:487 frame #12: 0x000000010002a74c dcpdrain` BinprotResponse::getTracingData(this=0x000000016fdfef90) const + 188 at client_mcbp_commands.cc:373 frame #13: 0x000000010002a638 dcpdrain` MemcachedConnection::recvResponse(this=0x0000000101604080, response=0x000000016fdfef90, opcode=<unavailable>, readTimeout=<unavailable>) + 84 at client_connection.cc:1043 ... frame #21: 0x0000000100038f40 dcpdrain` MemcachedConnection::backoff_execute(..., context="DCP_CONTROL", ...) + 100 at client_connection.cc:2016 frame #22: 0x000000010002bab4 dcpdrain` MemcachedConnection::execute(this=0x0000000101604080, command=0x000000016fdfefb0, readTimeout=(__rep_ = 0)) + 168 at client_connection.cc:1998 frame #23: 0x000000010000d688 dcpdrain` main + 280 at dcpdrain.cc:451 frame #24: 0x000000010000d570 dcpdrain` main(argc=<unavailable>, argv=<unavailable>) + 8488 at dcpdrain.cc:929 frame #25: 0x00000001005d508c dyld` start + 520 Ideally dcpdrain should be robust to receiving dcp NOOP messages while setting up the control flags, but that's not simple as we use common code in MemcachedConnection which performs a request and expects a response (of type DCP_CONTROL) in-order. To workaround this problem simply increase the default DCP noop interval from 1 to 60 seconds - 60s /should/ be sufficient to complete the handshake... Change-Id: I0f846956d6499ea54d74f781cb14d7982387c9f4 Reviewed-on: https://review.couchbase.org/c/kv_engine/+/190418 Tested-by: Build Bot <[email protected]> Reviewed-by: Trond Norbye <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jan 12, 2024
The method did not take a queueLock and could mutate the CheckpointManager while it is being accessed, e.g. in CheckpointManager::getListOfCursorsToDrop. CheckpointMemRecoveryTask calls getListOfCursorsToDrop which iterates CM::cursors. A concurrent RollbackTask can result in resetting the vbucket and calling CM::takeAndResetCursors, which among others mutates CM::cursors. WARNING: ThreadSanitizer: data race (pid=47061) Write of size 8 at 0x00010d3b77a8 by main thread (mutexes: write M0, write M1, write M2): #0 CheckpointManager::takeAndResetCursors(CheckpointManager&) checkpoint_manager.cc:1754 (ep-engine_ep_unit_tests:arm64+0x1003c7dd8) #1 KVBucket::resetVBucket_UNLOCKED(LockedVBucketPtr&, std::__1::unique_lock<std::__1::mutex>&) kv_bucket.cc:1273 (ep-engine_ep_unit_tests:arm64+0x1001fc414) #2 KVBucket::rollback(Vbid, unsigned long long) kv_bucket.cc:2634 (ep-engine_ep_unit_tests:arm64+0x10020a910) #3 CheckpointRemoverTest_MB59601_Test::TestBody() checkpoint_remover_test.cc:518 (ep-engine_ep_unit_tests:arm64+0x1005d2224) #4 virtual thunk to CheckpointRemoverTest_MB59601_Test::TestBody() checkpoint_remover_test.cc (ep-engine_ep_unit_tests:arm64+0x1005d24e8) #5 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) gtest.cc:2648 (ep-engine_ep_unit_tests:arm64+0x101b8f6bc) #6 <null> <null> (0x000186e390e0) Previous read of size 8 at 0x00010d3b77a8 by thread T1 (mutexes: write M3): #0 CheckpointManager::getListOfCursorsToDrop() checkpoint_manager.cc:842 (ep-engine_ep_unit_tests:arm64+0x1003c1af0) #1 CheckpointMemRecoveryTask::attemptCursorDropping() checkpoint_remover.cc:183 (ep-engine_ep_unit_tests:arm64+0x1003caf8c) #2 CheckpointMemRecoveryTask::runInner(bool) checkpoint_remover.cc:245 (ep-engine_ep_unit_tests:arm64+0x1003cb77c) #3 EpNotifiableTask::run() ep_task.cc:56 (ep-engine_ep_unit_tests:arm64+0x10028763c) #4 void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, CheckpointRemoverTest_MB59601_Test::TestBody()::$_2::operator()() const::'lambda0'()>>(void*) thread:299 (ep-engine_ep_unit_tests:arm64+0x100600c30) Change-Id: I15c1e9ccc6f45f3251ebd7f78649c8a446d65b54 Reviewed-on: https://review.couchbase.org/c/kv_engine/+/203302 Reviewed-by: Vesko Karaganev <[email protected]> Tested-by: Build Bot <[email protected]> Reviewed-by: Paolo Cocchi <[email protected]>
ns-codereview
pushed a commit
that referenced
this pull request
Jan 24, 2024
The method did not take a queueLock and could mutate the CheckpointManager while it is being accessed, e.g. in CheckpointManager::getListOfCursorsToDrop. CheckpointMemRecoveryTask calls getListOfCursorsToDrop which iterates CM::cursors. A concurrent RollbackTask can result in resetting the vbucket and calling CM::takeAndResetCursors, which among others mutates CM::cursors. WARNING: ThreadSanitizer: data race (pid=60355) Write of size 8 at 0x00010d1a5e68 by main thread (mutexes: write M0, write M1, write M2): #0 CheckpointManager::takeAndResetCursors(CheckpointManager&) checkpoint_manager.cc:1856 (ep-engine_ep_unit_tests:arm64+0x1003795b4) #1 KVBucket::resetVBucket_UNLOCKED(LockedVBucketPtr&, std::__1::unique_lock<std::__1::mutex>&) kv_bucket.cc:1271 (ep-engine_ep_unit_tests:arm64+0x1001da918) #2 KVBucket::rollback(Vbid, unsigned long long) kv_bucket.cc:2671 (ep-engine_ep_unit_tests:arm64+0x1001e8404) #3 CheckpointRemoverTest_MB59601_Test::TestBody() checkpoint_remover_test.cc:513 (ep-engine_ep_unit_tests:arm64+0x10054117c) #4 virtual thunk to CheckpointRemoverTest_MB59601_Test::TestBody() checkpoint_remover_test.cc (ep-engine_ep_unit_tests:arm64+0x100541448) #5 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) gtest.cc:2643 (ep-engine_ep_unit_tests:arm64+0x10195a8e0) #6 <null> <null> (0x000186e390e0) Previous read of size 8 at 0x00010d1a5e68 by thread T1 (mutexes: write M3): #0 CheckpointManager::getListOfCursorsToDrop() checkpoint_manager.cc:802 (ep-engine_ep_unit_tests:arm64+0x100372bdc) #1 CheckpointMemRecoveryTask::attemptCursorDropping() checkpoint_remover.cc:174 (ep-engine_ep_unit_tests:arm64+0x10037c710) #2 CheckpointMemRecoveryTask::runInner() checkpoint_remover.cc:291 (ep-engine_ep_unit_tests:arm64+0x10037d068) #3 NotifiableTask::run() notifiable_task.cc:18 (ep-engine_ep_unit_tests:arm64+0x101934ed8) #4 void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, CheckpointRemoverTest_MB59601_Test::TestBody()::$_3::operator()() const::'lambda0'()>>(void*) thread:299 (ep-engine_ep_unit_tests:arm64+0x1005661f0) Change-Id: I7fe1ed1f6ebca811a5dfca6c2e69d04bfa91b2b8 Reviewed-on: https://review.couchbase.org/c/kv_engine/+/203991 Tested-by: Pavlos Georgiou <[email protected]> Reviewed-by: Vesko Karaganev <[email protected]> Reviewed-by: Paolo Cocchi <[email protected]> Well-Formed: Restriction Checker
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
PlasmaKVStore is an experimental KVStore implementation backed by
Plasma. It is very incomplete, and is not intended for general usage.
As such, it is hidden behind the
EP_USE_PLASMA
flag.To use it, one can build with (for example)
make EXTRA_CMAKE_OPTIONS='-DEP_USE_PLASMA=ON'
Change-Id: Id687170ca87eb9f25764576409fc18f508a0328f