MB-34017: Optimize warmup - Only warmup prepares from HCS to HPS
For Durability, we have introduced a new LoadPrepare phase at Warmup.
This phase is necessary for loading pending Prepares from disk and
inserting them into the in-memory structures (i.e. HashTable,
CheckpointManager, DurabilityMonitor) so that they can be led to
completion.

Given that we only need to re-process Prepares that have not been
completed (i.e. Committed or Aborted), we can safely start the
LoadPrepare scan from the HCS (exclusive) onward: by definition, every
Prepare at or before the HCS has been completed.

After introducing the LoadPrepare phase (and before this change) we
observed a 100% increase in total Warmup runtime. That is because the
first implementation of the LoadPrepare phase starts its scan at
seqno=0, so a full Warmup performs two full scans of the entire
seqno-index. This patch addresses the issue. We also skip loading
prepares entirely when HCS == HPS, as in that case every prepare has
been completed.

Change-Id: Iaf310fe5d7f508303d05d1f5a9632b9dfcf368a7
Reviewed-on: http://review.couchbase.org/113267
Reviewed-by: James Harrison <[email protected]>
Reviewed-by: Dave Rigby <[email protected]>
Tested-by: Build Bot <[email protected]>
BenHuddleston authored and daverigby committed Aug 19, 2019
1 parent a2b7748 commit 96ed3eb
Showing 8 changed files with 125 additions and 26 deletions.
80 changes: 61 additions & 19 deletions engines/ep/src/ep_bucket.cc
@@ -1366,14 +1366,28 @@ void EPBucket::rollbackUnpersistedItems(VBucket& vb, int64_t rollbackSeqno) {
// At the end of the scan, all outstanding Prepared items (which did not
// have a Commit persisted to disk) will be registered with the Durability
// Monitor.
void EPBucket::loadPreparedSyncWrites(
EPBucket::LoadPreparedSyncWritesResult EPBucket::loadPreparedSyncWrites(
folly::SharedMutex::WriteHolder& vbStateLh, VBucket& vb) {
/// Disk load callback for scan.
struct LoadSyncWrites : public StatusCallback<GetValue> {
LoadSyncWrites(EPVBucket& vb) : vb(vb) {
LoadSyncWrites(EPVBucket& vb, uint64_t highPreparedSeqno)
: vb(vb), highPreparedSeqno(highPreparedSeqno) {
}

void callback(GetValue& val) override {
// Abort the scan early if we have passed the HPS as we don't need
// to load any more prepares.
if (val.item->getBySeqno() >
static_cast<int64_t>(highPreparedSeqno)) {
// ENGINE_ENOMEM may seem like an odd status code for aborting the
// scan, but disk backfill to a given seqno also returns
// ENGINE_ENOMEM to abort the scan once it has received all the
// seqnos it cares about.
setStatus(ENGINE_ENOMEM);
return;
}

itemsVisited++;
if (val.item->isPending()) {
// Pending item which was not aborted (deleted). Add to
// outstanding Prepare map.
@@ -1392,6 +1406,13 @@ void EPBucket::loadPreparedSyncWrites(

EPVBucket& vb;

// HPS after which we can abort the scan
uint64_t highPreparedSeqno = std::numeric_limits<uint64_t>::max();

// Number of items our callback "visits". Used to validate how many
// items we look at when loading SyncWrites.
uint64_t itemsVisited = 0;

/// Map of Document key -> outstanding (not yet Committed / Aborted)
/// prepares.
std::unordered_map<StoredDocKey, std::unique_ptr<Item>>
@@ -1401,18 +1422,39 @@ void EPBucket::loadPreparedSyncWrites(
auto& epVb = dynamic_cast<EPVBucket&>(vb);
const auto start = std::chrono::steady_clock::now();

// @TODO MB-34017: We can optimise this by starting the scan at the
// high_committed_seqno - all earlier prepares would have been committed
// (or were aborted) and only scanning up to the high prepared seqno.
uint64_t startSeqno = 0;

// Get the kvStore. We use the RW store because the rollback code that
// calls this function modifies vbucket_state, which is only reflected
// in the RW store. In the warmup case we don't allow writes at this
// point anyway.
auto* kvStore = getRWUnderlyingByShard(epVb.getShard()->getId());

auto storageCB = std::make_shared<LoadSyncWrites>(epVb);
// Need the HPS/HCS so the DurabilityMonitor can be fully resumed
auto vbState = kvStore->getVBucketState(epVb.getId());
if (!vbState) {
throw std::logic_error("EPBucket::loadPreparedSyncWrites: processing " +
epVb.getId().to_string() +
", but found no vbucket_state");
}

// Insert all outstanding Prepares into the VBucket (HashTable &
// DurabilityMonitor).
std::vector<queued_item> prepares;
if (vbState->highPreparedSeqno == vbState->highCompletedSeqno) {
// We don't need to warm up anything for this vBucket as all of our
// prepares have been completed, but we do need to create the DM
// with our vbucket_state.
epVb.loadOutstandingPrepares(vbStateLh, *vbState, std::move(prepares));
// No prepares loaded
return {0, 0};
}

// We optimise this step by starting the scan at the seqno following the
// High Completed Seqno. By definition, all earlier prepares have been
// completed (Committed or Aborted).
const uint64_t startSeqno = vbState->highCompletedSeqno + 1;

auto storageCB =
std::make_shared<LoadSyncWrites>(epVb, vbState->highPreparedSeqno);

// Don't expect to find anything already in the HashTable, so use
// NoLookupCallback.
@@ -1434,11 +1476,17 @@
EP_LOG_CRITICAL(
"EPBucket::loadPreparedSyncWrites: scanCtx is null for {}",
epVb.getId());
return;
// No prepares loaded
return {0, 0};
}

auto scanResult = kvStore->scan(scanCtx);
Expects(scanResult == scan_success);

// If we abort the scan early on reaching the HPS then the scan result
// will be a failure, but we will have scanned everything we need.
if (storageCB->getStatus() != ENGINE_ENOMEM) {
Expects(scanResult == scan_success);
}

kvStore->destroyScanContext(scanCtx);

@@ -1451,7 +1499,7 @@

// Insert all outstanding Prepares into the VBucket (HashTable &
// DurabilityMonitor).
std::vector<queued_item> prepares;
prepares.reserve(storageCB->outstandingPrepares.size());
for (auto& prepare : storageCB->outstandingPrepares) {
prepares.emplace_back(std::move(prepare.second));
}
@@ -1461,15 +1509,9 @@
return a->getBySeqno() < b->getBySeqno();
});

// Need the HPS/HCS so the DurabilityMonitor can be fully resumed
auto vbState = kvStore->getVBucketState(epVb.getId());
if (!vbState) {
throw std::logic_error("EPBucket::loadPreparedSyncWrites: processing " +
epVb.getId().to_string() +
", but found no vbucket_state");
}

auto numPrepares = prepares.size();
epVb.loadOutstandingPrepares(vbStateLh, *vbState, std::move(prepares));
return {storageCB->itemsVisited, numPrepares};
}

ValueFilter EPBucket::getValueFilterForCompressionMode() {
4 changes: 2 additions & 2 deletions engines/ep/src/ep_bucket.h
@@ -137,8 +137,8 @@ class EPBucket : public KVBucket {

void rollbackUnpersistedItems(VBucket& vb, int64_t rollbackSeqno) override;

void loadPreparedSyncWrites(folly::SharedMutex::WriteHolder& vbStateLh,
VBucket& vb) override;
LoadPreparedSyncWritesResult loadPreparedSyncWrites(
folly::SharedMutex::WriteHolder& vbStateLh, VBucket& vb) override;

/**
* Returns the ValueFilter to use for KVStore scans, given the bucket
7 changes: 4 additions & 3 deletions engines/ep/src/ephemeral_bucket.h
@@ -104,9 +104,10 @@ class EphemeralBucket : public KVBucket {
// No op
}

void loadPreparedSyncWrites(folly::SharedMutex::WriteHolder& vbStateLh,
VBucket& vb) override {
// No op
LoadPreparedSyncWritesResult loadPreparedSyncWrites(
folly::SharedMutex::WriteHolder& vbStateLh, VBucket& vb) override {
// No op, return 0 prepares loaded
return {0, 0};
}

void notifyNewSeqno(const Vbid vbid, const VBNotifyCtx& notifyCtx) override;
12 changes: 11 additions & 1 deletion engines/ep/src/kv_bucket_iface.h
@@ -787,6 +787,14 @@ class KVBucketIface {
*/
virtual bool isGetAllKeysSupported() const = 0;

/**
* Result of the loadPreparedSyncWrites function
*/
struct LoadPreparedSyncWritesResult {
uint64_t itemsVisited = 0;
uint64_t preparesLoaded = 0;
};

protected:

/**
@@ -834,8 +842,10 @@
*
* @param vbStateLh vBucket state lock
* @param vb vBucket for which we will load SyncWrites
*
* @returns the number of items visited and the number of prepares
*          loaded
*/
virtual void loadPreparedSyncWrites(
virtual LoadPreparedSyncWritesResult loadPreparedSyncWrites(
folly::SharedMutex::WriteHolder& vbStateLh, VBucket& vb) = 0;

// During the warmup phase we might want to enable external traffic
2 changes: 2 additions & 0 deletions engines/ep/src/stats.cc
@@ -27,6 +27,8 @@
EPStats::EPStats()
: warmedUpKeys(0),
warmedUpValues(0),
warmedUpPrepares(0),
warmupItemsVisitedWhilstLoadingPrepares(0),
warmDups(0),
warmOOM(0),
warmupMemUsedCap(0),
4 changes: 4 additions & 0 deletions engines/ep/src/stats.h
@@ -137,6 +137,10 @@ class EPStats {
Counter warmedUpKeys;
//! Number of key-values warmed up during data loading.
Counter warmedUpValues;
//! Number of prepares warmed up.
Counter warmedUpPrepares;
//! Number of items visited whilst loading prepares
Counter warmupItemsVisitedWhilstLoadingPrepares;
//! Number of warmup failures due to duplicates
Counter warmDups;
//! Number of OOM failures at warmup time.
8 changes: 7 additions & 1 deletion engines/ep/src/warmup.cc
@@ -1223,7 +1223,13 @@ void Warmup::loadPreparedSyncWrites(uint16_t shardId) {
// for rollback.
auto& vb = *(itr->second);
folly::SharedMutex::WriteHolder vbStateLh(vb.getStateLock());
store.loadPreparedSyncWrites(vbStateLh, vb);

auto result = store.loadPreparedSyncWrites(vbStateLh, vb);
store.getEPEngine()
.getEpStats()
.warmupItemsVisitedWhilstLoadingPrepares += result.itemsVisited;
store.getEPEngine().getEpStats().warmedUpPrepares +=
result.preparesLoaded;
}

if (++threadtask_count == store.vbMap.getNumShards()) {
34 changes: 34 additions & 0 deletions engines/ep/tests/module_tests/evp_store_warmup_test.cc
@@ -655,6 +655,13 @@ void DurabilityWarmupTest::testPendingSyncWrite(

// DurabilityMonitor be tracking the prepare.
EXPECT_EQ(++numTracked, vb->getDurabilityMonitor().getNumTracked());

EXPECT_EQ(numTracked,
store->getEPEngine().getEpStats().warmedUpPrepares);
EXPECT_EQ(numTracked,
store->getEPEngine()
.getEpStats()
.warmupItemsVisitedWhilstLoadingPrepares);
}
}

@@ -731,6 +738,13 @@ void DurabilityWarmupTest::testCommittedSyncWrite(

// DurabilityMonitor should be empty as no outstanding prepares.
EXPECT_EQ(--numTracked, vb->getDurabilityMonitor().getNumTracked());

EXPECT_EQ(numTracked,
store->getEPEngine().getEpStats().warmedUpPrepares);
EXPECT_EQ(numTracked,
store->getEPEngine()
.getEpStats()
.warmupItemsVisitedWhilstLoadingPrepares);
}
}

@@ -776,6 +790,11 @@ void DurabilityWarmupTest::testCommittedAndPendingSyncWrite(
setVBucketStateAndRunPersistTask(vbid, vbState);
}
resetEngineAndWarmup();
EXPECT_EQ(1, store->getEPEngine().getEpStats().warmedUpPrepares);
EXPECT_EQ(2,
store->getEPEngine()
.getEpStats()
.warmupItemsVisitedWhilstLoadingPrepares);

// Should load two items into memory - both committed and the pending value.
// Check the original committed value is inaccessible due to the pending
@@ -859,6 +878,11 @@ TEST_P(DurabilityWarmupTest, AbortedSyncWritePrepareIsNotLoaded) {
EXPECT_EQ(1, vb->getNumItems());
}
resetEngineAndWarmup();
EXPECT_EQ(0, store->getEPEngine().getEpStats().warmedUpPrepares);
EXPECT_EQ(0,
store->getEPEngine()
.getEpStats()
.warmupItemsVisitedWhilstLoadingPrepares);

// Should load one item into memory - committed value.
auto vb = engine->getVBucket(vbid);
@@ -898,6 +922,11 @@ TEST_P(DurabilityWarmupTest, ReplicationTopologyMissing) {
vbid, vbstate, VBStatePersist::VBSTATE_PERSIST_WITH_COMMIT);

resetEngineAndWarmup();
EXPECT_EQ(0, store->getEPEngine().getEpStats().warmedUpPrepares);
EXPECT_EQ(0,
store->getEPEngine()
.getEpStats()
.warmupItemsVisitedWhilstLoadingPrepares);

// Check topology is empty.
auto vb = engine->getKVBucket()->getVBucket(vbid);
@@ -943,6 +972,11 @@ TEST_P(DurabilityWarmupTest, WarmupCommit) {
// Because we bypassed KVBucket::set the HPS/HCS will be incorrect and fail
// the pre/post warmup checker, so disable the checker for this test.
resetEngineAndWarmup().disable();
EXPECT_EQ(1, store->getEPEngine().getEpStats().warmedUpPrepares);
EXPECT_EQ(1,
store->getEPEngine()
.getEpStats()
.warmupItemsVisitedWhilstLoadingPrepares);

vb = store->getVBucket(vbid);
ASSERT_TRUE(vb);