Commit 9da6f60

Merge branch '6.4.tikv' of https://github.com/tikv/rocksdb into mutex
2 parents c9493c7 + f236fe4 commit 9da6f60

Some content is hidden

66 files changed (+2201, -591 lines)

CMakeLists.txt

+1
@@ -585,6 +585,7 @@ set(SOURCES
 monitoring/iostats_context.cc
 monitoring/perf_context.cc
 monitoring/perf_level.cc
+monitoring/perf_flag.cc
 monitoring/persistent_stats_history.cc
 monitoring/statistics.cc
 monitoring/thread_status_impl.cc

HISTORY.md

+7
@@ -2,11 +2,14 @@
 ## Additional Improvements
 ### Public API Change
 * DeleteRange now returns `Status::InvalidArgument` if the range's end key comes before its start key according to the user comparator. Previously the behavior was undefined.
+* ldb now uses options.force_consistency_checks = true by default and "--disable_consistency_checks" is added to disable it.
+* Removed unused structure `CompactionFilterContext`.
 
 ### New Features
 * When user uses options.force_consistency_check in RocksDb, instead of crashing the process, we now pass the error back to the users without killing the process.
 * Added experimental ColumnFamilyOptions::sst_partitioner_factory to define determine the partitioning of sst files. This helps compaction to split the files on interesting boundaries (key prefixes) to make propagation of sst files less write amplifying (covering the whole key space).
 * Option `max_background_flushes` can be set dynamically using DB::SetDBOptions().
+* Allow `CompactionFilter`s to apply in more table file creation scenarios such as flush and recovery. For compatibility, `CompactionFilter`s by default apply during compaction. Users can customize this behavior by overriding `CompactionFilterFactory::ShouldFilterTableFileCreation()`. Picked from [facebook/rocksdb#pr8243](https://github.com/facebook/rocksdb/pull/8243).
 
 ### Bug Fixes
 * Fixed issue #6316 that can cause a corruption of the MANIFEST file in the middle when writing to it fails due to no disk space.
@@ -19,6 +22,8 @@
 * Fix a bug in which a snapshot read could be affected by a DeleteRange after the snapshot (#6062).
 * `WriteBatchWithIndex::DeleteRange` returns `Status::NotSupported`. Previously it returned success even though reads on the batch did not account for range tombstones. The corresponding language bindings now cannot be used. In C, that includes `rocksdb_writebatch_wi_delete_range`, `rocksdb_writebatch_wi_delete_range_cf`, `rocksdb_writebatch_wi_delete_rangev`, and `rocksdb_writebatch_wi_delete_rangev_cf`. In Java, that includes `WriteBatchWithIndex::deleteRange`.
 
+### Performance Improvements
+* When gathering unreferenced obsolete files for purging, file metas associated with active versions will no longer be copied for double-check. Updated VersionBuilder to make sure each physical file is reference counted by at most one FileMetaData.
 
 ## 6.4.6 (10/16/2019)
 * Fix a bug when partitioned filters and prefix search are used in conjunction, ::SeekForPrev could return invalid for an existing prefix. ::SeekForPrev might be called by the user, or internally on ::Prev, or within ::Seek if the return value involves Delete or a Merge operand.
@@ -54,6 +59,7 @@
 * ldb sometimes uses a string-append merge operator if no merge operator is passed in. This is to allow users to print keys from a DB with a merge operator.
 * Replaces old Registra with ObjectRegistry to allow user to create custom object from string, also add LoadEnv() to Env.
 * Added new overload of GetApproximateSizes which gets SizeApproximationOptions object and returns a Status. The older overloads are redirecting their calls to this new method and no longer assert if the include_flags doesn't have either of INCLUDE_MEMTABLES or INCLUDE_FILES bits set. It's recommended to use the new method only, as it is more type safe and returns a meaningful status in case of errors.
+* LDBCommandRunner::RunCommand() to return the status code as an integer, rather than call exit() using the code.
 
 ### New Features
 * Add argument `--secondary_path` to ldb to open the database as the secondary instance. This would keep the original DB intact.
@@ -93,6 +99,7 @@
 * Add an option `unordered_write` which trades snapshot guarantees with higher write throughput. When used with WRITE_PREPARED transactions with two_write_queues=true, it offers higher throughput with however no compromise on guarantees.
 * Allow DBImplSecondary to remove memtables with obsolete data after replaying MANIFEST and WAL.
 * Add an option `failed_move_fall_back_to_copy` (default is true) for external SST ingestion. When `move_files` is true and hard link fails, ingestion falls back to copy if `failed_move_fall_back_to_copy` is true. Otherwise, ingestion reports an error.
+* Add command `list_file_range_deletes` in ldb, which prints out tombstones in SST files.
 
 ### Performance Improvements
 * Reduce binary search when iterator reseek into the same data block.

db/builder.cc

+31-7
@@ -106,6 +106,25 @@ Status BuildTable(
   TableProperties tp;
 
   if (iter->Valid() || !range_del_agg->IsEmpty()) {
+    std::unique_ptr<CompactionFilter> compaction_filter;
+    if (ioptions.compaction_filter_factory != nullptr &&
+        ioptions.compaction_filter_factory->ShouldFilterTableFileCreation(
+            reason)) {
+      CompactionFilter::Context context;
+      context.is_full_compaction = false;
+      context.is_manual_compaction = false;
+      context.column_family_id = column_family_id;
+      context.reason = reason;
+      compaction_filter =
+          ioptions.compaction_filter_factory->CreateCompactionFilter(context);
+      if (compaction_filter != nullptr &&
+          !compaction_filter->IgnoreSnapshots()) {
+        return Status::NotSupported(
+            "CompactionFilter::IgnoreSnapshots() = false is not supported "
+            "anymore.");
+      }
+    }
+
     TableBuilder* builder;
     std::unique_ptr<WritableFileWriter> file_writer;
     // Currently we only enable dictionary compression during compaction to the
@@ -141,17 +160,21 @@ Status BuildTable(
           0 /*target_file_size*/, file_creation_time);
     }
 
-    MergeHelper merge(env, internal_comparator.user_comparator(),
-                      ioptions.merge_operator, nullptr, ioptions.info_log,
-                      true /* internal key corruption is not ok */,
-                      snapshots.empty() ? 0 : snapshots.back(),
-                      snapshot_checker);
+    MergeHelper merge(
+        env, internal_comparator.user_comparator(), ioptions.merge_operator,
+        compaction_filter.get(), ioptions.info_log,
+        true /* internal key corruption is not ok */,
+        snapshots.empty() ? 0 : snapshots.back(), snapshot_checker);
 
     CompactionIterator c_iter(
         iter, internal_comparator.user_comparator(), &merge, kMaxSequenceNumber,
         &snapshots, earliest_write_conflict_snapshot, snapshot_checker, env,
         ShouldReportDetailedTime(env, ioptions.statistics),
-        true /* internal key corruption is not ok */, range_del_agg.get());
+        true /* internal key corruption is not ok */, range_del_agg.get(),
+        /*compaction=*/nullptr, compaction_filter.get(),
+        /*shutting_down=*/nullptr,
+        /*preserve_deletes_seqnum=*/0, /*manual_compaction_paused=*/nullptr);
+
     c_iter.SeekToFirst();
     for (; c_iter.Valid(); c_iter.Next()) {
       const Slice& key = c_iter.key();
@@ -192,7 +215,8 @@ Status BuildTable(
       meta->fd.file_size = file_size;
       meta->marked_for_compaction = builder->NeedCompact();
       assert(meta->fd.GetFileSize() > 0);
-      tp = builder->GetTableProperties(); // refresh now that builder is finished
+      tp = builder
+               ->GetTableProperties();  // refresh now that builder is finished
       if (table_properties) {
         *table_properties = tp;
       }
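The new block at the top of `BuildTable` creates a filter only when the factory opts in for the given `TableFileCreationReason`, and it rejects filters whose `IgnoreSnapshots()` returns false. The self-contained sketch below models that control flow. `FilterFactory`, `Filter`, `FilterContext`, `PrepareFilter` and the `filter_flush` knob are hypothetical stand-ins, not RocksDB's classes, and the default `ShouldFilterTableFileCreation` assumes the compatibility behavior described in HISTORY.md (filter during compaction only).

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins modeled on names in the diff; not RocksDB's API.
enum class TableFileCreationReason { kFlush, kCompaction, kRecovery, kMisc };

struct FilterContext {
  bool is_full_compaction = false;
  bool is_manual_compaction = false;
  unsigned column_family_id = 0;
  TableFileCreationReason reason = TableFileCreationReason::kMisc;
};

struct Filter {
  bool ignore_snapshots = true;
  bool IgnoreSnapshots() const { return ignore_snapshots; }
};

struct FilterFactory {
  bool filter_flush = false;  // hypothetical knob: also filter flush output
  bool filters_ignore_snapshots = true;

  // Assumed default: only compaction output is filtered.
  bool ShouldFilterTableFileCreation(TableFileCreationReason reason) const {
    return reason == TableFileCreationReason::kCompaction ||
           (filter_flush && reason == TableFileCreationReason::kFlush);
  }

  std::unique_ptr<Filter> CreateCompactionFilter(
      const FilterContext& /*context*/) const {
    std::unique_ptr<Filter> filter(new Filter());
    filter->ignore_snapshots = filters_ignore_snapshots;
    return filter;
  }
};

enum class Code { kOk, kNotSupported };

// Models the block added to BuildTable: decide whether this table file gets
// a filter, and refuse filters that do not ignore snapshots.
Code PrepareFilter(const FilterFactory* factory, TableFileCreationReason reason,
                   unsigned column_family_id, std::unique_ptr<Filter>* out) {
  out->reset();
  if (factory != nullptr && factory->ShouldFilterTableFileCreation(reason)) {
    FilterContext context;
    context.is_full_compaction = false;
    context.is_manual_compaction = false;
    context.column_family_id = column_family_id;
    context.reason = reason;
    *out = factory->CreateCompactionFilter(context);
    if (*out != nullptr && !(*out)->IgnoreSnapshots()) {
      out->reset();  // BuildTable returns before the filter is ever used
      return Code::kNotSupported;
    }
  }
  return Code::kOk;
}
```

With the default factory a flush gets no filter; a factory that opts in for flushes gets one, unless that filter reports `IgnoreSnapshots() == false`, in which case the sketch returns `kNotSupported`, mirroring the `Status::NotSupported` in the diff.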

db/builder.h

+1-1
@@ -25,7 +25,7 @@
 namespace rocksdb {
 
 struct Options;
-struct FileMetaData;
+class FileMetaData;
 
 class Env;
 struct EnvOptions;

db/c.cc

+16-1
@@ -25,6 +25,7 @@
 #include "rocksdb/merge_operator.h"
 #include "rocksdb/options.h"
 #include "rocksdb/perf_context.h"
+#include "rocksdb/perf_flag.h"
 #include "rocksdb/rate_limiter.h"
 #include "rocksdb/slice_transform.h"
 #include "rocksdb/statistics.h"
@@ -54,7 +55,6 @@ using rocksdb::ColumnFamilyHandle;
 using rocksdb::ColumnFamilyOptions;
 using rocksdb::CompactionFilter;
 using rocksdb::CompactionFilterFactory;
-using rocksdb::CompactionFilterContext;
 using rocksdb::CompactionOptionsFIFO;
 using rocksdb::Comparator;
 using rocksdb::CompressionType;
@@ -114,6 +114,9 @@ using rocksdb::Checkpoint;
 using rocksdb::TransactionLogIterator;
 using rocksdb::BatchResult;
 using rocksdb::PerfLevel;
+using rocksdb::EnablePerfFlag;
+using rocksdb::DisablePerfFlag;
+using rocksdb::CheckPerfFlag;
 using rocksdb::PerfContext;
 using rocksdb::MemoryUtil;
 
@@ -534,6 +537,10 @@ rocksdb_t* rocksdb_open_as_secondary(const rocksdb_options_t* options,
   return result;
 }
 
+void rocksdb_resume(rocksdb_t* db, char** errptr) {
+  SaveError(errptr, db->rep->Resume());
+}
+
 rocksdb_backup_engine_t* rocksdb_backup_engine_open(
     const rocksdb_options_t* options, const char* path, char** errptr) {
   BackupEngine* be;
@@ -2749,6 +2756,14 @@ void rocksdb_set_perf_level(int v) {
   SetPerfLevel(level);
 }
 
+void rocksdb_enable_perf_flag(uint64_t flag) { EnablePerfFlag(flag); }
+
+void rocksdb_disable_perf_flag(uint64_t flag) { DisablePerfFlag(flag); }
+
+int rocksdb_check_perf_flag(uint64_t flag) {
+  return static_cast<int>(CheckPerfFlag(flag));
+}
+
 rocksdb_perfcontext_t* rocksdb_perfcontext_create() {
   rocksdb_perfcontext_t* context = new rocksdb_perfcontext_t;
   context->rep = rocksdb::get_perf_context();
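The C bindings added to `db/c.cc` are thin wrappers over `EnablePerfFlag`, `DisablePerfFlag` and `CheckPerfFlag`, with the boolean result widened to `int` for C callers. Because `monitoring/perf_flag.cc` is not among the files shown, the sketch below only assumes a plausible shape: each flag is a small integer id, and the flag set is thread-local like RocksDB's perf level. The `sketch` namespace, `kMaxFlags` and the `sketch_*` wrapper names are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model; the real monitoring/perf_flag.cc is not in the diff.
namespace sketch {
constexpr std::size_t kMaxFlags = 256;  // assumed upper bound on flag ids

// One flag set per thread (assumption). std::bitset throws
// std::out_of_range for ids >= kMaxFlags.
thread_local std::bitset<kMaxFlags> perf_flags;

void EnablePerfFlag(uint64_t flag) { perf_flags.set(flag); }
void DisablePerfFlag(uint64_t flag) { perf_flags.reset(flag); }
bool CheckPerfFlag(uint64_t flag) { return perf_flags.test(flag); }
}  // namespace sketch

// C-callable wrappers with the same shape as the ones added to db/c.cc.
extern "C" {
void sketch_enable_perf_flag(uint64_t flag) { sketch::EnablePerfFlag(flag); }
void sketch_disable_perf_flag(uint64_t flag) { sketch::DisablePerfFlag(flag); }
int sketch_check_perf_flag(uint64_t flag) {
  return static_cast<int>(sketch::CheckPerfFlag(flag));
}
}
```

Returning `int` rather than `bool` keeps the wrapper usable from C code that predates `<stdbool.h>`, which matches the `static_cast<int>` in the diff.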

db/column_family.cc

+7-7
@@ -738,7 +738,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
     bool needed_delay = write_controller->NeedsDelay();
 
     if (write_stall_condition == WriteStallCondition::kStopped &&
-        write_stall_cause == WriteStallCause::kMemtableLimit) {
+        write_stall_cause == WriteStallCause::kMemtableLimit && !mutable_cf_options.disable_write_stall) {
       write_controller_token_ = write_controller->GetStopToken();
       internal_stats_->AddCFStats(InternalStats::MEMTABLE_LIMIT_STOPS, 1);
       ROCKS_LOG_WARN(
@@ -748,7 +748,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
           name_.c_str(), imm()->NumNotFlushed(),
           mutable_cf_options.max_write_buffer_number);
     } else if (write_stall_condition == WriteStallCondition::kStopped &&
-               write_stall_cause == WriteStallCause::kL0FileCountLimit) {
+               write_stall_cause == WriteStallCause::kL0FileCountLimit && !mutable_cf_options.disable_write_stall) {
       write_controller_token_ = write_controller->GetStopToken();
       internal_stats_->AddCFStats(InternalStats::L0_FILE_COUNT_LIMIT_STOPS, 1);
       if (compaction_picker_->IsLevel0CompactionInProgress()) {
@@ -759,7 +759,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
                      "[%s] Stopping writes because we have %d level-0 files",
                      name_.c_str(), vstorage->l0_delay_trigger_count());
     } else if (write_stall_condition == WriteStallCondition::kStopped &&
-               write_stall_cause == WriteStallCause::kPendingCompactionBytes) {
+               write_stall_cause == WriteStallCause::kPendingCompactionBytes && !mutable_cf_options.disable_write_stall) {
       write_controller_token_ = write_controller->GetStopToken();
       internal_stats_->AddCFStats(
           InternalStats::PENDING_COMPACTION_BYTES_LIMIT_STOPS, 1);
@@ -769,7 +769,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
           "bytes %" PRIu64,
           name_.c_str(), compaction_needed_bytes);
     } else if (write_stall_condition == WriteStallCondition::kDelayed &&
-               write_stall_cause == WriteStallCause::kMemtableLimit) {
+               write_stall_cause == WriteStallCause::kMemtableLimit && !mutable_cf_options.disable_write_stall) {
       write_controller_token_ =
           SetupDelay(write_controller, compaction_needed_bytes,
                      prev_compaction_needed_bytes_, was_stopped,
@@ -784,7 +784,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
           mutable_cf_options.max_write_buffer_number,
           write_controller->delayed_write_rate());
     } else if (write_stall_condition == WriteStallCondition::kDelayed &&
-               write_stall_cause == WriteStallCause::kL0FileCountLimit) {
+               write_stall_cause == WriteStallCause::kL0FileCountLimit && !mutable_cf_options.disable_write_stall) {
       // L0 is the last two files from stopping.
       bool near_stop = vstorage->l0_delay_trigger_count() >=
                        mutable_cf_options.level0_stop_writes_trigger - 2;
@@ -804,7 +804,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
           name_.c_str(), vstorage->l0_delay_trigger_count(),
           write_controller->delayed_write_rate());
     } else if (write_stall_condition == WriteStallCondition::kDelayed &&
-               write_stall_cause == WriteStallCause::kPendingCompactionBytes) {
+               write_stall_cause == WriteStallCause::kPendingCompactionBytes && !mutable_cf_options.disable_write_stall) {
       // If the distance to hard limit is less than 1/4 of the gap between soft
       // and
       // hard bytes limit, we think it is near stop and speed up the slowdown.
@@ -829,7 +829,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
           name_.c_str(), vstorage->estimated_compaction_needed_bytes(),
           write_controller->delayed_write_rate());
     } else {
-      assert(write_stall_condition == WriteStallCondition::kNormal);
+      assert(write_stall_condition == WriteStallCondition::kNormal || mutable_cf_options.disable_write_stall);
       if (vstorage->l0_delay_trigger_count() >=
           GetL0ThresholdSpeedupCompaction(
               mutable_cf_options.level0_file_num_compaction_trigger,
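Every stop and delay branch in `RecalculateWriteStallConditions` now also requires `!mutable_cf_options.disable_write_stall`, so with that option set each case falls through to the final `else`, whose assertion is relaxed to match. The definition of `disable_write_stall` is not among the files shown. The sketch below reduces the branch structure to a pure function; the condition and cause enums mirror names in the diff, while `Action` and `DecideThrottle` are hypothetical, and the real code additionally acquires stop/delay tokens and updates statistics, which are omitted here.

```cpp
#include <cassert>

// Stand-ins mirroring enum names used in the diff.
enum class WriteStallCondition { kNormal, kDelayed, kStopped };
enum class WriteStallCause {
  kNone,
  kMemtableLimit,
  kL0FileCountLimit,
  kPendingCompactionBytes
};
// Hypothetical summary of what the selected branch does to writes.
enum class Action { kStopWrites, kDelayWrites, kNoThrottle };

Action DecideThrottle(WriteStallCondition condition, WriteStallCause cause,
                      bool disable_write_stall) {
  const bool known_cause = cause != WriteStallCause::kNone;
  if (condition == WriteStallCondition::kStopped && known_cause &&
      !disable_write_stall) {
    return Action::kStopWrites;  // GetStopToken() in the real code
  }
  if (condition == WriteStallCondition::kDelayed && known_cause &&
      !disable_write_stall) {
    return Action::kDelayWrites;  // SetupDelay(...) in the real code
  }
  // Final `else` branch: the assertion now tolerates a stalled condition
  // when disable_write_stall is set.
  assert(condition == WriteStallCondition::kNormal || disable_write_stall);
  return Action::kNoThrottle;
}
```

Gating each branch individually, rather than returning early, keeps the existing `else` path (including its compaction speed-up logic) running for stalled column families when the option is on.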

db/compaction/compaction.cc

+7
@@ -531,6 +531,12 @@ std::unique_ptr<CompactionFilter> Compaction::CreateCompactionFilter(
     return nullptr;
   }
 
+  if (!cfd_->ioptions()
+           ->compaction_filter_factory->ShouldFilterTableFileCreation(
+               TableFileCreationReason::kCompaction)) {
+    return nullptr;
+  }
+
   CompactionFilter::Context context;
   context.is_full_compaction = is_full_compaction_;
   context.is_manual_compaction = is_manual_compaction_;
@@ -550,6 +556,7 @@ std::unique_ptr<CompactionFilter> Compaction::CreateCompactionFilter(
     }
   }
   context.column_family_id = cfd_->GetID();
+  context.reason = TableFileCreationReason::kCompaction;
   return cfd_->ioptions()->compaction_filter_factory->CreateCompactionFilter(
       context);
 }

db/compaction/compaction_iterator.cc

+5-4
@@ -4,6 +4,7 @@
 // (found in the LICENSE.Apache file in the root directory).
 
 #include "db/compaction/compaction_iterator.h"
+
 #include "db/snapshot_checker.h"
 #include "port/likely.h"
 #include "rocksdb/listener.h"
@@ -78,8 +79,8 @@ CompactionIterator::CompactionIterator(
       current_user_key_snapshot_(0),
       merge_out_iter_(merge_helper_),
       current_key_committed_(false),
-      snap_list_callback_(snap_list_callback) {
-  assert(compaction_filter_ == nullptr || compaction_ != nullptr);
+      snap_list_callback_(snap_list_callback),
+      level_(compaction_ == nullptr ? 0 : compaction_->level()) {
   assert(snapshots_ != nullptr);
   bottommost_level_ =
       compaction_ == nullptr ? false : compaction_->bottommost_level();
@@ -121,7 +122,7 @@ void CompactionIterator::Next() {
       key_ = merge_out_iter_.key();
       value_ = merge_out_iter_.value();
       bool valid_key __attribute__((__unused__));
-      valid_key = ParseInternalKey(key_, &ikey_);
+      valid_key = ParseInternalKey(key_, &ikey_);
       // MergeUntil stops when it encounters a corrupt key and does not
       // include them in the result, so we expect the keys here to be valid.
       assert(valid_key);
@@ -176,7 +177,7 @@ void CompactionIterator::InvokeFilterIfNeeded(bool* need_skip,
     {
       StopWatchNano timer(env_, report_detailed_time_);
       filter = compaction_filter_->FilterV3(
-          compaction_->level(), filter_key, seqno, value_type, value_,
+          level_, filter_key, seqno, value_type, value_,
           &compaction_filter_value_, compaction_filter_skip_until_.rep());
       iter_stats_.total_filter_time +=
           env_ != nullptr && report_detailed_time_ ? timer.ElapsedNanos() : 0;

db/compaction/compaction_iterator.h

+2
@@ -275,6 +275,8 @@ class CompactionIterator {
   // number of distinct keys processed
   size_t num_keys_ = 0;
 
+  const int level_;
+
   bool IsShuttingDown() {
     // This is a best-effort facility, so memory_order_relaxed is sufficient.
     return shutting_down_ && shutting_down_->load(std::memory_order_relaxed);
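Because `BuildTable` now passes a compaction filter to `CompactionIterator` with `compaction` set to `nullptr`, the constructor can no longer assert that a filter implies a compaction, and `FilterV3` can no longer call `compaction_->level()`. The diff computes the level once into `level_` and falls back to 0, which is also the level that flush output is written to. A minimal model of that fallback follows; `FakeCompaction` and `FilterLevel` are hypothetical stand-ins, and `level()` here plays the role of `Compaction::level()`, which reports the compaction's input level.

```cpp
#include <cassert>

// Hypothetical stand-in for rocksdb::Compaction; only level() matters here.
struct FakeCompaction {
  int input_level;
  int level() const { return input_level; }
};

// Models the new CompactionIterator member: the level handed to
// CompactionFilter::FilterV3 is computed once and falls back to 0 when the
// iterator runs without a Compaction (for example, from BuildTable).
class FilterLevel {
 public:
  explicit FilterLevel(const FakeCompaction* compaction)
      : level_(compaction == nullptr ? 0 : compaction->level()) {}
  int level() const { return level_; }

 private:
  const int level_;
};
```

Declaring `level_` as `const int` and initializing it in the member-initializer list, as the diff does, guarantees it is set before any filter call and never changes afterwards.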

db/compaction/compaction_picker_test.cc

+6-7
@@ -96,7 +96,6 @@ class CompactionPickerTest : public testing::Test {
     f->fd.largest_seqno = largest_seq;
     f->compensated_file_size =
         (compensated_file_size != 0) ? compensated_file_size : file_size;
-    f->refs = 0;
     vstorage_->AddFile(level, f);
     files_.emplace_back(f);
     file_map_.insert({file_number, {f, level}});
@@ -369,8 +368,8 @@ TEST_F(CompactionPickerTest, LevelTriggerDynamic4) {
   mutable_cf_options_.max_bytes_for_level_multiplier = 10;
   NewVersionStorage(num_levels, kCompactionStyleLevel);
   Add(0, 1U, "150", "200");
-  Add(num_levels - 1, 3U, "200", "250", 300U);
-  Add(num_levels - 1, 4U, "300", "350", 3000U);
+  Add(num_levels - 1, 2U, "200", "250", 300U);
+  Add(num_levels - 1, 3U, "300", "350", 3000U);
   Add(num_levels - 1, 4U, "400", "450", 3U);
   Add(num_levels - 2, 5U, "150", "180", 300U);
   Add(num_levels - 2, 6U, "181", "350", 500U);
@@ -575,7 +574,7 @@ TEST_F(CompactionPickerTest, CompactionPriMinOverlapping2) {
   Add(2, 8U, "201", "300",
       60000000U); // Overlaps with file 28, 29, total size 521M
 
-  Add(3, 26U, "100", "110", 261000000U);
+  Add(3, 25U, "100", "110", 261000000U);
   Add(3, 26U, "150", "170", 261000000U);
   Add(3, 27U, "171", "179", 260000000U);
   Add(3, 28U, "191", "220", 260000000U);
@@ -1091,7 +1090,7 @@ TEST_F(CompactionPickerTest, EstimateCompactionBytesNeeded1) {
   // Size ratio L4/L3 is 9.9
   // After merge from L3, L4 size is 1000900
   Add(4, 11U, "400", "500", 999900);
-  Add(5, 11U, "400", "500", 8007200);
+  Add(5, 12U, "400", "500", 8007200);
 
   UpdateVersionStorageInfo();
 
@@ -1404,8 +1403,8 @@ TEST_F(CompactionPickerTest, IsTrivialMoveOn) {
 
   Add(3, 5U, "120", "130", 7000U);
   Add(3, 6U, "170", "180", 7000U);
-  Add(3, 5U, "220", "230", 7000U);
-  Add(3, 5U, "270", "280", 7000U);
+  Add(3, 7U, "220", "230", 7000U);
+  Add(3, 8U, "270", "280", 7000U);
   UpdateVersionStorageInfo();
 
   std::unique_ptr<Compaction> compaction(level_compaction_picker.PickCompaction(
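Several fixtures in `compaction_picker_test.cc` previously registered different files under the same number (for example `5U` three times in `IsTrivialMoveOn`); the diff renumbers them and drops `f->refs = 0`, presumably in line with the HISTORY.md note that each physical file is now reference counted by at most one `FileMetaData`. A hypothetical guard like `AllFileNumbersUnique` (not part of the diff) is one way a test helper could catch such duplicates before building a version:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not in the diff): true when no file number repeats.
bool AllFileNumbersUnique(const std::vector<uint64_t>& file_numbers) {
  std::unordered_set<uint64_t> seen;
  for (uint64_t number : file_numbers) {
    if (!seen.insert(number).second) {
      return false;  // this number was already used by another file
    }
  }
  return true;
}
```

Applied to the numbers passed to `Add()` in the old `IsTrivialMoveOn` fixture (5, 6, 5, 5), it returns false; the renumbered fixture (5, 6, 7, 8) passes.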
