[WIP] CNDB-15608 port CASSANDRA-18673 to reduce disk usage of row-aware indexes#2122
I ran latte's read_range workflow, and the result shows no performance degradation. A is the baseline, i.e., the performance of the current main. I also ran another, more complicated workflow; the result there wasn't bad, but it wasn't as positive as this one. That workflow is not published yet.
I ran this nightly build, and it seems that the failed tests are the same as in
Add tests to check SAI disk size for different format versions and different partition sizes.
They prevent reporting incorrect values such as 0 and also make improvements noticeable.
It compiles and the tests pass.
The test only passes on the new components version.
The majority of the Sonar warnings come from existing code and were not introduced in this PR. One warning asks for refactoring a large method, which originates from Apache's patch.
❌ Build ds-cassandra-pr-gate/PR-2122 rejected by Butler
6 regressions found
Found 6 new test failures
Found 7 known test failures







It removes the use of Trie for the primary key component, which requires additional adjustments. It also might not be acceptable.
There are many more small issues to fix; see the original issue https://github.com/riptano/cndb/issues/15608
What is the issue
Fixes https://github.com/riptano/cndb/issues/15608
What does this PR fix and why was it fixed
This port brings the patch from AFS without many changes or refactoring, i.e., I haven't simplified or refactored the original code coming from Apache, so the complexity of the code remains the same as in the original patch and in CC's relevant code.
The patch implements a new disk format for SAI, which changes how the row-aware primary key map is stored in components. The primary key map is split so that the partition key map and the clustering key map are stored in separate components. Both use the Key Store coming from Apache, which replaces the sorted terms structure of row-aware primary key maps. As a result, primary keys are not stored in an ordered way, and partition keys are stored without the token prefix to allow better compression. Clustering keys are sorted lexicographically within a partition.
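To make the split concrete, here is a toy in-memory model of keeping partition keys and clustering keys in separate structures. It is only a sketch under assumptions: the class name, the row-to-partition list, and the use of plain strings are all hypothetical and do not reflect the actual on-disk layout or the Key Store API from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model, not the PR's code: partition keys are stored once per
// partition (no token prefix), clustering keys are stored per row and sorted
// lexicographically within their partition.
final class SplitKeyMapSketch {
    private final List<String> partitionKeys = new ArrayList<>();   // partition index -> partition key
    private final List<Integer> partitionOfRow = new ArrayList<>(); // row ID -> partition index
    private final List<String> clusteringKeys = new ArrayList<>();  // row ID -> clustering key

    // Appends one partition; rows get consecutive row IDs in clustering order.
    void addPartition(String partitionKey, List<String> clusterings) {
        int partitionIndex = partitionKeys.size();
        partitionKeys.add(partitionKey);
        List<String> sorted = new ArrayList<>(clusterings);
        Collections.sort(sorted); // lexicographic order within the partition
        for (String clustering : sorted) {
            partitionOfRow.add(partitionIndex);
            clusteringKeys.add(clustering);
        }
    }

    // Rebuilds the full primary key of a row from the two separate structures.
    String primaryKey(int rowId) {
        return partitionKeys.get(partitionOfRow.get(rowId)) + ":" + clusteringKeys.get(rowId);
    }

    int rowCount() {
        return clusteringKeys.size();
    }
}
```

In this model each partition key is written once regardless of how many rows the partition has, which is the kind of redundancy the split is meant to remove; the real components additionally compress the keys.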
KeyLookup uses a different structure than SortedTerms. This required implementing ceiling and floor methods in those structures, e.g., in LongArray implementations. My understanding is that the ceiling and floor methods are used for sorting and ANN. This doesn't exist in Apache.
Because of the specific case for clustering, it was necessary to propagate and store a flag indicating whether the table has clustering, together with the clustering comparator, in the index components and index descriptor.
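For readers unfamiliar with the ceiling and floor semantics mentioned above, a minimal sketch over a sorted `long[]` follows. The class and method names are hypothetical, a plain array stands in for the LongArray implementations, and returning -1 for "no match" is an assumption of this sketch rather than the convention used in the PR.

```java
// Hypothetical sketch, not the PR's code: ceiling/floor lookups by binary search
// over a sorted long[] that stands in for a LongArray.
final class SortedLongLookup {
    private SortedLongLookup() {}

    // Index of the first element >= target, or -1 if every element is smaller.
    static int ceilingIndex(long[] sorted, long target) {
        int lo = 0, hi = sorted.length; // search window is [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < target) lo = mid + 1;
            else hi = mid;
        }
        return lo < sorted.length ? lo : -1;
    }

    // Index of the last element <= target, or -1 if every element is greater.
    static int floorIndex(long[] sorted, long target) {
        int lo = 0, hi = sorted.length; // search window is [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= target) lo = mid + 1;
            else hi = mid;
        }
        return lo - 1; // lo is the first index with a value > target
    }
}
```

With duplicates present, `ceilingIndex` returns the first matching position and `floorIndex` the last, which is one reasonable choice when the result bounds a range of row IDs.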
Other things got in with the port of the patch:
Replaced hasEmptyClustering methods with hasClustering, so that only hasClustering is used consistently