[WIP] CNDB-15608 port CASSANDRA-18673 to reduce disk usage of row-aware indexes #2122

k-rus · 2025-11-13T15:36:34Z

The PR is still in its initial state. It compiles, and the majority of tests are expected to succeed.
It removes the use of the trie for the primary key component, which requires additional adjustments and might not be acceptable.
There are many more small issues to fix:

- Investigate and fix the places where the trie-based API of sorted terms was used (as mentioned above)
- Fix the memory leaks that were noticed
- Reported disk size is 0 in some cases in the SAI disk size test
- Double-check commented-out or removed code
- Investigate removed benchmarks
- Maybe move IndexFileUtils as in C*
- Introduce TokenOnlyPrimaryKey class as in C* (improves isTokenOnly implementation)
- Maybe move the inner impl classes out of IndexDescriptor as in C*
- Port bug fixes from C*

What is the issue

Fixes https://github.com/riptano/cndb/issues/15608

What does this PR fix and why was it fixed

...

PartitionAwarePrimaryKeyMap implements an overcomplicated `ceiling` method that calls `exactRowIdOrInvertedCeiling`. This commit simplifies `PartitionAwarePrimaryKeyMap.ceiling` to call the corresponding method of the reader directly. This can be seen as a follow-up to https://github.com/datastax/cassandra/pull/1096/files#diff-c5011580ab9b0d99d9e504570c4cccb152221d3dbe62c8a956e83fce9070b380
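For readers unfamiliar with the two lookup styles, here is a minimal sketch of the difference. It is not the code from this PR: `CeilingSketch` and its sorted `long[]` of keys are hypothetical stand-ins for the reader, and the "inverted ceiling" encoding is assumed, from the method name, to follow the `Arrays.binarySearch` convention of returning `-(insertionPoint) - 1` on a miss. Keys are assumed to be unique.

```java
import java.util.Arrays;

// Hypothetical stand-in for a sorted-keys reader; the array index plays the role of the row id.
final class CeilingSketch
{
    private final long[] sortedKeys;

    CeilingSketch(long[] sortedKeys)
    {
        this.sortedKeys = sortedKeys;
    }

    // Assumed encoding: the row id on an exact match, otherwise -(ceilingRowId) - 1.
    long exactRowIdOrInvertedCeiling(long key)
    {
        return Arrays.binarySearch(sortedKeys, key);
    }

    // Roundabout ceiling: call the combined method, then decode its result.
    long ceilingViaInverted(long key)
    {
        long result = exactRowIdOrInvertedCeiling(key);
        long rowId = result >= 0 ? result : -result - 1;
        return rowId < sortedKeys.length ? rowId : -1;
    }

    // Direct ceiling: first row id whose key is >= the given key, or -1 if there is none.
    long ceilingDirect(long key)
    {
        int lo = 0;
        int hi = sortedKeys.length;
        while (lo < hi)
        {
            int mid = (lo + hi) >>> 1;
            if (sortedKeys[mid] < key)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo < sortedKeys.length ? lo : -1;
    }
}
```

Both paths return the same row id in this sketch; the direct one just skips the encode and decode step, which is the kind of simplification the commit describes.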

While working on CNDB-15608, I noticed IntelliJ lint complaints that are not related to the actual changes in the patch port. I fix them in this separate commit to avoid unnecessary noise while working on the actual patch port. Many of the changes align with what I have already merged in other PRs. Some of the changes might not match others' preferences, and I am open to discussion. The changes include:

- Remove unused imports
- Use the formatter of the logger instead of string concatenation
- Use method references instead of lambdas
- Remove unnecessary suppression of resource warnings
- Simplify Boolean conditions
- Remove unnecessary modifiers in interfaces
- Fix typos
- Fix links in javadoc comments
- Add static modifier to nested classes
- Remove class fields when not used
- Remove unnecessary throws in method signatures
- Use final when recommended
- Remove unused method arguments
- Replace single-char strings with chars
- Remove unnecessary null variable initialization
- Replace assert true with assert equal
- Change order of assert arguments to have the expected value first
- Remove unnecessary explicit casting
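To make a few of these categories concrete, here is a small illustration of three of them (method reference instead of lambda, char instead of single-char string, simplified Boolean condition). `LintCleanupSketch` and its methods are hypothetical and not taken from the commit; the "before" forms appear only in comments.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical examples of the kind of cleanup listed above; not code from the commit.
final class LintCleanupSketch
{
    // Before: names.stream().map(s -> s.trim())
    // After:  a method reference replaces the lambda.
    static List<String> trimAll(List<String> names)
    {
        return names.stream().map(String::trim).collect(Collectors.toList());
    }

    // Before: sb.append(",")
    // After:  a char replaces the single-character string.
    static String join(List<String> parts)
    {
        StringBuilder sb = new StringBuilder();
        for (String part : parts)
        {
            if (sb.length() > 0)
                sb.append(',');
            sb.append(part);
        }
        return sb.toString();
    }

    // Before: if (parts.isEmpty()) return false; else return true;
    // After:  the condition is returned directly.
    static boolean hasParts(List<String> parts)
    {
        return !parts.isEmpty();
    }
}
```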

sonarqubecloud · 2025-12-12T12:05:23Z

Quality Gate failed

Failed conditions
4 New Blocker Issues (required ≤ 1)

See analysis details on SonarQube Cloud

cassci-bot · 2025-12-12T12:10:38Z

❌ Build ds-cassandra-pr-gate/PR-2122 rejected by Butler

7020 regressions found
See build details here

Found 7020 new test failures

Showing only first 15 new test failures

| Test | Explanation | Runs | Upstream |
| --- | --- | --- | --- |
| junit.framework.TestSuite.org.apache.cassandra.distributed.test.sai.features.FeaturesVersionSupportAATest-.jdk11 | REGRESSION | 🔵🔴 | 0 / 19 |
| junit.framework.TestSuite.org.apache.cassandra.distributed.test.sai.features.FeaturesVersionSupportBATest-.jdk11 | REGRESSION | 🔵🔴 | 0 / 19 |
| junit.framework.TestSuite.org.apache.cassandra.distributed.test.sai.features.FeaturesVersionSupportCATest-.jdk11 | REGRESSION | 🔵🔴 | 0 / 19 |
| junit.framework.TestSuite.org.apache.cassandra.distributed.test.sai.features.FeaturesVersionSupportDBTest-.jdk11 | REGRESSION | 🔵🔴 | 0 / 19 |
| junit.framework.TestSuite.org.apache.cassandra.distributed.test.sai.features.FeaturesVersionSupportDCTest-.jdk11 | REGRESSION | 🔵🔴 | 0 / 19 |
| junit.framework.TestSuite.org.apache.cassandra.distributed.test.sai.features.FeaturesVersionSupportEBTest-.jdk11 | REGRESSION | 🔵🔴 | 0 / 19 |
| junit.framework.TestSuite.org.apache.cassandra.distributed.test.sai.features.FeaturesVersionSupportECTest-.jdk11 | REGRESSION | 🔵🔴 | 0 / 19 |
| junit.framework.TestSuite.org.apache.cassandra.distributed.test.sai.features.FeaturesVersionSupportEDTest-.jdk11 | REGRESSION | 🔵🔴 | 0 / 19 |
| o.a.c.db.compaction.CompactionControllerTest.testIgnoreOverlaps | REGRESSION | 🔴⚪ | 1 / 19 |
| o.a.c.db.filter.IndexHintsTest.testMultipleIndexesPerColumnAndContains (compression) | REGRESSION | 🔵🔴 | 0 / 19 |
| o.a.c.distributed.test.sai.datamodels.QueryCellDeletionsTest.testCellDeletions[BaseDataModel] | REGRESSION | 🔵🔴 | 0 / 19 |
| o.a.c.distributed.test.sai.datamodels.QueryCellDeletionsTest.testCellDeletions[CompositePartitionKeyDataModel] | REGRESSION | 🔵🔴 | 0 / 19 |
| o.a.c.distributed.test.sai.datamodels.QueryCellDeletionsTest.testCellDeletions[CompoundKeyDataModel] | REGRESSION | 🔴🔴 | 0 / 19 |
| o.a.c.distributed.test.sai.datamodels.QueryCellDeletionsTest.testCellDeletions[CompoundKeyWithStaticsDataModel] | REGRESSION | 🔴🔴 | 0 / 19 |
| o.a.c.distributed.test.sai.datamodels.QueryRowDeletionsTest.testRowDeletions[CompoundKeyWithStaticsDataModel] | REGRESSION | 🔴🔴 | 0 / 19 |

Found 246 known test failures

CNDB-15609 test SAI disk size for all versions

ef2df5d

Add tests to check SAI disk size for different format versions and different partition sizes.

k-rus changed the title ~~[WIP] CNDB-15609 test SAI disk size for all versions~~ [WIP] CNDB-15608 port CASSANDRA-18673 to reduce disk usage of row-aware indexes Nov 14, 2025

k-rus added 3 commits November 19, 2025 12:52

CNDB-15609 assert lower bound on SAI disk size

b30086a

This prevents reporting an incorrect value such as 0 and also makes improvements noticeable.
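The idea of a lower-bound assertion can be sketched roughly as follows. This is an illustration only: `DiskSizeBoundSketch`, its method, and the byte values are hypothetical placeholders, and the version labels `aa` and `ed` are borrowed from the test names in the CI report rather than from the test added here.

```java
import java.util.Map;

// Hypothetical helper; the bounds are placeholder numbers, not measured values.
final class DiskSizeBoundSketch
{
    static final Map<String, Long> LOWER_BOUND_BYTES = Map.of("aa", 1_000L,
                                                              "ed", 600L);

    // Fails when the reported size drops below the recorded lower bound for the version.
    static void checkLowerBound(String version, long reportedBytes)
    {
        Long lowerBound = LOWER_BOUND_BYTES.get(version);
        if (lowerBound == null)
            throw new IllegalArgumentException("No lower bound recorded for version " + version);
        if (reportedBytes < lowerBound)
            throw new AssertionError("SAI disk size for version " + version + " is " + reportedBytes
                                     + " bytes, below the lower bound of " + lowerBound);
    }
}
```

A reported size of 0 fails such a check immediately. A genuine size reduction also fails it, which prompts updating the bound and so makes the improvement visible in the test.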

CNDB-15608 initial commit with porting size reduction

2b44c9e

It compiles and tests are working. Many issues still exist.

Store comparator in IndexDescriptor

479eb82

This removes the need to rely on a nullable context.

k-rus force-pushed the rf-15608-port-reduce-sai-size branch 2 times, most recently from 8b58f28 to 0977619 Compare December 8, 2025 10:29

Add TokenOnlyPrimaryKey class and replace isTokenOnly

a95a292

It's part of the patch.
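As a rough illustration of replacing a flag check with a dedicated class, consider the sketch below. All types here (`KeySketch`, `TokenOnlyKeySketch`, `RowKeySketch`) are hypothetical simplifications and not the classes from the patch. The comparison rule, where a token-only key compares equal to any key with the same token, is an assumption made for the example.

```java
// Hypothetical, simplified key hierarchy; not the classes from the patch.
interface KeySketch extends Comparable<KeySketch>
{
    long token();

    boolean isTokenOnly();
}

// A key that carries nothing but a token, so isTokenOnly() is answered by the type itself.
final class TokenOnlyKeySketch implements KeySketch
{
    private final long token;

    TokenOnlyKeySketch(long token)
    {
        this.token = token;
    }

    public long token()
    {
        return token;
    }

    public boolean isTokenOnly()
    {
        return true;
    }

    public int compareTo(KeySketch other)
    {
        return Long.compare(token, other.token());
    }
}

// A full key: token plus a clustering value (a plain String in this sketch).
final class RowKeySketch implements KeySketch
{
    private final long token;
    private final String clustering;

    RowKeySketch(long token, String clustering)
    {
        this.token = token;
        this.clustering = clustering;
    }

    public long token()
    {
        return token;
    }

    public boolean isTokenOnly()
    {
        return false;
    }

    public int compareTo(KeySketch other)
    {
        int cmp = Long.compare(token, other.token());
        // Assumed rule: against a token-only key, only the token is compared.
        if (cmp != 0 || other.isTokenOnly())
            return cmp;
        // In this sketch every key that is not token-only is a RowKeySketch.
        return clustering.compareTo(((RowKeySketch) other).clustering);
    }
}
```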

k-rus force-pushed the rf-15608-port-reduce-sai-size branch from 0977619 to a95a292 Compare December 8, 2025 15:43

k-rus and others added 6 commits December 10, 2025 11:34

Revert removing data members to avoid memory leaks

3672957

Fix formatting and typos

4d68cc6

Add TokenOnlyPrimaryKey class and replace isTokenOnly

bdf9159

It's part of the patch.

Remove token from asComparableBytes for RowAwarePrimaryKey

b1dd5ab
