
CNDB-11666: Batch clusterings into single SAI partition post-filtering reads #1883

Merged on Jul 22, 2025 (1 commit)

michaeljmarshall
Member

What is the issue

Fixes: https://github.com/riptano/cndb/issues/11666
Ports: https://issues.apache.org/jira/browse/CASSANDRA-19497

May fix: https://github.com/riptano/cndb/issues/14822

What does this PR fix and why was it fixed

This is a draft port of the upstream fix. Initial validation shows improved performance for low-selectivity queries, getting much closer to aa performance.

Benchmark results across several versions show that, for the lc function with the ec format, this patch raises throughput from ~39 req/s to ~418 req/s, roughly a 10x increase.

$ latte list --tag ondisk
File ─────────────────────────────────────────────────────────────────────────────────────────────   Workload   Function   Timestamp ─────────   Tags ──────────────────────────   Params   Rate   Thrpt. [req/s]   P50 [ms]   P99 [ms]
./wide.Test_Cluster.4.0.11.0-SNAPSHOT.ondisk.p128.t1.c1.20250716.124318.json                         wide.rn    hc         2025-07-16 12:42:17   ondisk, ec, cc                                              7526       15.5       36.6
./wide.Test_Cluster.5.0.5-SNAPSHOT.ondisk.5.0.p128.t1.c1.20250716.125436.json                        wide.rn    hc         2025-07-16 12:53:35   ondisk, 5.0                                                 4634       23.7       95.3
./wide.Test_Cluster.4.0.11.0-SNAPSHOT.ondisk.cc.main.aa.p128.t1.c1.20250716.131838.json              wide.rn    hc         2025-07-16 13:17:37   ondisk, cc, main, aa                                        1182      106.1      191.8
./wide.Test_Cluster.4.0.11.0-SNAPSHOT.ondisk.cc.main-with-11666.aa.p128.t1.c1.20250717.002057.json   wide.rn    hc         2025-07-17 00:19:57   ondisk, cc, main-with-11666, aa                             1105      116.2      187.0
./wide.Test_Cluster.4.0.11.0-SNAPSHOT.ondisk.cc.main-with-11666.ec.p128.t1.c1.20250717.002614.json   wide.rn    hc         2025-07-17 00:25:13   ondisk, cc, main-with-11666, ec                             6041       20.1       33.6
./wide.Test_Cluster.4.0.11.0-SNAPSHOT.ondisk.ec.cc.p128.t1.c1.20250716.124531.json                   wide.rn    lc         2025-07-16 12:44:26   ondisk, ec, cc                                                39     3147.0     3432.9
./wide.Test_Cluster.5.0.5-SNAPSHOT.ondisk.5.0.p128.t1.c1.20250716.125619.json                        wide.rn    lc         2025-07-16 12:55:18   ondisk, 5.0                                                  289      439.0      550.2
./wide.Test_Cluster.4.0.11.0-SNAPSHOT.ondisk.cc.main.aa.p128.t1.c1.20250716.131640.json              wide.rn    lc         2025-07-16 13:15:39   ondisk, cc, main, aa                                         509      243.7      372.8
./wide.Test_Cluster.4.0.11.0-SNAPSHOT.ondisk.cc.main-with-11666.aa.p128.t1.c1.20250717.001941.json   wide.rn    lc         2025-07-17 00:18:40   ondisk, cc, main-with-11666, aa                              511      241.6      360.3
./wide.Test_Cluster.4.0.11.0-SNAPSHOT.ondisk.cc.main-with-11666.ec.p128.t1.c1.20250717.002441.json   wide.rn    lc         2025-07-17 00:23:40   ondisk, cc, main-with-11666, ec                              418      299.8      388.4
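
The change named in the title batches clusterings into a single partition read during SAI post-filtering, rather than issuing a separate read per matching row. As a rough illustration only, the Java sketch below groups keys that arrive sorted by partition into one batch per partition. KeyRef, PartitionBatch, and batchByPartition are hypothetical names, not the classes this PR touches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: KeyRef, PartitionBatch and batchByPartition are illustrative
// names, not the classes changed by this PR.
public class ClusteringBatcher {
    // A matching row identified by partition key and clustering (both simplified to strings).
    public record KeyRef(String partition, String clustering) {}

    // All clusterings from one partition that a single partition read would fetch.
    public record PartitionBatch(String partition, List<String> clusterings) {}

    // Assumes keys arrive sorted by partition, as they would from an ordered key iterator.
    public static List<PartitionBatch> batchByPartition(List<KeyRef> sortedKeys) {
        List<PartitionBatch> batches = new ArrayList<>();
        PartitionBatch current = null;
        for (KeyRef key : sortedKeys) {
            // Start a new batch whenever the partition key changes.
            if (current == null || !current.partition().equals(key.partition())) {
                current = new PartitionBatch(key.partition(), new ArrayList<>());
                batches.add(current);
            }
            current.clusterings().add(key.clustering());
        }
        return batches;
    }

    public static void main(String[] args) {
        List<KeyRef> keys = List.of(
                new KeyRef("p1", "c1"), new KeyRef("p1", "c2"),
                new KeyRef("p2", "c1"), new KeyRef("p2", "c4"));
        for (PartitionBatch batch : batchByPartition(keys)) {
            System.out.println(batch);
        }
    }
}
```

With four sorted keys spread over two partitions, batchByPartition returns two batches, so in this model two partition reads would replace four row reads.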

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

eolivelli left a comment

Overall, the patch looks good.

We have to add unit tests (or identify existing unit tests) that cover the new "feature" and the code we touched.

Comment on lines 517 to 520
// Preconditions.checkNotNull(key.partitionKey(), "Partition key must not be null");
// if (lastKey != null && key.partitionKey().equals(lastKey.partitionKey()) && key.clustering().equals(lastKey.clustering()))
// return null;
// lastKey = key;

I guess this needs an update.

Member Author

I left this in an ambiguous state due to concerns about correctness and deduplication. It looks like we already deduplicate in fillNextSelectedKeysInPartition, so I'll remove these lines.

Member Author

Actually, I need to follow up on this comment to make sure we're in the clear:

            // Key reads are lazy, delayed all the way to this point.
            // We don't want key.equals(lastKey) because some PrimaryKey implementations consider more than just
            // partition key and clustering for equality. This can break lastKey skipping, which is necessary for
            // correctness when PrimaryKey doesn't have a clustering (as otherwise, the same partition may get
            // filtered and considered as a result multiple times).
            // we need a non-null partitionKey here, as we want to construct a SinglePartitionReadCommand
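
The quoted comment explains why lastKey skipping avoids key.equals(lastKey). The Java sketch below illustrates that point under an assumption: SketchKey is a hypothetical stand-in for a PrimaryKey whose equals() also compares an extra field (a made-up sstableId here), so two keys for the same row can compare unequal. Skipping therefore compares only the partition key and clustering. None of these names come from the codebase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of lastKey skipping; SketchKey is not Cassandra's PrimaryKey.
public class LastKeySkipping {
    // Record equals() compares all three fields, so two keys for the same row that differ
    // only in the (hypothetical) sstableId are NOT equal under equals().
    public record SketchKey(String partition, String clustering, int sstableId) {}

    // Compares only what identifies a row: partition key and clustering.
    // Objects.equals tolerates a null clustering (e.g. a partition-only key in this sketch).
    static boolean sameRow(SketchKey a, SketchKey b) {
        return a.partition().equals(b.partition())
                && Objects.equals(a.clustering(), b.clustering());
    }

    public static List<SketchKey> skipDuplicates(List<SketchKey> keys) {
        List<SketchKey> selected = new ArrayList<>();
        SketchKey lastKey = null;
        for (SketchKey key : keys) {
            // A non-null partition key is required to build a single-partition read.
            Objects.requireNonNull(key.partition(), "Partition key must not be null");
            if (lastKey != null && sameRow(lastKey, key)) {
                continue; // same row as the previous key; selecting it again would duplicate it
            }
            selected.add(key);
            lastKey = key;
        }
        return selected;
    }
}
```

Using key.equals(lastKey) in this model would keep both ("p1", "c1", 1) and ("p1", "c1", 2), so the same row would be filtered and returned twice.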

Member Author

I am satisfied that the logic is correct as is. The cases where we had issues comparing lastKey and nextKey are no longer relevant, because the PrimaryKey objects we get from the iterator are either fully qualified or static (with empty clustering keys). In the static case, we have an iterator over the whole partition, which is what we would have done previously.

There may be an opportunity to optimize the logic for static primary keys, but as far as I can tell, the current "error" is reading additional rows from disk, which is acceptable. I'm not certain, but upstream may have a similar problem, if one exists (it might not, because their PrimaryKey objects are slightly different).
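
To make the trade-off concrete, the following Java sketch models a static key as one with an empty clustering and plans a read for one partition's batch: if any key is static, it falls back to reading the whole partition (possibly reading extra rows, as discussed in the comment above); otherwise it reads only the named clusterings. Key, ReadPlan, and plan are hypothetical names, not Cassandra APIs.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: Key, ReadPlan and plan are illustrative, not Cassandra APIs.
public class PartitionReadPlan {
    // In this sketch a "static" key is one whose clustering is the empty string.
    public record Key(String partition, String clustering) {
        public boolean isStatic() {
            return clustering.isEmpty();
        }
    }

    // wholePartition == true means every row of the partition is read and filtered.
    public record ReadPlan(String partition, boolean wholePartition, SortedSet<String> clusterings) {}

    // Plans one read for a batch of keys that all belong to the given partition.
    public static ReadPlan plan(String partition, List<Key> batch) {
        SortedSet<String> clusterings = new TreeSet<>();
        for (Key key : batch) {
            if (key.isStatic()) {
                // No clustering to narrow the read, so fall back to the whole partition.
                // This may read extra rows, which the review considers acceptable.
                return new ReadPlan(partition, true, new TreeSet<>());
            }
            clusterings.add(key.clustering());
        }
        // Only the named rows are read; TreeSet sorts them (a stand-in for clustering order).
        return new ReadPlan(partition, false, clusterings);
    }
}
```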

Member Author

I created this ticket as a follow-up: https://github.com/riptano/cndb/issues/14861

Member Author

michaeljmarshall commented Jul 18, 2025

Marking as ready for review to run CI. That will help me figure out whether #1883 (comment) is a problem, since the Apache code definitely uses key.equals(lastKey). (Note that Apache doesn't have the partition-only indexing we get with aa, which is why we added that comment as part of PR #1096, the one that added back aa support.)

@michaeljmarshall michaeljmarshall marked this pull request as ready for review July 18, 2025 16:30
JeremiahDJordan
Member

That will help me figure out if #1883 (comment) is a problem ... the one that added back in aa support

Can we tell whether we are in the aa case, and skip this new logic if we are? This batching doesn't really make sense for aa files, does it?

michaeljmarshall
Member Author

I think we're good to go here. You're right that the aa logic isn't a problem.

michaeljmarshall
Member Author

✔️ Build ds-cassandra-pr-gate/PR-1883 approved by Butler

It looks like the tests are passing here, but the GitHub Actions results don't seem quite right.

eolivelli left a comment

LGTM

Waiting for @adelapena's final review.

eolivelli left a comment

Thanks for adding more tests

…g reads

Port of CASSANDRA-19497.

Co-authored-by: Caleb Rackliffe <[email protected]>
Co-authored-by: Michael Marshall <[email protected]>
Co-authored-by: Andrés de la Peña <[email protected]>

cassci-bot

✔️ Build ds-cassandra-pr-gate/PR-1883 approved by Butler



adelapena merged commit 81f2cf8 into main on Jul 22, 2025
488 checks passed
adelapena deleted the cndb-11666 branch on July 22, 2025 at 15:38
driftx pushed three commits that referenced this pull request on Jul 25, 2025, each with the following message:
…g reads (#1883)

Port of CASSANDRA-19497

Co-authored-by: Caleb Rackliffe <[email protected]>
Co-authored-by: Michael Marshall <[email protected]>
Co-authored-by: Andrés de la Peña <[email protected]>
5 participants