-
Notifications
You must be signed in to change notification settings - Fork 21
CNDB-11666: Batch clusterings into single SAI partition post-filtering reads #1883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Checklist before you submit for review
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the patch looks good
we have to add unit tests (or identify existing unit tests) that cover the new "feature" and the code we touched
src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java
Show resolved
Hide resolved
// Preconditions.checkNotNull(key.partitionKey(), "Partition key must not be null"); | ||
// if (lastKey != null && key.partitionKey().equals(lastKey.partitionKey()) && key.clustering().equals(lastKey.clustering())) | ||
// return null; | ||
// lastKey = key; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this needs an update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left this in an ambiguous state due to concerns about correctness and deduplication. Looks like we do that in fillNextSelectedKeysInPartition
, so I'll remove these lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I need to follow up on this comment to make sure we're in the clear:
// Key reads are lazy, delayed all the way to this point.
// We don't want key.equals(lastKey) because some PrimaryKey implementations consider more than just
// partition key and clustering for equality. This can break lastKey skipping, which is necessary for
// correctness when PrimaryKey doesn't have a clustering (as otherwise, the same partition may get
// filtered and considered as a result multiple times).
// we need a non-null partitionKey here, as we want to construct a SinglePartitionReadCommand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am satisfied that the logic is correct as is. The cases where we had issues comparing lastKey and the nextKey are no longer relevant because we get the PrimaryKey objects from the iterator are either fully qualified or are static (with empty clustering keys), and in the static case, we have an iterator over the whole partition, which is what we would have been doing previously.
There is possibly an opportunity to optimize the logic with static primary keys, but as far as I can tell, the current "error" is to read additional rows from disk, which is an acceptable error. I'm not certain, but it seems possible that upstream has a similar problem, if one exists (it might not though because their PrimaryKey objects are slightly different)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created this ticket as a follow up https://github.com/riptano/cndb/issues/14861
Marking as ready for review to run CI. That will help me figure out if #1883 (comment) is a problem, since the code from apache definitely uses the |
Can we tell if we are in the |
I think we're good to go here. You're right that the |
Looks like the tests are passing here, but the github actions don't seem quite right. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Waiting for final @adelapena 's review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding more tests
…g reads Port of CASSANDRA-19497. Co-authored-by: Caleb Rackliffe <[email protected]> Co-authored-by: Michael Marshall <[email protected]> Co-authored-by: Andrés de la Peña <[email protected]>
|
✔️ Build ds-cassandra-pr-gate/PR-1883 approved by ButlerApproved by Butler |
…g reads (#1883) Port of CASSANDRA-19497 Co-authored-by: Caleb Rackliffe <[email protected]> Co-authored-by: Michael Marshall <[email protected]> Co-authored-by: Andrés de la Peña <[email protected]>
…g reads (#1883) Port of CASSANDRA-19497 Co-authored-by: Caleb Rackliffe <[email protected]> Co-authored-by: Michael Marshall <[email protected]> Co-authored-by: Andrés de la Peña <[email protected]>
…g reads (#1883) Port of CASSANDRA-19497 Co-authored-by: Caleb Rackliffe <[email protected]> Co-authored-by: Michael Marshall <[email protected]> Co-authored-by: Andrés de la Peña <[email protected]>
What is the issue
Fixes: https://github.com/riptano/cndb/issues/11666
Ports: https://issues.apache.org/jira/browse/CASSANDRA-19497
May fix: https://github.com/riptano/cndb/issues/14822
What does this PR fix and why was it fixed
Here is a draft of porting the fix from upstream. Initial validation shows improved performance that gets much closer to the
aa
performance for low selectivity queries.Test results from many different versions show that this patch gets us from ~39 qps to ~418 qps, giving us a 10x increase in throughput.