CNDB-13696 fix empty iterator access in BM25 search on partial SSTable #1691
Adds tests that split the data into two segments and demonstrate that the BM25 ordering differs from the single-table case. The test currently fails because of a bug in the BM25 implementation, which this commit reproduces.
Fixes a bug where an SSTable that contains only part of the data causes a read from an empty iterator.
✔️ Build ds-cassandra-pr-gate/PR-1691 approved by Butler
@@ -168,6 +168,9 @@ private Cell<?> readColumn(SSTableReader sstable, PrimaryKey primaryKey)
var slices = Slices.with(indexContext.comparator(), Slice.make(primaryKey.clustering()));
try (var rowIterator = sstable.iterator(dk, slices, columnFilter, false, SSTableReadsListener.NOOP_LISTENER))
{
// primaryKey might not belong to this sstable, thus the iterator will be empty
if (rowIterator.isEmpty())
Nice fix. Adding a semi-related observation. If we end up seeing a lot of object allocations associated with the Slices, which are O(n) for results matched in the WHERE clause, it might be worth finding a way to use sstable.couldContain(dk), which checks the bloom filter, to reduce allocations for keys not in the sstable. If you look in the implementation of the iterator method, you'll see how it already does that for us, but it's after we've already created our slice object.
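The pre-check suggested above could look roughly like the sketch below. It is only an illustration under stated assumptions: `FakeSSTable`, `read`, `readWithPrecheck`, and the `sliceAllocations` counter are hypothetical stand-ins, not Cassandra classes, and the exact set lookup stands in for a bloom filter, which in practice can return false positives.

```java
import java.util.Optional;
import java.util.Set;

public class BloomPrecheckSketch
{
    /** Hypothetical stand-in for an SSTable reader; not the Cassandra class. */
    public static final class FakeSSTable
    {
        private final Set<String> keys;
        public int sliceAllocations = 0;

        public FakeSSTable(Set<String> keys)
        {
            this.keys = keys;
        }

        /** Exact set lookup here; a real bloom filter may return false positives. */
        public boolean couldContain(String key)
        {
            return keys.contains(key);
        }

        /** Models building a slice and then opening the iterator for the key. */
        public Optional<String> read(String key)
        {
            sliceAllocations++; // the allocation we would like to avoid for absent keys
            return keys.contains(key) ? Optional.of("row:" + key) : Optional.empty();
        }
    }

    /** Checks membership first, so no slice is built for keys that cannot be present. */
    public static Optional<String> readWithPrecheck(FakeSSTable sstable, String key)
    {
        if (!sstable.couldContain(key))
            return Optional.empty();
        return sstable.read(key);
    }

    public static void main(String[] args)
    {
        FakeSSTable sstable = new FakeSSTable(Set.of("a", "b"));
        for (String key : new String[]{ "a", "b", "c", "d" })
            readWithPrecheck(sstable, key);
        System.out.println(sstable.sliceAllocations); // 2
    }
}
```

With a real bloom filter the guard after the read is still needed, because a false positive lets an absent key through and the resulting iterator is empty.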
Good point. I would not be surprised if there are more places for such improvements.
#1691) Executing a query with BM25 search and a condition on partial SSTable results in empty iterator access error. And there was no test with storing data in segments. The PR implements BM25 search tests with splitting data into two tables. This reproduced this bug, CNDB-13696, and demonstrates current confusion on the BM25 ordering result to be fixed by CNDB-13553. This PR adds a check for empty iterator created for a PK belonging to another segment. This fixes the bug of trying to get the first element of an empty iterator. (cherry picked from commit 167a98c)
What is the issue
Executing a query that combines a BM25 search with a condition fails with an empty-iterator access error when an SSTable holds only part of the data. There was also no test that stores data in multiple segments.
What does this PR fix and why was it fixed
This PR adds BM25 search tests that split the data into two tables. The tests reproduce this bug, CNDB-13696, and demonstrate the currently confusing BM25 ordering result, which is to be fixed by CNDB-13667.
This PR also adds a check for the empty iterator that is created when a PK belongs to another segment. This fixes the bug of trying to get the first element of an empty iterator.
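As a minimal sketch of the failure mode and the guard, the example below uses plain `java.util.Iterator` in place of the row iterator from the diff. The class and method names (`EmptyIteratorGuard`, `firstRowUnguarded`, `firstRowGuarded`) are hypothetical, and `hasNext()` stands in for the `isEmpty()` check that the actual change performs.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Optional;

public class EmptyIteratorGuard
{
    /** Unguarded read: throws NoSuchElementException when there are no rows. */
    public static String firstRowUnguarded(Iterator<String> rows)
    {
        return rows.next();
    }

    /** Guarded read: an empty iterator is treated as "key lives in another segment". */
    public static Optional<String> firstRowGuarded(Iterator<String> rows)
    {
        if (!rows.hasNext())
            return Optional.empty();
        return Optional.of(rows.next());
    }

    public static void main(String[] args)
    {
        List<String> segmentWithKey = List.of("row-1");
        List<String> segmentWithoutKey = Collections.emptyList();

        System.out.println(firstRowGuarded(segmentWithKey.iterator()).isPresent());    // true
        System.out.println(firstRowGuarded(segmentWithoutKey.iterator()).isPresent()); // false

        try
        {
            firstRowUnguarded(segmentWithoutKey.iterator());
        }
        catch (NoSuchElementException e)
        {
            System.out.println("unguarded read failed on an empty iterator");
        }
    }
}
```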
Fixes https://github.com/riptano/cndb/issues/13696, fixes https://github.com/riptano/cndb/issues/13667
It's based on #1688, which fixes https://github.com/riptano/cndb/issues/13671
It's planned to be two commits on top of the commit in #1688. (It's okay to squash, as the test reproduces the issue.)