CNDB-15683: Fix incorrect results when querying mixed AA and EC indexes #2066

pkolaczk · 2025-10-15T15:25:04Z

When row-aware and non-row-aware indexes are mixed, we now check
the clustering index filter for all the keys that have clustering
information, i.e. keys coming from the row-aware
indexes. Earlier that check was accidentally disabled
if at least one non-row-aware index was used by the query.
That could cause retrieving rows that do not match
the clustering condition of the query.

github-actions · 2025-10-15T15:25:20Z

michaeljmarshall · 2025-10-15T16:06:51Z

src/java/org/apache/cassandra/index/sai/plan/QueryController.java

     */
    public boolean selects(PrimaryKey key)
    {
-        return !indexFeatureSet.isRowAware() ||


It would seem the bug was introduced when we switched the makeFilter methods in the QueryController as part of #1883. An alternate solution is to use the indexFeatureSet.isRowAware() to determine that we should use the regular clusteringIndexFilter instead of one made up of only the primary keys coming from the iterator.

I agree that removing this check works and that it might be slightly more efficient in some cases, but I worry that it leaves the two makeFilter methods having different sets of logic.

I agree that using isRowAware in the makeFilter method also works and that was the first thing I tried.

But, there is a caveat - I argue we really don't want this check here anyways. It increases the complexity of the code by introducing additional code path and creates a hidden dependency between selects and the particular filter created by makeFilter. So we have two distant methods where both must take a correct path for the code to be correct. If one does the wrong thing out of sync from another - then the results are incorrect. Also, why would we even want to further process any keys that don't match the clustering filter (regardless of AA / non-AA)?

So here by removing this check I made this method do just one thing and decoupled it from the particular filter used.

Earlier it had two behaviors different between row-aware and non row-aware - this method either checked the filter or not, depending on the "mode" the query was running in. In general such long-distant dependencies make code hard to reason about and introduce subtle bugs. In general I'll insist on avoiding such a programming style, where one method does something partially in hope some other code called some time later fixes or somehow tolerates that.

And btw, the fact we have some inconsistency between the two variants of makeFilter where one does the isRowAware check and another one doesn't is something that concerns me as well. Why do we even have two makeFilter variants? Why do we even need to build our custom filter that filters by keys from the index?

So it's not the question of whether we want this fix or the makeFilter fix, but rather do we want to harden / simplify makeFilter in addition to this?

I agree with your reasoning, and I am good with adding a follow up to clean up this part of the code base.

test/unit/org/apache/cassandra/index/sai/cql/NumericIndexMixedVersionTest.java

When row-aware and non-row-aware indexes are mixed, we now check the clustering index filter for all the keys that have clustering information, i.e. keys coming from the row-aware indexes. Earlier that check was accidentally disabled if at least one non-row-aware index was used by the query. That could cause retrieving rows that do not match the clustering condition of the query.

sonarqubecloud · 2025-10-17T10:22:53Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

cassci-bot · 2025-10-17T10:26:59Z

✔️ Build ds-cassandra-pr-gate/PR-2066 approved by Butler

Approved by Butler
See build details here

eolivelli · 2025-10-17T10:44:09Z

test/unit/org/apache/cassandra/index/sai/cql/NumericIndexMixedVersionTest.java

+
+        flush(); // Force to sstable with EC version
+
+        beforeAndAfterFlush(() -> {


can we add beforeAndAfterFlushAndCompact ? we recently had problems because we did not test indexes after compaction
I think that it does no harm
I expect that compaction will probably rewrite all the indexes to the same version, that's fine

I missed this comment but now I can see the branch got merged. Yes, it wouldn't do any harm.

…es (#2066) When row-aware and non-row-aware indexes are mixed, we now check the clustering index filter for all the keys that have clustering information, i.e. keys coming from the row-aware indexes. Earlier that check was accidentally disabled if at least one non-row-aware index was used by the query. That could cause retrieving rows that do not match the clustering condition of the query.

CNDB-15683: Fix incorrect results when querying mixed AA and EC indexes This commit fixes multiple issues with KeyRangeIterator implementations occasionally skipping or emitting duplicate keys when working on a mix of primary keys with empty / non-empty clusterings. This situation is possible while scanning tables with static columns or when some indexes are partition-aware (e.g. version AA) and others have been updated to a row-aware version (e.g. DC or EC). Due to those bugs, users could get incorrect results from SAI queries, e.g. results containing duplicated rows, duplicated partitions or even missing rows. The commit introduces extensive randomized property-based tests for KeyRangeUnionIterator and KeyIntersectionIterator. Previously, the tests did not test for keys with mixed empty/non-empty clusterings. Changes in KeyRangeUnionIterator: KeyRangeUnionIterator merges streams of primary keys in such a way that duplicates are removed. Unfortunately it does not properly account for the fact that if a key with an empty clustering meets a key with a non-empty clustering and the same partition key, we must always return the key with an empty clustering. A key with an empty clustering will always fetch the rows matched by any specific row key for the same partition, but the reverse is not true. The iterator implementation has been modified to always pick the key that matches more rows - a key with empty clustering wins over a key with non-empty clustering. Additionally, once a key with an empty clustering is emitted, no more keys in that partition are emitted. Changes in KeyRangeIntersectionIterator: Due to a very similar problem like in KeyRangeUnionIterator, KeyRangeIntersectionIterator could return either too few or too many keys, when keys with empty clusterings and keys with non-empty clusterings were present in the input key streams. In particular consider 2 input streams A and B with the following keys: A: 0: (1, Clustering.EMPTY) B: 0: (1, 1) 1: (1, 2) Key A.0 matches the whole partition 1. Therefore, the correct result of intersection are both keys of stream B. Unfortunately, the algorithm before this patch would advance both A and B iterators when emitting the first matching key. At the beginning of the second step, the iterator A would be already exhausted and no more keys would be produced. Finally key B.1 would be missing from the results. This patch fixes it by introducing two changes to the intersection algorithm: 1. A key with non-empty clustering wins over a key with empty clustering and same partition. 2. The selected highest key is not consumed while searching for the highest matching key, but that happens only after the search loop finds a match. Then we have more information which iterators would be moved to the next item. Iterators positioned at a key with an empty clustering can be advanced only after we run out of keys with non-empty clustering in the same partition or if there are no other keys with non-empty clustering. This patch also fixes another issue where we could return a less-specific key matching a full partition instead of a key matching one row: A: 0: (1, Clustering.EMPTY) B: 0: (1, 1) In that case the iterator returned a key with empty clustering, which would result in fetching and postfiltering many unnecessary rows. CNDB-15683: Fix incorrect results when querying mixed AA and EC indexes (#2066) When row-aware and non-row-aware indexes are mixed, we now check the clustering index filter for all the keys that have clustering information, i.e. keys coming from the row-aware indexes. Earlier that check was accidentally disabled if at least one non-row-aware index was used by the query. That could cause retrieving rows that do not match the clustering condition of the query. Rebase notes: - includes CNDB-15683

pkolaczk requested review from ekaterinadimitrova2 and k-rus October 15, 2025 15:25

michaeljmarshall reviewed Oct 15, 2025

View reviewed changes

ekaterinadimitrova2 reviewed Oct 15, 2025

View reviewed changes

test/unit/org/apache/cassandra/index/sai/cql/NumericIndexMixedVersionTest.java Outdated Show resolved Hide resolved

pkolaczk force-pushed the c15683-check-clustering-filter-for-mixed-AA-EC branch from 54fb5d7 to 0fdf370 Compare October 17, 2025 09:26

pkolaczk force-pushed the c15683-check-clustering-filter-for-mixed-AA-EC branch from 0fdf370 to d326725 Compare October 17, 2025 09:28

eolivelli reviewed Oct 17, 2025

View reviewed changes

michaeljmarshall approved these changes Oct 17, 2025

View reviewed changes

eolivelli merged commit e5a32d3 into main Oct 18, 2025
491 checks passed

eolivelli deleted the c15683-check-clustering-filter-for-mixed-AA-EC branch October 18, 2025 09:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

CNDB-15683: Fix incorrect results when querying mixed AA and EC indexes #2066

CNDB-15683: Fix incorrect results when querying mixed AA and EC indexes #2066

Uh oh!

pkolaczk commented Oct 15, 2025

Uh oh!

github-actions bot commented Oct 15, 2025 •

edited by pkolaczk

Loading

Uh oh!

michaeljmarshall Oct 15, 2025

Uh oh!

pkolaczk Oct 16, 2025 •

edited

Loading

Uh oh!

michaeljmarshall Oct 17, 2025

Uh oh!

Uh oh!

sonarqubecloud bot commented Oct 17, 2025

Uh oh!

cassci-bot commented Oct 17, 2025

Uh oh!

eolivelli Oct 17, 2025

Uh oh!

pkolaczk Oct 20, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants


		flush(); // Force to sstable with EC version

		beforeAndAfterFlush(() -> {

CNDB-15683: Fix incorrect results when querying mixed AA and EC indexes #2066

CNDB-15683: Fix incorrect results when querying mixed AA and EC indexes #2066

Uh oh!

Conversation

pkolaczk commented Oct 15, 2025

Uh oh!

github-actions bot commented Oct 15, 2025 • edited by pkolaczk Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist before you submit for review

Uh oh!

michaeljmarshall Oct 15, 2025

Choose a reason for hiding this comment

Uh oh!

pkolaczk Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

michaeljmarshall Oct 17, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sonarqubecloud bot commented Oct 17, 2025

Quality Gate passed

Uh oh!

cassci-bot commented Oct 17, 2025

✔️ Build ds-cassandra-pr-gate/PR-2066 approved by Butler

Uh oh!

eolivelli Oct 17, 2025

Choose a reason for hiding this comment

Uh oh!

pkolaczk Oct 20, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

github-actions bot commented Oct 15, 2025 •

edited by pkolaczk

Loading

pkolaczk Oct 16, 2025 •

edited

Loading