@adelapena adelapena commented Dec 4, 2025

What is the issue

We don't need all the counters in QueryContext to be LongAdder, since they are only accessed by the query's thread.

They used to be plain longs until VECTOR-SEARCH-29 parallelized vector searches, making them accessed by multiple threads. However, that parallelization was removed by the BM25 patch.

The system properties USE_PARALLEL_INDEX_READ and PARALLEL_INDEX_READ_NUM_THREADS have been unused since BM25 removed parallel search.

What does this PR fix and why was it fixed

This PR makes the counters in QueryContext plain longs, without any concurrency control within the class.

The system properties USE_PARALLEL_INDEX_READ and PARALLEL_INDEX_READ_NUM_THREADS are removed, since they have been no-ops since the introduction of BM25. Since we don't reject unknown system properties, they will remain no-ops.
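
As a rough illustration of the counter change described above, the sketch below contrasts a plain long counter with a LongAdder one. The class and field names (SingleThreadedCounters, ConcurrentCounters, rowsFiltered, partitionsRead) are hypothetical stand-ins chosen for the example, not the actual QueryContext code.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a per-query counter holder; not the real QueryContext.
final class SingleThreadedCounters {
    // Plain fields: correct only because a single query thread reads and writes them.
    private long rowsFiltered;
    private long partitionsRead;

    void addRowsFiltered(long delta) { rowsFiltered += delta; }
    void addPartitionsRead(long delta) { partitionsRead += delta; }

    long rowsFiltered() { return rowsFiltered; }
    long partitionsRead() { return partitionsRead; }
}

// The earlier shape: LongAdder pays off only when several threads update the same counter.
final class ConcurrentCounters {
    private final LongAdder rowsFiltered = new LongAdder();

    void addRowsFiltered(long delta) { rowsFiltered.add(delta); }

    long rowsFiltered() { return rowsFiltered.sum(); }
}

final class CountersDemo {
    public static void main(String[] args) {
        SingleThreadedCounters plain = new SingleThreadedCounters();
        plain.addRowsFiltered(3);
        plain.addRowsFiltered(4);

        ConcurrentCounters concurrent = new ConcurrentCounters();
        concurrent.addRowsFiltered(3);
        concurrent.addRowsFiltered(4);

        // With a single writer thread both variants report the same total.
        System.out.println(plain.rowsFiltered() + " " + concurrent.rowsFiltered());
    }
}
```

With one writer thread the two variants behave identically, so the plain fields only drop the per-counter object and the cost of summing its cells on read.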

@adelapena adelapena self-assigned this Dec 4, 2025

github-actions bot commented Dec 4, 2025

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one


@pkolaczk pkolaczk left a comment

The QueryContext changes look good, but I suggest removing the unused options instead of deprecating them, and optionally adding thread safety checks.
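
One possible shape for the optional thread safety checks suggested above, assuming the counters are created on the query's thread: remember the creating thread and reject access from any other. OwnedCounter is a hypothetical name used only for this sketch, and nothing in this conversation shows whether such a check was added.

```java
// Hypothetical sketch of a single-owner counter; not code from this PR.
final class OwnedCounter {
    // Assumption: the object is constructed on the thread that will use it.
    private final Thread owner = Thread.currentThread();
    private long value;

    void add(long delta) {
        checkOwner();
        value += delta;
    }

    long get() {
        checkOwner();
        return value;
    }

    // Fails fast if a thread other than the creator touches the counter.
    private void checkOwner() {
        if (Thread.currentThread() != owner)
            throw new IllegalStateException("Counter accessed from " + Thread.currentThread().getName()
                                            + " but owned by " + owner.getName());
    }
}
```

An explicit exception is used here so the check is always active. A plain `assert` would be a lighter alternative that only runs when assertions are enabled.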


// Enables parallel index read.
@Deprecated // unused since BM25 removed parallel index read
USE_PARALLEL_INDEX_READ("cassandra.index_read.parallel", "true"),

I think Deprecated should be used to mark things that technically still work but are going to be removed soon, to give callers and users some time to stop using them. But it looks like we don't use this option at all anymore and, additionally, it doesn't work (flipping it doesn't change anything). I suggest removing it entirely, because otherwise it only adds confusion.

I have a bit of a different experience.
We add the @Deprecated so anyone who thinks they are using it (it doesn't work anymore) can take the time to remove it and not just get broken straight.

@ekaterinadimitrova2 ekaterinadimitrova2 Dec 8, 2025

This is no different from a cassandra.yaml property (be careful with backward compatibility, breaking changes, etc). I'd say deprecate here, remove in main-5.0, to be on the safe side. WDYT? We should also document this.

Author

@ekaterinadimitrova2 that was my initial reasoning for deprecating rather than removing the properties.

However, it seems that, unlike yaml properties, unknown system properties are not rejected. So startup won't fail if we remove the properties; they will simply be no-ops. We don't have any entry points to dynamically change these properties either.

If we leave them as deprecated, there doesn't seem to be anything that warns about the usage of a deprecated system property. So starting with or without the property will be a no-op too.

Normally I would agree that removing or even silently making a property no-op is not ideal. But in our case this was already no-op before, since the introduction of BM25. I think we are not really changing anything here, just finishing the cleanup of dead code left by BM25.

@ekaterinadimitrova2 ekaterinadimitrova2 Dec 8, 2025

However, it seems that, unlike yaml properties, unknown system properties are not rejected.

Yes, they are not. But if we warn - then people will know to remove them/clean/etc. Otherwise it will be weird dead code (anyone using -D in their scripts, etc) for anyone who was using them with wrong expectations? (it won't break as they are not used internally)

But in our case this was already no-op before, since the introduction of BM25.

That is a good point that most people should have realized it up to now... Let's just put a note then and clean it. No better cleaning than removing "stuff" :)
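
For reference, the "warn" alternative discussed above could look roughly like the sketch below. It is only an illustration of that option: the thread settles on removing the properties with a release note instead. The property keys are the ones quoted in the enum entries in this PR, while ObsoletePropertyCheck and findObsolete are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical startup check; not part of this PR.
final class ObsoletePropertyCheck {
    // Keys copied from the removed enum entries quoted in this conversation.
    static final List<String> OBSOLETE_KEYS = List.of(
        "cassandra.index_read.parallel",
        "cassandra.index_read.parallel_thread_num");

    // Returns the obsolete keys that are still set in the given properties.
    static List<String> findObsolete(Properties props) {
        List<String> found = new ArrayList<>();
        for (String key : OBSOLETE_KEYS) {
            if (props.getProperty(key) != null)
                found.add(key);
        }
        return found;
    }

    public static void main(String[] args) {
        // The JVM accepts any -D flag, so a leftover flag is only visible if we look for it.
        for (String key : findObsolete(System.getProperties()))
            System.err.println("WARN: system property -D" + key + " is no longer used and can be removed");
    }
}
```

Running it with `-Dcassandra.index_read.parallel=true` would print one warning, and running it without either flag prints nothing.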

Author

I have added a note in the issue's release notes. It says that they were no-ops before and remain no-ops now, which makes me think we might not need the note at all.

which makes me think we might not need the note at all.

No-op on our end, but I think the note is ok - it signals people they can do some cleaning on their end if they were using the -D flag and reduce confusion and amount of flags used in their clusters.

@Deprecated // unused since BM25 removed parallel index read
USE_PARALLEL_INDEX_READ("cassandra.index_read.parallel", "true"),
@Deprecated // unused since BM25 removed parallel index read
PARALLEL_INDEX_READ_NUM_THREADS("cassandra.index_read.parallel_thread_num"),

Same as above

@adelapena adelapena force-pushed the CNDB-16203-main-query-context-longs branch from 0daedcc to bc700c4 Compare December 8, 2025 13:48

@ekaterinadimitrova2 ekaterinadimitrova2 left a comment

The "new" test failures are timeouts we see in CI which shouldn't be related to what we do here. Maybe run the tests locally one more time before commit to ensure we are not missing anything as they are in the SAI area - just in case.


// Enables parallel index read.
USE_PARALLEL_INDEX_READ("cassandra.index_read.parallel", "true"),
PARALLEL_INDEX_READ_NUM_THREADS("cassandra.index_read.parallel_thread_num"),

nit: You may want to update the PR description to say they are removed

Author

Yes, I have updated the description of both PRs.


We don't need all the counters in QueryContext to be LongAdder, since they are
only accessed by the query's thread.

They used to be plain longs until VECTOR-SEARCH-29 parallelized vector searches,
making them accessed by multiple threads. However, that parallelization was
removed by the BM25 patch.

The system properties USE_PARALLEL_INDEX_READ and PARALLEL_INDEX_READ_NUM_THREADS
are marked as deprecated, since they are unused since the introduction of BM25.
Metrics are generally read from the context snapshot
@adelapena adelapena force-pushed the CNDB-16203-main-query-context-longs branch from bc700c4 to adfda82 Compare December 10, 2025 11:54
@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-2153 rejected by Butler


2 regressions found
See build details here


Found 2 new test failures

Test | Explanation | Runs | Upstream
o.a.c.index.sai.cql.VectorCompaction100dTest.testCompactionWithEnoughRowsForPQAndDeleteARow[eb false] | NEW | 🔴 | 0 / 19
o.a.c.index.sai.cql.VectorSiftSmallTest.testCompaction[db false] | REGRESSION | 🔴 | 0 / 19

Found 2 known test failures

@adelapena adelapena merged commit c093487 into main Dec 10, 2025
489 of 497 checks passed
@adelapena adelapena deleted the CNDB-16203-main-query-context-longs branch December 10, 2025 17:51