Skip to content

Conversation

@adelapena
Copy link

@adelapena adelapena commented Dec 2, 2025

What is the issue

CNDB-15260 added SAI-specific information to the log reports produced for slow queries. This execution information contains, among others, the number of fetched and returned partitions, rows and tombstones. However, the logging reports for non-SAI slow queries don't have a similar information about the number of filtered rows, tombstones, etc.

What does this PR fix and why was it fixed

This PR adds information about fetched/returned partitions/rows/tombstones to the logging reports for non-SAI slow queries. The reports look like:

DEBUG [node2_isolatedExecutor:1] node2 2025-12-02 18:00:14,524 MonitoringTask.java:165 - 1 operations were slow in the last 211 msecs:
<SELECT * FROM distributed_test_keyspace.t_0 ALLOW FILTERING>, time 207 msec - slow timeout 100 msec/cross-node
  Fetched/returned/tombstones:
    partitions: 9/9/1
    rows: 90/90/0

Whereas without this feature it would miss the three last lines:

DEBUG [node2_isolatedExecutor:1] node2 2025-12-02 18:00:14,524 MonitoringTask.java:165 - 1 operations were slow in the last 211 msecs:
<SELECT * FROM distributed_test_keyspace.t_0 ALLOW FILTERING>, time 207 msec - slow timeout 100 msec/cross-node

The format of the log messages tries to save disk space while keeping some readability.

There is a system property named cassandra.monitoring_execution_info_enabled to disable this feature, which is enabled by default.

@adelapena adelapena self-assigned this Dec 2, 2025
@github-actions
Copy link

github-actions bot commented Dec 2, 2025

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@adelapena adelapena marked this pull request as draft December 2, 2025 18:22
@adelapena adelapena force-pushed the CNDB-16123-main-slow-query-logger branch from 1a79988 to 667ae3b Compare December 3, 2025 17:37
@adelapena adelapena marked this pull request as ready for review December 3, 2025 17:37
@adelapena
Copy link
Author

adelapena commented Dec 4, 2025

I've included a small microbenchmark, these are the results:

Benchmark                                       (enabled)  Mode  Cnt        Score       Error  Units
ReadCommandExecutionInfoBench.executionInfo         false  avgt   10      121.212 ±    51.776  ns/op
ReadCommandExecutionInfoBench.executionInfo          true  avgt   10      297.554 ±    55.089  ns/op
ReadCommandExecutionInfoBench.logSlowOperation      false  avgt   10    12555.262 ±   546.290  ns/op
ReadCommandExecutionInfoBench.logSlowOperation       true  avgt   10    12027.511 ±  1497.459  ns/op
ReadCommandExecutionInfoBench.runFastOperation      false  avgt   10  1988358.650 ± 21653.687  ns/op
ReadCommandExecutionInfoBench.runFastOperation       true  avgt   10  1862303.236 ± 16838.192  ns/op

The cost of generating the extended execution info is 3x greater, but it's still very fast and dilutes into nothing when added to slow queries that have taken at least 500ms, even more than in CNDB-15260 given that the reports are much smaller than with SAI queries. Regarding the cost of tracking the number of fetched/returned/deleted partitions and rows, which happens on every query even if it's not slow, it seems to get lost in the benchmark error and I cannot see a noticeable impact.

…ands

Enabled by default, can be disabled with property cassandra.monitoring_execution_info_enabled
It is always accessed by the same ReadStage tread, which puts it in the BlockingQueue consumed by the monitoring thread
@adelapena adelapena force-pushed the CNDB-16123-main-slow-query-logger branch from 3253559 to 8b59023 Compare December 5, 2025 12:50
@sonarqubecloud
Copy link

sonarqubecloud bot commented Dec 5, 2025

@cassci-bot
Copy link

❌ Build ds-cassandra-pr-gate/PR-2147 rejected by Butler


2 regressions found
See build details here


Found 2 new test failures

Test Explanation Runs Upstream
o.a.c.index.sai.cql.VectorCompaction100dTest.testZeroOrOneToManyCompaction[dc true] NEW 🔴 0 / 19
o.a.c.index.sai.cql.VectorSiftSmallTest.testCompaction[ca false] REGRESSION 🔴 0 / 19

Found 2 known test failures

@adelapena adelapena merged commit 86de53e into main Dec 8, 2025
490 of 497 checks passed
@adelapena adelapena deleted the CNDB-16123-main-slow-query-logger branch December 8, 2025 13:22
@adelapena
Copy link
Author

Thanks for the review :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants