@adelapena adelapena commented Dec 11, 2025

What is the issue

CNDB-15260 added SAI-specific execution information to the log reports produced for slow queries, and CNDB-16123 will soon add generic execution information for all non-SAI read commands. These details are produced only for queries that MonitoringTask reports as slow but successful. However, MonitoringTask also produces log reports for queries that are slow enough to be aborted. We should add execution information to the log reports for aborted queries as well.

What does this PR fix and why was it fixed

Adds execution information to the log reports for aborted queries. The extension of the log messages is identical to what we did for slow successful queries in CNDB-15260 and CNDB-16123, but applied to aborted queries:

2025-12-11 16:00:09,398 MonitoringTask.java:144 - 1 operations timed out in the last 112 msecs:
<SELECT * FROM distributed_test_keyspace.t_88 ALLOW FILTERING>, total time 211 msec, timeout 100 msec/cross-node
  Fetched/returned/tombstones:
    partitions: 1/1/0
    rows: 2/2/0
2025-12-11 16:00:12,871 MonitoringTask.java:144 - 1 operations timed out in the last 104 msecs:
<SELECT * FROM distributed_test_keyspace.t_88 WHERE s = ? ALLOW FILTERING>, total time 245 msec, timeout 100 msec/cross-node
  SAI slow query metrics:
    sstablesHit: 3
    segmentsHit: 3
    keysFetched: 10
    partitionsFetched: 1
    partitionsReturned: 1
    partitionTombstonesFetched: 1
    rowsFetched: 3
    rowsReturned: 3
    rowTombstonesFetched: 4
    trieSegmentsHit: 0
    bkdPostingListsHit: 3
    bkdSegmentsHit: 3
    bkdPostingsSkips: 0
    bkdPostingsDecodes: 0
    triePostingsSkips: 0
    triePostingsDecodes: 0
    annGraphSearchLatencyNanos: 0
  SAI slow query plan:
    Limit 2147483647 (rows: 50.0, cost/row: 101.6, cost: 4500.0..9582.0)
     └─ Filter s = ? (sel: 1.000000000) (rows: 50.0, cost/row: 101.6, cost: 4500.0..9582.0)
         └─ Fetch (rows: 50.0, cost/row: 101.6, cost: 4500.0..9582.0)
             └─ NumericIndexScan of sai_idx (sel: 0.500000000, step: 1.0) (keys: 50.0, cost/key: 0.1, cost: 4500.0..4505.0)
                predicate: Expression{name: s, op: EQ, lower: (0, true), upper: (0, true), exclusions: []}
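
As one possible way to consume these reports, the sketch below pulls the numeric "name: value" entries out of a single report section such as "SAI slow query metrics:". It is a minimal illustration and not part of this PR: the class ExecutionInfoParser and its method are hypothetical names, and the indentation rules are inferred only from the sample output above. Sections whose values are not plain integers (for example "partitions: 1/1/0") are not handled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Cassandra: reads numeric metrics from one log report. */
final class ExecutionInfoParser
{
    /** Returns the integer metrics listed under the given section header, in order of appearance. */
    static Map<String, Long> parseSection(List<String> reportLines, String header)
    {
        Map<String, Long> metrics = new LinkedHashMap<>();
        boolean inSection = false;
        for (String line : reportLines)
        {
            String trimmed = line.trim();
            if (!inSection)
            {
                // Skip everything until the requested section header is found.
                inSection = trimmed.equals(header);
                continue;
            }
            // In the sample, metric lines are indented by four spaces; anything else ends the section.
            int colon = trimmed.indexOf(':');
            if (!line.startsWith("    ") || colon < 0)
                break;
            try
            {
                long value = Long.parseLong(trimmed.substring(colon + 1).trim());
                metrics.put(trimmed.substring(0, colon), value);
            }
            catch (NumberFormatException e)
            {
                // Non-integer value: treat it as the end of the numeric section.
                break;
            }
        }
        return metrics;
    }

    public static void main(String[] args)
    {
        List<String> report = List.of(
            "<SELECT * FROM distributed_test_keyspace.t_88 WHERE s = ? ALLOW FILTERING>, total time 245 msec, timeout 100 msec/cross-node",
            "  SAI slow query metrics:",
            "    sstablesHit: 3",
            "    keysFetched: 10",
            "    rowTombstonesFetched: 4",
            "  SAI slow query plan:",
            "    Limit 2147483647 (rows: 50.0, cost/row: 101.6, cost: 4500.0..9582.0)");
        System.out.println(parseSection(report, "SAI slow query metrics:"));
    }
}
```

A caller could, for instance, compare rowTombstonesFetched against rowsReturned to spot aborted queries that spent most of their time reading tombstones.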

This also renames the CassandraRelevantProperties related to monitoring, to align them with ASF/CC5. The renames are:

  • SLOW_QUERY_LOG_MONITORING_REPORT_INTERVAL_IN_MS -> MONITORING_REPORT_INTERVAL_MS
  • SLOW_QUERY_LOG_MONITORING_MAX_OPERATIONS -> MONITORING_MAX_OPERATIONS
  • SLOW_QUERY_LOG_EXECUTION_INFO_ENABLED -> MONITORING_EXECUTION_INFO_ENABLED
  • SAI_SLOW_QUERY_LOG_EXECUTION_INFO_ENABLED -> SAI_MONITORING_EXECUTION_INFO_ENABLED

This naming also makes more sense considering that monitoring is applied to both slow and aborted queries.

Please note that the real underlying system properties are unchanged, and this only renames their enum names.
None of the four enums is present in any CC release yet.
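
To illustrate why renaming an enum constant leaves the underlying system property untouched, here is a stripped-down sketch of the pattern. It is not the real CassandraRelevantProperties class: the enum MonitoringProperty, the property keys prefixed with "example." and the default values are all placeholders chosen for illustration.

```java
/**
 * Illustrative stand-in for an enum of system properties.
 * The keys and defaults below are placeholders, NOT the real Cassandra property names.
 */
enum MonitoringProperty
{
    // Only the constant name is what a rename changes; the key string passed with -D stays the same.
    MONITORING_MAX_OPERATIONS("example.monitoring.max_operations", "50"),
    MONITORING_EXECUTION_INFO_ENABLED("example.monitoring.execution_info_enabled", "true");

    private final String key;
    private final String defaultValue;

    MonitoringProperty(String key, String defaultValue)
    {
        this.key = key;
        this.defaultValue = defaultValue;
    }

    /** The system property name that operators set, independent of the constant name. */
    String getKey()
    {
        return key;
    }

    String getString()
    {
        return System.getProperty(key, defaultValue);
    }

    int getInt()
    {
        return Integer.parseInt(getString());
    }

    boolean getBoolean()
    {
        return Boolean.parseBoolean(getString());
    }

    public static void main(String[] args)
    {
        // Equivalent to starting the JVM with -Dexample.monitoring.max_operations=10
        System.setProperty(MONITORING_MAX_OPERATIONS.getKey(), "10");
        System.out.println(MONITORING_MAX_OPERATIONS.getInt());
        System.out.println(MONITORING_EXECUTION_INFO_ENABLED.getBoolean());
    }
}
```

Under this pattern, code that refers to the constant must be updated after a rename, while any deployment that sets the property by its key keeps working unchanged.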

@adelapena adelapena self-assigned this Dec 11, 2025
github-actions bot commented Dec 11, 2025

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@adelapena adelapena force-pushed the CNDN-16237-main-aborted-queries-execution-info branch from 8bfa5fc to ef4dfe6 Compare December 11, 2025 17:22

@ekaterinadimitrova2 ekaterinadimitrova2 left a comment

Looks good to me - test coverage, CI results, code-wise.
I would probably add some small notes to the observability/release notes section:

  • how much the logs grow with this addition; we made similar back-of-the-envelope calculations in the other ticket, so it would be good to mention them here
  • how these logs can be used maybe?
  • we need to list the new config properties

Good call on the alignment of the properties with Apache, I didn't notice those properties were pulled to CassandraRelevantProperties in the new branches.

Pending commit on the approval of the other PR which I am moving to review now too.

@adelapena adelapena changed the title from "CNDB-16237: Add execution info to logs about aborted querie" to "CNDB-16237: Add execution info to logs about aborted queries" Dec 12, 2025
@adelapena
Author

Thanks for the review. I have added release/operability/observability notes for this. The disk space calculations are the same as for slow but not aborted queries, since the messages are identical. I have added them to the notes anyway. Regarding new properties, we are neither adding anything new nor renaming the existing cassandra.(...) properties, but we are changing the HTTP endpoints that we recently added. We haven't released anything with those endpoints, but I guess we should merge the notes of all three tickets (16123, 15260 and 16237) into a single entry at release time.

@ekaterinadimitrova2 ekaterinadimitrova2 left a comment

I guess we should merge the notes of all three tickets (16123, 15260 and 16237) into a single entry at release time.

I suggest we remove notes from the previous tickets and update them only in the last one.

@adelapena
Author

I suggest we remove notes from the previous tickets and update them only in the last one.

Do you mean making here a combined note about both slow and aborted queries, even if this issue is only about aborted?

@adelapena adelapena force-pushed the CNDN-16237-main-aborted-queries-execution-info branch from ef4dfe6 to 0043ca5 Compare December 15, 2025 14:12
@adelapena
Author

I suggest we remove notes from the previous tickets and update them only in the last one.

Do you mean making here a combined note about both slow and aborted queries, even if this issue is only about aborted?

I have updated the notes for this issue to combine its own changes with those of CNDB-15260 and CNDB-16123. When I merge this one, I'll drop the notes in the merged issues and leave a comment pointing to this issue.

ekaterinadimitrova2 commented Dec 15, 2025

Everything looks great on both PRs and the issue, thanks. My +1 stands

@adelapena
Author

It seems the Sonar checks have failed for the second time, so I'm running them once again.

@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-2172 rejected by Butler


2 regressions found
See build details here


Found 2 new test failures

Test | Explanation | Runs | Upstream
o.a.c.index.sai.cql.VectorCompaction100dTest.testCompactionWithEnoughRowsForPQAndDeleteARow[db true] | NEW | 🔴 | 0 / 19
o.a.c.index.sai.cql.VectorSiftSmallTest.testMultiSegmentBuild[ca false] | REGRESSION | 🔴 | 0 / 19

Found 2 known test failures
