
CNDB-14624: do not fail user read when speculative retry handling throws #1875


Open
wants to merge 6 commits into
base: main


@jakubzytka jakubzytka commented Jul 15, 2025

What is the issue

When the speculative retry code throws, the whole user read fails. This is undesirable: a read that would otherwise complete from the initial replicas should not fail, even if its latency falls in the upper percentiles.
Another motivation is that we would like to throw an exception when an internode connection targets an unknown CNDB service, to prevent #14624 from happening.
Currently, throwing such an exception would break user reads, which is not acceptable.

What does this PR fix and why was it fixed

This change introduces a specific UnknownEndpointException that CNDB's snitch implementation may throw when an internode connection targets an unknown service.
Additionally, we catch all exceptions thrown while issuing a speculative read. UnknownEndpointException is treated as a common case (due to races between service removal and internode communication), whereas other exceptions emit a warning; neither fails the read.
That way, we allow user reads to complete even if the speculative retry fails, and we leave a path open for CNDB to prevent opening a new connection by throwing UnknownEndpointException when necessary.
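
For illustration only, the handling described above could look roughly like the sketch below. It is not the code from this PR: `SpeculativeRetrySketch`, `SpeculativeSender`, `trySpeculate`, and the in-memory `log` list are hypothetical names, the `UnknownEndpointException` here is a local stand-in whose superclass is assumed, and the choice to catch `Exception` (rather than `Throwable`) and to log the common case at debug level are assumptions as well.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the exception this PR introduces; the real class's package and
// superclass are not shown in the description, so RuntimeException is an assumption.
class UnknownEndpointException extends RuntimeException {
    UnknownEndpointException(String message) {
        super(message);
    }
}

final class SpeculativeRetrySketch {
    // Hypothetical hook standing in for "issue a speculative read to an extra replica".
    interface SpeculativeSender {
        void sendSpeculativeRead() throws Exception;
    }

    // Collects log lines in memory so the sketch needs no logging library.
    final List<String> log = new ArrayList<>();

    // Returns true if the speculative read was issued, false if it was skipped.
    // It never propagates an exception, so the original read can still complete
    // from the replicas that were contacted initially.
    boolean trySpeculate(SpeculativeSender sender) {
        try {
            sender.sendSpeculativeRead();
            return true;
        } catch (UnknownEndpointException e) {
            // Expected under races between service removal and internode traffic,
            // so it is recorded quietly (assumed log level).
            log.add("DEBUG skipped speculative read: " + e.getMessage());
            return false;
        } catch (Exception e) {
            // Anything else is unexpected: warn, but still do not fail the read.
            log.add("WARN speculative read failed: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        SpeculativeRetrySketch sketch = new SpeculativeRetrySketch();
        boolean issued = sketch.trySpeculate(() -> { });
        boolean unknown = sketch.trySpeculate(() -> {
            throw new UnknownEndpointException("service removed");
        });
        boolean broken = sketch.trySpeculate(() -> {
            throw new IllegalStateException("boom");
        });
        System.out.println(issued + " " + unknown + " " + broken);
        sketch.log.forEach(System.out::println);
    }
}
```

The more specific catch clause comes first so that the common case can be logged differently from unexpected failures. In real code the warning would probably go through NoSpamLogger, as the review checklist suggests for log lines that may appear frequently.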


github-actions bot commented Jul 15, 2025

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@jakubzytka jakubzytka force-pushed the cndb-14624-ignore-exceptions-during-spec-retry branch from 7824335 to a166649 Compare July 16, 2025 13:09
@jakubzytka jakubzytka requested a review from maxtomassi July 17, 2025 16:20

@cassci-bot

✔️ Build ds-cassandra-pr-gate/PR-1875 approved by Butler



@jakubzytka jakubzytka requested a review from sbtourist July 23, 2025 10:05

3 participants