@maxtomassi

What is the issue

A repair task would hang forever when a validate or sync request timed out. The repair initiator (the repair service in the primary region) treated the peer repair service as not supporting timeouts, so it did not fail the repair task and left it hanging. The cause is the way C* detects whether a peer supports repair timeouts: it checks the peer's C* version in the Nodes.peers() table. In CNDB, however, repair services (and external services in general) are not added to the peers table, because they are not handled by the ServiceTracker.
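To make the failure mode concrete, here is a minimal sketch of the behavior described above. It is not the actual C* code: the class `RepairTimeoutHandling`, the method `onRequestTimeout`, and the use of a `CompletableFuture` to stand in for the repair task are all assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the initiator-side timeout handling; not the real C* classes.
final class RepairTimeoutHandling {
    /**
     * Invoked when a validate or sync request reaches its timeout (hypothetical hook).
     * The task is failed only when the peer is believed to support timeouts.
     */
    static void onRequestTimeout(CompletableFuture<Void> repairTask, boolean peerSupportsTimeouts) {
        if (peerSupportsTimeouts) {
            repairTask.completeExceptionally(new TimeoutException("repair request timed out"));
        }
        // Otherwise the timeout is ignored. If the peer never answers,
        // the task is never completed and appears to hang forever.
    }
}
```

With `peerSupportsTimeouts` wrongly evaluated as `false`, as happens when the peer is missing from the peers table, the future stays incomplete, which matches the hang reported here.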

What does this PR fix and why was it fixed

Add a system property to always consider the remote peer as supporting repair message timeouts.
Currently the peer's version is checked to determine whether timeouts are supported, but that does not work in CNDB: repair services are not added to the Nodes.peers() table, so there is no clear way to tell which version a remote peer is running. The new system property makes it possible to skip this version check and always treat the remote peer as supporting timeouts.
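The idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the property name `example.repair.timeouts_always_supported`, the class `RepairTimeoutSupport`, the minimum major version `4`, and the `Map` standing in for the Nodes.peers() lookup are all hypothetical.

```java
import java.util.Map;

// Hypothetical sketch of gating the version check behind a system property.
final class RepairTimeoutSupport {
    // Assumed property name, for illustration only; the real name is defined by the PR.
    static final String ALWAYS_SUPPORTED_PROPERTY = "example.repair.timeouts_always_supported";

    // Assumed minimum major version that understands repair timeouts (illustrative value).
    static final int MIN_MAJOR_VERSION = 4;

    /**
     * @param peer               identifier of the remote peer
     * @param peerMajorVersions  stand-in for the peers table: known peers mapped to their major version
     */
    static boolean peerSupportsTimeouts(String peer, Map<String, Integer> peerMajorVersions) {
        // When the property is set to "true", skip the version check entirely.
        if (Boolean.getBoolean(ALWAYS_SUPPORTED_PROPERTY)) {
            return true;
        }
        // Default path: a peer missing from the table has no known version,
        // so it is treated as not supporting timeouts.
        Integer major = peerMajorVersions.get(peer);
        return major != null && major >= MIN_MAJOR_VERSION;
    }
}
```

In this sketch a deployment such as CNDB would start the JVM with `-Dexample.repair.timeouts_always_supported=true`, while other deployments keep the default version-based check.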

@github-actions

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-2185 rejected by Butler


1 regressions found
See build details here


Found 1 new test failures

Test: o.a.c.index.sai.cql.VectorCompaction100dTest.testOneToManyCompaction[dc false]
Explanation: NEW
Runs: 🔴
Upstream: 0 / 19

Found 1 known test failures
