-
Notifications
You must be signed in to change notification settings - Fork 8
[DNM] Flags for disabling read repairs #641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: rh/hotfix-coerce-CL-repair
Are you sure you want to change the base?
[DNM] Flags for disabling read repairs #641
Conversation
// wait for the repair writes to be acknowledged, to minimize impact on any replica that's | ||
// behind on writes in case the out-of-sync row is read multiple times in quick succession | ||
FBUtilities.waitOnFutures(resolver.repairResults, DatabaseDescriptor.getWriteRpcTimeout()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that flag is set, do we need to disable waiting on repair results in getRangeSlice
as well?
FBUtilities.waitOnFutures(repairResponses, DatabaseDescriptor.getWriteRpcTimeout()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth updating the metrics marking as well ReadRepairMetrics.repairedBlocking
@@ -109,6 +110,11 @@ public Row resolve() throws DigestMismatchException | |||
*/ | |||
public static List<AsyncOneResponse> scheduleRepairs(ColumnFamily resolved, String keyspaceName, DecoratedKey key, List<ColumnFamily> versions, List<InetAddress> endpoints) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are other parts of the code that do async read repairs along the write path:
public void run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not convinced that disabling async read repairs is even desirable at this point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell, I think this actually goes back into RowDataResolver::resolve
code path and from a quick lookup in the code MessagingService.Verb.READ_REPAIR
is only leveraged in scheduleRepairs
. So we would effectively be disabling all read repairs on reads or writes (which makes me lean towards you and Lyuben's suggestion to be incremental more, considering we seem to already be throwing around a bunch of async read repairs when we write)
Co-authored-by: Daniel Guo <[email protected]>
FBUtilities.waitOnFutures(repairResponses, DatabaseDescriptor.getWriteRpcTimeout()); | ||
if (!DatabaseDescriptor.getDisableBlockOnReadRepair()) | ||
{ | ||
FBUtilities.waitOnFutures(repairResponses, DatabaseDescriptor.getWriteRpcTimeout()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here's the addition in getRangeSlice
GetDisableReadRepairMutation.class, | ||
SetDisableReadRepairMutation.class, | ||
GetDisableBlockOnReadRepair.class, | ||
SetDisableBlockOnReadRepair.class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
supporting in nodetool
if (command.isDigestQuery()) | ||
{ | ||
Keyspace.open(command.ksName).getColumnFamilyStore(command.cfName).metric.digestReads.mark(); | ||
} | ||
else | ||
{ | ||
Keyspace.open(command.ksName).getColumnFamilyStore(command.cfName).metric.dataReads.mark(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metrics on data vs. digest reads
@@ -1450,6 +1450,7 @@ private static List<Row> fetchRows(List<ReadCommand> initialCommands, Consistenc | |||
try | |||
{ | |||
Row row = exec.get(); | |||
Keyspace.open(exec.command.ksName).getColumnFamilyStore(exec.command.cfName).metric.avoidedReadRepairs.mark(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future ref, this metric is placed here because exec.get() will throw a DigestMismatchException
if values do not agree
blockingReadRepairs = Metrics.meter(factory.createMetricName("BlockingReadRepairs")); | ||
backgroundReadRepairs = Metrics.meter(factory.createMetricName("BackgroundReadRepairs")); | ||
attemptedReadRepairs = Metrics.meter(factory.createMetricName("AttemptedReadRepairs")); | ||
digestReads = Metrics.meter(factory.createMetricName("DigestReads")); | ||
dataReads = Metrics.meter(factory.createMetricName("DataReads")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on #634 and #630
This PR adds two new config options to
cassandra.yaml
, and their respectivenodetool
commands for checking/setting at runtime.disable_block_on_read_repair
- this coordinator will read-repair as normal, but will not block on ACKs forREAD_REPAIR
verbs before returning to the client. The intention is to reduce the latency of read-repairs, at the cost of read-repairing multiple times if the same cell is read repeatedly.disable_read_repair_mutation
- this coordinator will still resolve a full read in the case of digest mismatches, but will not sendREAD_REPAIR
mutation verbs to any replicas. The intention is to reduce the write load on replicas.