Repeated errors with PartitionRange caused by query timeouts #304

Open
@lemalucau-r7

Description

After running a migration of a table from Cassandra 2.1 to 4, one or more partitions keep returning an error during the validation stage because the query keeps timing out:

24/09/11 14:01:00 ERROR DiffJobSession: Error with PartitionRange -- ThreadID: 121 Processing min: 139515770437584770019983589047024966756 max: 141217182272189462337300462084183807813
com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT2M
        at com.datastax.oss.driver.api.core.DriverTimeoutException.copy(DriverTimeoutException.java:36)
        at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:151)
        at com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.maybeMoveToNextPage(MultiPageResultSet.java:101)
        at com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.computeNext(MultiPageResultSet.java:93)
        at com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.computeNext(MultiPageResultSet.java:81)
        at com.datastax.oss.driver.internal.core.util.CountingIterator.tryToComputeNext(CountingIterator.java:93)
        at com.datastax.oss.driver.internal.core.util.CountingIterator.hasNext(CountingIterator.java:88)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:132)
        at com.datastax.oss.driver.internal.core.cql.PagingIterableSpliterator.forEachRemaining(PagingIterableSpliterator.java:120)
        at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
        at com.datastax.cdm.job.DiffJobSession.getDataAndDiff(DiffJobSession.java:153)
        at com.datastax.cdm.job.DiffJobSession.processSlice(DiffJobSession.java:124)
        at com.datastax.cdm.job.DiffJobSession.processSlice(DiffJobSession.java:54)
        at com.datastax.cdm.job.DiffData$.$anonfun$execute$5(DiffData.scala:33)
        at com.datastax.cdm.job.DiffData$.$anonfun$execute$5$adapted(DiffData.scala:31)
        at com.datastax.spark.connector.cql.CassandraConnector.$anonfun$withSessionDo$1(CassandraConnector.scala:104)
        at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:121)
        at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:103)
        at com.datastax.cdm.job.DiffData$.$anonfun$execute$4(DiffData.scala:31)
        at com.datastax.cdm.job.DiffData$.$anonfun$execute$4$adapted(DiffData.scala:30)
        at com.datastax.spark.connector.cql.CassandraConnector.$anonfun$withSessionDo$1(CassandraConnector.scala:104)
        at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:121)
        at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:103)
        at com.datastax.cdm.job.DiffData$.$anonfun$execute$3(DiffData.scala:30)
        at com.datastax.cdm.job.DiffData$.$anonfun$execute$3$adapted(DiffData.scala:29)
        at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563)
        at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561)
        at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
        at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1031)
        at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1031)
        at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2433)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
        at org.apache.spark.scheduler.Task.run(Task.scala:141)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)

But looking at the final record counts, there are no mismatches or missing records, so it looks like the validation ran perfectly?

24/09/11 14:02:36 INFO JobCounter: ################################################################################################
24/09/11 14:02:36 INFO JobCounter: Final Read Record Count: 144824
24/09/11 14:02:36 INFO JobCounter: Final Mismatch Record Count: 0
24/09/11 14:02:36 INFO JobCounter: Final Corrected Mismatch Record Count: 0
24/09/11 14:02:36 INFO JobCounter: Final Missing Record Count: 0
24/09/11 14:02:36 INFO JobCounter: Final Corrected Missing Record Count: 0
24/09/11 14:02:36 INFO JobCounter: Final Valid Record Count: 144824
24/09/11 14:02:36 INFO JobCounter: Final Skipped Record Count: 0
24/09/11 14:02:36 INFO JobCounter: ################################################################################################

I've tried re-validating those particular partitions on their own, and also processing them in smaller chunks, but I still get the same error either way.
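For reference, this is the timeout override I'm planning to try next. The PT2M in the stack trace matches the spark-cassandra-connector's default read timeout of 120000 ms, so raising it seems worth a shot. The property names below come from the spark-cassandra-connector and DataStax Java Driver 4.x configuration docs; whether CDM actually honors them on this code path is my assumption:

```
# Passed via spark-submit --conf: raise the connector's read timeout from the
# 2-minute default (120000 ms) that matches the PT2M in the stack trace.
spark.cassandra.read.timeoutMS=600000

# Alternatively, via a driver application.conf (DataStax Java Driver 4.x key):
# datastax-java-driver.basic.request.timeout = 10 minutes
```

I haven't confirmed yet whether this makes the error go away or just takes longer to hit it.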

How do you deal with this issue? Also, how can I be sure that the final record count is accurate if there was an error with one or more partitions?

Any help would be appreciated, thank you!
