Repeated errors with PartitionRange caused by query timeouts #304

Open
@lemalucau-r7

Description

After running a migration of a table from Cassandra 2.1 to 4, one or more partitions keep returning an error during the validation stage because the query keeps timing out:

24/09/11 14:01:00 ERROR DiffJobSession: Error with PartitionRange -- ThreadID: 121 Processing min: 139515770437584770019983589047024966756 max: 141217182272189462337300462084183807813
com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT2M
        at com.datastax.oss.driver.api.core.DriverTimeoutException.copy(DriverTimeoutException.java:36)
        at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:151)
        at com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.maybeMoveToNextPage(MultiPageResultSet.java:101)
        at com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.computeNext(MultiPageResultSet.java:93)
        at com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.computeNext(MultiPageResultSet.java:81)
        at com.datastax.oss.driver.internal.core.util.CountingIterator.tryToComputeNext(CountingIterator.java:93)
        at com.datastax.oss.driver.internal.core.util.CountingIterator.hasNext(CountingIterator.java:88)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:132)
        at com.datastax.oss.driver.internal.core.cql.PagingIterableSpliterator.forEachRemaining(PagingIterableSpliterator.java:120)
        at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
        at com.datastax.cdm.job.DiffJobSession.getDataAndDiff(DiffJobSession.java:153)
        at com.datastax.cdm.job.DiffJobSession.processSlice(DiffJobSession.java:124)
        at com.datastax.cdm.job.DiffJobSession.processSlice(DiffJobSession.java:54)
        at com.datastax.cdm.job.DiffData$.$anonfun$execute$5(DiffData.scala:33)
        at com.datastax.cdm.job.DiffData$.$anonfun$execute$5$adapted(DiffData.scala:31)
        at com.datastax.spark.connector.cql.CassandraConnector.$anonfun$withSessionDo$1(CassandraConnector.scala:104)
        at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:121)
        at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:103)
        at com.datastax.cdm.job.DiffData$.$anonfun$execute$4(DiffData.scala:31)
        at com.datastax.cdm.job.DiffData$.$anonfun$execute$4$adapted(DiffData.scala:30)
        at com.datastax.spark.connector.cql.CassandraConnector.$anonfun$withSessionDo$1(CassandraConnector.scala:104)
        at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:121)
        at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:103)
        at com.datastax.cdm.job.DiffData$.$anonfun$execute$3(DiffData.scala:30)
        at com.datastax.cdm.job.DiffData$.$anonfun$execute$3$adapted(DiffData.scala:29)
        at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563)
        at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561)
        at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
        at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1031)
        at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1031)
        at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2433)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
        at org.apache.spark.scheduler.Task.run(Task.scala:141)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)

But looking at the final record counts, there are no mismatches or missing records, so it looks like the validation ran perfectly?

24/09/11 14:02:36 INFO JobCounter: ################################################################################################
24/09/11 14:02:36 INFO JobCounter: Final Read Record Count: 144824
24/09/11 14:02:36 INFO JobCounter: Final Mismatch Record Count: 0
24/09/11 14:02:36 INFO JobCounter: Final Corrected Mismatch Record Count: 0
24/09/11 14:02:36 INFO JobCounter: Final Missing Record Count: 0
24/09/11 14:02:36 INFO JobCounter: Final Corrected Missing Record Count: 0
24/09/11 14:02:36 INFO JobCounter: Final Valid Record Count: 144824
24/09/11 14:02:36 INFO JobCounter: Final Skipped Record Count: 0
24/09/11 14:02:36 INFO JobCounter: ################################################################################################

I've tried re-validating those particular partitions on their own, and also processing them in smaller chunks, but I still get the same error either way.
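For reference, this is the timeout override I'm planning to try next. The PT2M in the stack trace matches the spark-cassandra-connector's default read timeout of 120000 ms, so raising it seems worth a shot. The property names below come from the spark-cassandra-connector and DataStax Java Driver 4.x configuration docs; whether CDM actually honors them on this code path is my assumption:

```
# Passed via spark-submit --conf: raise the connector's read timeout from the
# 2-minute default (120000 ms) that matches the PT2M in the stack trace.
spark.cassandra.read.timeoutMS=600000

# Alternatively, via a driver application.conf (DataStax Java Driver 4.x key):
# datastax-java-driver.basic.request.timeout = 10 minutes
```

I haven't confirmed yet whether this makes the error go away or just takes longer to hit it.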

How do you deal with this issue? Also, how can I be sure that the final record count is accurate if there was an error with one or more partitions?

Any help would be appreciated, thank you!
