Open
Description
After running a migration for a table from Cassandra 2.1 to 4, one or more partitions keep on returning an error during the validation stage because the query keeps timing out:
24/09/11 14:01:00 ERROR DiffJobSession: Error with PartitionRange -- ThreadID: 121 Processing min: 139515770437584770019983589047024966756 max: 141217182272189462337300462084183807813
com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT2M
at com.datastax.oss.driver.api.core.DriverTimeoutException.copy(DriverTimeoutException.java:36)
at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:151)
at com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.maybeMoveToNextPage(MultiPageResultSet.java:101)
at com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.computeNext(MultiPageResultSet.java:93)
at com.datastax.oss.driver.internal.core.cql.MultiPageResultSet$RowIterator.computeNext(MultiPageResultSet.java:81)
at com.datastax.oss.driver.internal.core.util.CountingIterator.tryToComputeNext(CountingIterator.java:93)
at com.datastax.oss.driver.internal.core.util.CountingIterator.hasNext(CountingIterator.java:88)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:132)
at com.datastax.oss.driver.internal.core.cql.PagingIterableSpliterator.forEachRemaining(PagingIterableSpliterator.java:120)
at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
at com.datastax.cdm.job.DiffJobSession.getDataAndDiff(DiffJobSession.java:153)
at com.datastax.cdm.job.DiffJobSession.processSlice(DiffJobSession.java:124)
at com.datastax.cdm.job.DiffJobSession.processSlice(DiffJobSession.java:54)
at com.datastax.cdm.job.DiffData$.$anonfun$execute$5(DiffData.scala:33) [0/693]
at com.datastax.cdm.job.DiffData$.$anonfun$execute$5$adapted(DiffData.scala:31)
at com.datastax.spark.connector.cql.CassandraConnector.$anonfun$withSessionDo$1(CassandraConnector.scala:104)
at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:121)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:103)
at com.datastax.cdm.job.DiffData$.$anonfun$execute$4(DiffData.scala:31)
at com.datastax.cdm.job.DiffData$.$anonfun$execute$4$adapted(DiffData.scala:30)
at com.datastax.spark.connector.cql.CassandraConnector.$anonfun$withSessionDo$1(CassandraConnector.scala:104)
at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:121)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:103)
at com.datastax.cdm.job.DiffData$.$anonfun$execute$3(DiffData.scala:30)
at com.datastax.cdm.job.DiffData$.$anonfun$execute$3$adapted(DiffData.scala:29)
at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563)
at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1031)
at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1031)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2433)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
But I took a look at the final record count and there are no mismatches or missing records, so it looks like the validation/migration ran perfectly?
24/09/11 14:02:36 INFO JobCounter: ################################################################################################
24/09/11 14:02:36 INFO JobCounter: Final Read Record Count: 144824
24/09/11 14:02:36 INFO JobCounter: Final Mismatch Record Count: 0
24/09/11 14:02:36 INFO JobCounter: Final Corrected Mismatch Record Count: 0
24/09/11 14:02:36 INFO JobCounter: Final Missing Record Count: 0
24/09/11 14:02:36 INFO JobCounter: Final Corrected Missing Record Count: 0
24/09/11 14:02:36 INFO JobCounter: Final Valid Record Count: 144824
24/09/11 14:02:36 INFO JobCounter: Final Skipped Record Count: 0
24/09/11 14:02:36 INFO JobCounter: ################################################################################################
I've tried validating those particular partitions but I still get the same error. I tried processing smaller chunks of the partitions but I still get the same error.
How do you deal with this issue? Also how can I be sure that the final record count is accurate if there was is an error with one or more partitions?
Any help would be appreciated, thank you!
Metadata
Metadata
Assignees
Labels
No labels