Open
Labels: ? - Needs Triage (need team to review and classify), bug (something isn't working)
Description
This is a build that used cuDF SHA rapidsai/cudf@2ee89db, so it includes the recent fix that @ttnghia helped with (rapidsai/cudf#19414).
I wonder if this is another manifestation of the same problem in other cuDF sorts.
The stack traces tend to point to code around group by:
25/07/23 11:02:46 INFO RmmRapidsRetryIterator: got a throwable in RmmRapidsRetryIterator.next():
ai.rapids.cudf.CudaFatalException: merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
at ai.rapids.cudf.Table.orderBy(Native Method)
at ai.rapids.cudf.Table.orderBy(Table.java:2376)
at com.nvidia.spark.rapids.GpuSorter.$anonfun$sort$2(SortUtils.scala:210)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
at com.nvidia.spark.rapids.GpuSorter.$anonfun$sort$1(SortUtils.scala:209)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
at com.nvidia.spark.rapids.GpuSorter.sort(SortUtils.scala:208)
at com.nvidia.spark.rapids.GpuSorter.$anonfun$appendProjectedAndSort$1(SortUtils.scala:357)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
at com.nvidia.spark.rapids.GpuSorter.appendProjectedAndSort(SortUtils.scala:356)
at com.nvidia.spark.rapids.GpuSpillableProjectedSortEachBatchIterator$.$anonfun$apply$5(GpuSortExec.scala:223)
at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
at com.nvidia.spark.rapids.GpuSpillableProjectedSortEachBatchIterator$.$anonfun$apply$4(GpuSortExec.scala:222)
at com.nvidia.spark.rapids.GpuMetric.ns(GpuMetrics.scala:316)
at com.nvidia.spark.rapids.GpuSpillableProjectedSortEachBatchIterator$.$anonfun$apply$3(GpuSortExec.scala:221)
at com.nvidia.spark.rapids.RmmRapidsRetryIterator$AutoCloseableAttemptSpliterator.next(RmmRapidsRetryIterator.scala:537)
at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.next(RmmRapidsRetryIterator.scala:690)
at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryAutoCloseableIterator.next(RmmRapidsRetryIterator.scala:577)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:496)
at com.nvidia.spark.rapids.GpuOutOfCoreSortIterator.next(GpuSortExec.scala:618)
at com.nvidia.spark.rapids.GpuOutOfCoreSortIterator.next(GpuSortExec.scala:298)