Describe the bug
When running NDS queries, e.g. query67 (or just a full power run), in local mode with the off-heap limit set sufficiently low, queries fail with the following error (see expanded details):
Details:
[1499267][14:38:59:382345][error ] [A][Stream 0x1][Upstream 20971520B][FAILURE maximum pool size exceeded: Not enough room to grow, current/max/try size = 3.765625 GiB, 3.765625 GiB, 20.000000 MiB]
[1499366][14:38:59:382374][error ] [A][Stream 0x1][Upstream 20971520B][FAILURE maximum pool size exceeded: Not enough room to grow, current/max/try size = 3.765625 GiB, 3.765625 GiB, 20.000000 MiB]
25/07/22 14:38:59 INFO HostAlloc: Spilled 31622 bytes from the host store
25/07/22 14:38:59 INFO HostAlloc: Spilled 34758 bytes from the host store
25/07/22 14:38:59 WARN HostAlloc: Host store exhausted, unable to allocate 20971520 bytes. Total host allocated is 3934693450 bytes. Attempt 1. Attempting a retry.
25/07/22 14:38:59 INFO HostAlloc: Spilled 35530 bytes from the host store
25/07/22 14:38:59 INFO HostAlloc: Spilled 34902 bytes from the host store
25/07/22 14:38:59 INFO HostAlloc: Spilled 35582 bytes from the host store
25/07/22 14:38:59 INFO HostAlloc: Spilled 36694 bytes from the host store
25/07/22 14:38:59 INFO Executor: Executor interrupted and killed task 15.0 in stage 449.0 (TID 13331), reason: Stage cancelled: Job aborted due to stage failure: Task 2 in stage 449.0 failed 1 times, most recent failure: Lost task 2.0 in stage 449.0 (TID 13318) (10.110.46.230 executor driver): com.nvidia.spark.rapids.jni.CpuSplitAndRetryOOM: CPU OutOfMemory: could not split inputs and retry
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$NoInputSpliterator.split(RmmRapidsRetryIterator.scala:404)
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.next(RmmRapidsRetryIterator.scala:656)
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryAutoCloseableIterator.next(RmmRapidsRetryIterator.scala:578)
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.drainSingleWithVerification(RmmRapidsRetryIterator.scala:295)
    at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.withRetryNoSplit(RmmRapidsRetryIterator.scala:189)
    at com.nvidia.spark.rapids.KudoSerializedBatchIterator.allocateHostWithRetry(GpuColumnarBatchSerializer.scala:701)
    at com.nvidia.spark.rapids.KudoSerializedBatchIterator.$anonfun$readNextBatch$5(GpuColumnarBatchSerializer.scala:726)
    at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
    at com.nvidia.spark.rapids.KudoSerializedBatchIterator.$anonfun$readNextBatch$4(GpuColumnarBatchSerializer.scala:706)
    at com.nvidia.spark.rapids.GpuMetric.ns(GpuMetrics.scala:321)
    at com.nvidia.spark.rapids.KudoSerializedBatchIterator.readNextBatch(GpuColumnarBatchSerializer.scala:706)
    at com.nvidia.spark.rapids.KudoSerializedBatchIterator.next(GpuColumnarBatchSerializer.scala:772)
    at com.nvidia.spark.rapids.KudoSerializedBatchIterator.next(GpuColumnarBatchSerializer.scala:649)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:496)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
    at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at com.nvidia.spark.rapids.HostCoalesceIteratorBase.bufferNextBatch(GpuShuffleCoalesceExec.scala:331)
    at com.nvidia.spark.rapids.HostCoalesceIteratorBase.hasNext(GpuShuffleCoalesceExec.scala:354)
    at com.nvidia.spark.rapids.GpuShuffleCoalesceIterator.hasNext(GpuShuffleCoalesceExec.scala:421)
    at com.nvidia.spark.rapids.DynamicGpuPartialAggregateIterator.$anonfun$hasNext$4(GpuAggregateExec.scala:2085)
    at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
    at scala.Option.getOrElse(Option.scala:189)
    at com.nvidia.spark.rapids.DynamicGpuPartialAggregateIterator.hasNext(GpuAggregateExec.scala:2085)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    at com.nvidia.spark.rapids.GpuOutOfCoreSortIterator.hasNext(GpuSortExec.scala:323)
    at com.nvidia.spark.rapids.shims.GpuWindowGroupLimitingIterator.hasNext(GpuWindowGroupLimitExec.scala:104)
    at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:390)
    at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:413)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
We would expect that, with a 1 GiB batch size and a 4 GiB off-heap limit, some tasks would be able to spill and block while others proceed, but instead the whole query fails. We want to understand better why this is happening, whether there is a bug, and what is consuming the memory.
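For reference, a back-of-the-envelope check of the numbers in the log above (a minimal illustrative sketch in Scala; the 1 GiB figure is the assumed target batch size, the other values are taken directly from the failure messages):

```scala
// Back-of-the-envelope numbers from the failure above (illustrative only).
object OffHeapNumbers {
  val MiB: Double = 1024 * 1024
  val GiB: Double = 1024 * MiB

  val offHeapLimit  = 4.0 * GiB        // spark.rapids.memory.host.offHeapLimit.size=4g
  val poolCurrent   = 3.765625 * GiB   // "current" size reported by the pool failure
  val poolMax       = 3.765625 * GiB   // "max" size reported by the pool failure
  val failedRequest = 20.0 * MiB       // the grow attempt that failed (20971520 bytes)
  val hostAllocated = 3934693450.0     // "Total host allocated" from the HostAlloc warning
  val targetBatch   = 1.0 * GiB        // assumed target batch size (not from this report)

  def main(args: Array[String]): Unit = {
    // The pool is already at its maximum, so even a 20 MiB grow cannot succeed.
    println(f"pool headroom:        ${(poolMax - poolCurrent) / MiB}%.1f MiB")
    // Yet the host store only reports ~3.66 GiB allocated against the 4 GiB limit.
    println(f"host store allocated: ${hostAllocated / GiB}%.2f GiB of ${offHeapLimit / GiB}%.0f GiB")
    // With 1 GiB batches one would expect roughly this many to fit before tasks must block.
    println(f"batches under limit:  ${offHeapLimit / targetBatch}%.0f")
  }
}
```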
Steps/Code to reproduce bug
This can be reproduced on a local NDS run with spark.rapids.memory.host.offHeapLimit.size=4g configured.
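For concreteness, a minimal sketch of how the session for such a run might be configured; only spark.rapids.memory.host.offHeapLimit.size=4g comes from this report, while the plugin setup and the batch size value are assumptions about a typical NDS local run:

```scala
import org.apache.spark.sql.SparkSession

// Minimal local-mode session sketch for the repro. Only the offHeapLimit.size
// value is taken from this report; the remaining settings are assumed.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nds-offheap-limit-repro")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  .config("spark.rapids.memory.host.offHeapLimit.size", "4g")
  .config("spark.rapids.sql.batchSizeBytes", "1073741824") // assumed 1 GiB target batch size
  .getOrCreate()
```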
Expected behavior
The query can succeed by spilling and retrying as needed.
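To make that expectation concrete, here is a hedged illustrative model (not the plugin's actual code; tryAlloc and spillSomething are hypothetical stand-ins for the host allocator and the spill framework) of the spill-and-retry behavior a host allocation is expected to follow before giving up:

```scala
import scala.annotation.tailrec

// Illustrative model of the expected spill-and-retry behavior described above.
// This is NOT the spark-rapids implementation.
object SpillRetryModel {
  @tailrec
  def allocateWithSpillAndRetry(
      requestBytes: Long,
      tryAlloc: Long => Option[Array[Byte]],
      spillSomething: () => Long): Array[Byte] = {
    tryAlloc(requestBytes) match {
      case Some(buffer) =>
        buffer                           // allocation fit under the off-heap limit
      case None =>
        val spilled = spillSomething()   // free host memory by spilling buffers
        if (spilled == 0L) {
          // Nothing left to spill and (as in the trace above) the input cannot be
          // split further, so the task fails with a CPU OOM instead of blocking.
          throw new OutOfMemoryError(s"could not allocate $requestBytes bytes after spilling")
        }
        allocateWithSpillAndRetry(requestBytes, tryAlloc, spillSomething)
    }
  }
}
```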