SynapseML version
1.0.4
System information
I use Docker with the CLI:
docker run -it -p 8123:8888 -e ACCEPT_EULA=yes mcr.microsoft.com/mmlspark/release jupyter notebook
and the info is:
Language version: python 3.12.2
Spark Version: 3.4.1
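Since the crash below is a heap OOM in the local-mode driver JVM (the log shows "executor driver"), here is a hedged sketch of how one might raise the memory limits in this setup; the 12g/8g figures are illustrative, not a confirmed fix:

```python
# Hedged sketch, assuming the notebook creates its own SparkSession.
# Container side: -m/--memory is a standard docker flag (12g illustrative):
#
#   docker run -it -m 12g -p 8123:8888 -e ACCEPT_EULA=yes \
#       mcr.microsoft.com/mmlspark/release jupyter notebook
#
# Notebook side: spark.driver.memory only takes effect if it is set before
# the JVM is launched, i.e. before the first SparkSession of the kernel.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # In local mode the executors run inside the driver JVM, so this is
    # the heap the failing tasks below actually use (8g illustrative).
    .config("spark.driver.memory", "8g")
    .getOrCreate()
)
```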
Describe the problem
Exception in thread "refresh progress" java.lang.OutOfMemoryError: Java heap space
at scala.Option.map(Option.scala:230)
at org.apache.spark.status.AppStatusStore.activeStages(AppStatusStore.scala:169)
at org.apache.spark.ui.ConsoleProgressBar.org$apache$spark$ui$ConsoleProgressBar$$refresh(ConsoleProgressBar.scala:64)
at org.apache.spark.ui.ConsoleProgressBar$$anon$1.run(ConsoleProgressBar.scala:52)
at java.base/java.util.TimerThread.mainLoop(Timer.java:556)
at java.base/java.util.TimerThread.run(Timer.java:506)
24/04/17 07:56:15 ERROR Executor: Exception in task 31.0 in stage 119.0 (TID 318)
org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function (functions$$$Lambda$4353/0x00000008419fe040: (struct<workclass:string,education:string,marital-status:string,occupation:string,relationship:string,race:string,sex:string,native-country:string,age:int,education-num:int,capital-gain:int,capital-loss:int,hours-per-week:int>, struct<workclass:string,education:string,marital-status:string,occupation:string,relationship:string,race:string,sex:string,native-country:string,age:int,education-num:int,capital-gain:int,capital-loss:int,hours-per-week:int>) => array<struct<sample:struct<workclass:string,education:string,marital-status:string,occupation:string,relationship:string,race:string,sex:string,native-country:string,age:int,education-num:int,capital-gain:int,capital-loss:int,hours-per-week:int>,coalition:struct<type:tinyint,size:int,indices:array<int>,values:array<double>>,weight:double>>).
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:217)
at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.generate_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:225)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.$anonfun$prepareShuffleDependency$10(ShuffleExchangeExec.scala:369)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space
24/04/17 07:56:15 ERROR Executor: Exception in task 29.0 in stage 119.0 (TID 316)
org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function (functions$$$Lambda$4353/0x00000008419fe040: (struct<workclass:string,education:string,marital-status:string,occupation:string,relationship:string,race:string,sex:string,native-country:string,age:int,education-num:int,capital-gain:int,capital-loss:int,hours-per-week:int>, struct<workclass:string,education:string,marital-status:string,occupation:string,relationship:string,race:string,sex:string,native-country:string,age:int,education-num:int,capital-gain:int,capital-loss:int,hours-per-week:int>) => array<struct<sample:struct<workclass:string,education:string,marital-status:string,occupation:string,relationship:string,race:string,sex:string,native-country:string,age:int,education-num:int,capital-gain:int,capital-loss:int,hours-per-week:int>,coalition:struct<type:tinyint,size:int,indices:array<int>,values:array<double>>,weight:double>>).
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:217)
at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.generate_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:225)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.$anonfun$prepareShuffleDependency$10(ShuffleExchangeExec.scala:369)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.<init>(rows.scala:199)
at org.apache.spark.ml.linalg.VectorUDT.serialize(VectorUDT.scala:42)
at org.apache.spark.ml.linalg.VectorUDT.serialize(VectorUDT.scala:28)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:146)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:106)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:261)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:241)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:106)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.$anonfun$toCatalystImpl$2(CatalystTypeConverters.scala:167)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter$$Lambda$4870/0x0000000841be2840.apply(Unknown Source)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.TraversableLike$$Lambda$153/0x000000084022e840.apply(Unknown Source)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toCatalystImpl(CatalystTypeConverters.scala:167)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toCatalystImpl(CatalystTypeConverters.scala:157)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:106)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$.$anonfun$createToCatalystConverter$2(CatalystTypeConverters.scala:477)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$$Lambda$4109/0x000000084178d840.apply(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.generate_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:225)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.$anonfun$prepareShuffleDependency$10(ShuffleExchangeExec.scala:369)
24/04/17 07:56:15 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 31.0 in stage 119.0 (TID 318),5,main]
org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function (functions$$$Lambda$4353/0x00000008419fe040: (struct<workclass:string,education:string,marital-status:string,occupation:string,relationship:string,race:string,sex:string,native-country:string,age:int,education-num:int,capital-gain:int,capital-loss:int,hours-per-week:int>, struct<workclass:string,education:string,marital-status:string,occupation:string,relationship:string,race:string,sex:string,native-country:string,age:int,education-num:int,capital-gain:int,capital-loss:int,hours-per-week:int>) => array<struct<sample:struct<workclass:string,education:string,marital-status:string,occupation:string,relationship:string,race:string,sex:string,native-country:string,age:int,education-num:int,capital-gain:int,capital-loss:int,hours-per-week:int>,coalition:struct<type:tinyint,size:int,indices:array<int>,values:array<double>>,weight:double>>).
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:217)
at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.generate_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:225)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.$anonfun$prepareShuffleDependency$10(ShuffleExchangeExec.scala:369)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space
24/04/17 07:56:15 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 29.0 in stage 119.0 (TID 316),5,main]
org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function (functions$$$Lambda$4353/0x00000008419fe040: (struct<workclass:string,education:string,marital-status:string,occupation:string,relationship:string,race:string,sex:string,native-country:string,age:int,education-num:int,capital-gain:int,capital-loss:int,hours-per-week:int>, struct<workclass:string,education:string,marital-status:string,occupation:string,relationship:string,race:string,sex:string,native-country:string,age:int,education-num:int,capital-gain:int,capital-loss:int,hours-per-week:int>) => array<struct<sample:struct<workclass:string,education:string,marital-status:string,occupation:string,relationship:string,race:string,sex:string,native-country:string,age:int,education-num:int,capital-gain:int,capital-loss:int,hours-per-week:int>,coalition:struct<type:tinyint,size:int,indices:array<int>,values:array<double>>,weight:double>>).
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:217)
at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.generate_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:225)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.$anonfun$prepareShuffleDependency$10(ShuffleExchangeExec.scala:369)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.<init>(rows.scala:199)
at org.apache.spark.ml.linalg.VectorUDT.serialize(VectorUDT.scala:42)
at org.apache.spark.ml.linalg.VectorUDT.serialize(VectorUDT.scala:28)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:146)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:106)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:261)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:241)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:106)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.$anonfun$toCatalystImpl$2(CatalystTypeConverters.scala:167)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter$$Lambda$4870/0x0000000841be2840.apply(Unknown Source)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.TraversableLike$$Lambda$153/0x000000084022e840.apply(Unknown Source)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toCatalystImpl(CatalystTypeConverters.scala:167)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$ArrayConverter.toCatalystImpl(CatalystTypeConverters.scala:157)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:106)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$.$anonfun$createToCatalystConverter$2(CatalystTypeConverters.scala:477)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$$Lambda$4109/0x000000084178d840.apply(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.generate_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:225)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.$anonfun$prepareShuffleDependency$10(ShuffleExchangeExec.scala:369)
24/04/17 07:56:15 WARN TaskSetManager: Lost task 31.0 in stage 119.0 (TID 318) (be868daf197a executor driver): org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user defined function (functions$$$Lambda$4353/0x00000008419fe040: (struct<workclass:string,education:string,marital-status:string,occupation:string,relationship:string,race:string,sex:string,native-country:string,age:int,education-num:int,capital-gain:int,capital-loss:int,hours-per-week:int>, struct<workclass:string,education:string,marital-status:string,occupation:string,relationship:string,race:string,sex:string,native-country:string,age:int,education-num:int,capital-gain:int,capital-loss:int,hours-per-week:int>) => array<struct<sample:struct<workclass:string,education:string,marital-status:string,occupation:string,relationship:string,race:string,sex:string,native-country:string,age:int,education-num:int,capital-gain:int,capital-loss:int,hours-per-week:int>,coalition:struct<type:tinyint,size:int,indices:array<int>,values:array<double>>,weight:double>>).
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala:217)
at org.apache.spark.sql.errors.QueryExecutionErrors.failedExecuteUserDefinedFunctionError(QueryExecutionErrors.scala)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.generate_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:225)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.$anonfun$prepareShuffleDependency$10(ShuffleExchangeExec.scala:369)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space
24/04/17 07:56:15 ERROR TaskSetManager: Task 31 in stage 119.0 failed 1 times; aborting job
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/tmp/ipykernel_4920/2756075189.py", line 1, in
shaps.head()
File "/opt/spark/python/pyspark/sql/dataframe.py", line 2821, in head
rs = self.head(1)
^^^^^^^^^^^^
File "/opt/spark/python/pyspark/sql/dataframe.py", line 2823, in head
return self.take(n)
^^^^^^^^^^^^
File "/opt/spark/python/pyspark/sql/dataframe.py", line 1322, in take
return self.limit(num).collect()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/spark/python/pyspark/sql/dataframe.py", line 1216, in collect
sock_info = self._jdf.collectToPython()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/py4j/java_gateway.py", line 1322, in call
return_value = get_return_value(
^^^^^^^^^^^^^^^^^
File "/opt/spark/python/pyspark/errors/exceptions/captured.py", line 169, in deco
return f(*a, **kw)
^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: <exception str() failed>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/py4j/clientserver.py", line 516, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/py4j/java_gateway.py", line 1038, in send_command
response = connection.send_command(command)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/py4j/clientserver.py", line 539, in send_command
raise Py4JNetworkError(
py4j.protocol.Py4JNetworkError: Error while sending or receiving
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/py4j/clientserver.py", line 516, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/py4j/java_gateway.py", line 1038, in send_command
response = connection.send_command(command)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/py4j/clientserver.py", line 539, in send_command
raise Py4JNetworkError(
py4j.protocol.Py4JNetworkError: Error while sending or receiving
Py4JJavaError Traceback (most recent call last)
[... skipping hidden 1 frame]
Cell In[15], line 1
----> 1 shaps.head()
File /opt/spark/python/pyspark/sql/dataframe.py:2821, in DataFrame.head(self, n)
2820 if n is None:
-> 2821 rs = self.head(1)
2822 return rs[0] if rs else None
File /opt/spark/python/pyspark/sql/dataframe.py:2823, in DataFrame.head(self, n)
2822 return rs[0] if rs else None
-> 2823 return self.take(n)
File /opt/spark/python/pyspark/sql/dataframe.py:1322, in DataFrame.take(self, num)
1294 """Returns the first
num
rows as a :class:list
of :class:Row
.1295
1296 .. versionadded:: 1.3.0
(...)
1320 [Row(age=14, name='Tom'), Row(age=23, name='Alice')]
1321 """
-> 1322 return self.limit(num).collect()
File /opt/spark/python/pyspark/sql/dataframe.py:1216, in DataFrame.collect(self)
1215 with SCCallSiteSync(self._sc):
-> 1216 sock_info = self._jdf.collectToPython()
1217 return list(_load_from_socket(sock_info, BatchedSerializer(CPickleSerializer())))
File /usr/local/lib/python3.12/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
1323 answer, self.gateway_client, self.target_id, self.name)
1325 for temp_arg in temp_args:
File /opt/spark/python/pyspark/errors/exceptions/captured.py:169, in capture_sql_exception.<locals>.deco(*a, **kw)
168 try:
--> 169 return f(*a, **kw)
170 except Py4JJavaError as e:
File /usr/local/lib/python3.12/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
<class 'str'>: (<class 'ConnectionRefusedError'>, ConnectionRefusedError(111, 'Connection refused'))
During handling of the above exception, another exception occurred:
ConnectionRefusedError Traceback (most recent call last)
[... skipping hidden 1 frame]
File /usr/local/lib/python3.12/site-packages/IPython/core/interactiveshell.py:2155, in InteractiveShell.showtraceback(self, exc_tuple, filename, tb_offset, exception_only, running_compiled_code)
2152 traceback.print_exc()
2153 return None
-> 2155 self._showtraceback(etype, value, stb)
2156 if self.call_pdb:
2157 # drop into debugger
2158 self.debugger(force=True)
File /usr/local/lib/python3.12/site-packages/ipykernel/zmqshell.py:559, in ZMQInteractiveShell._showtraceback(self, etype, evalue, stb)
553 sys.stdout.flush()
554 sys.stderr.flush()
556 exc_content = {
557 "traceback": stb,
558 "ename": str(etype.name),
--> 559 "evalue": str(evalue),
560 }
562 dh = self.displayhook
563 # Send exception info over pub socket for other clients than the caller
564 # to pick up
File /usr/local/lib/python3.12/site-packages/py4j/protocol.py:471, in Py4JJavaError.__str__(self)
469 def __str__(self):
470 gateway_client = self.java_exception._gateway_client
--> 471 answer = gateway_client.send_command(self.exception_cmd)
472 return_value = get_return_value(answer, gateway_client, None, None)
473 # Note: technically this should return a bytestring 'str' rather than
474 # unicodes in Python 2; however, it can return unicodes for now.
475 # See py4j/py4j#306 for more details.
File /usr/local/lib/python3.12/site-packages/py4j/java_gateway.py:1036, in GatewayClient.send_command(self, command, retry, binary)
1015 def send_command(self, command, retry=True, binary=False):
1016 """Sends a command to the JVM. This method is not intended to be
1017 called directly by Py4J users. It is usually called by
1018 :class:`JavaMember` instances.
(...)
1034 if `binary` is True.
1035 """
-> 1036 connection = self._get_connection()
1037 try:
1038 response = connection.send_command(command)
File /usr/local/lib/python3.12/site-packages/py4j/clientserver.py:284, in JavaClient._get_connection(self)
281 pass
283 if connection is None or connection.socket is None:
--> 284 connection = self._create_new_connection()
285 return connection
File /usr/local/lib/python3.12/site-packages/py4j/clientserver.py:291, in JavaClient._create_new_connection(self)
287 def _create_new_connection(self):
288 connection = ClientServerConnection(
289 self.java_parameters, self.python_parameters,
290 self.gateway_property, self)
--> 291 connection.connect_to_java_server()
292 self.set_thread_connection(connection)
293 return connection
File /usr/local/lib/python3.12/site-packages/py4j/clientserver.py:438, in ClientServerConnection.connect_to_java_server(self)
435 if self.ssl_context:
436 self.socket = self.ssl_context.wrap_socket(
437 self.socket, server_hostname=self.java_address)
--> 438 self.socket.connect((self.java_address, self.java_port))
439 self.stream = self.socket.makefile("rb")
440 self.is_connected = True
ConnectionRefusedError: [Errno 111] Connection refused
Code to reproduce issue
I run the code from the sample notebook.
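Presumably the cell looks along the lines of the public SynapseML "Interpretability - Tabular SHAP explainer" sample; this is a sketch, with the fitted `model` and the `training` / `explain_instances` DataFrames assumed to come from earlier cells, and the column lists read off the UDF signature in the log above:

```python
from pyspark.sql.functions import broadcast, rand
from synapse.ml.explainers import TabularSHAP

# Feature lists match the struct fields in the FAILED_EXECUTE_UDF message.
categorical_features = ["workclass", "education", "marital-status",
                        "occupation", "relationship", "race", "sex",
                        "native-country"]
numeric_features = ["age", "education-num", "capital-gain",
                    "capital-loss", "hours-per-week"]

shap = TabularSHAP(
    inputCols=categorical_features + numeric_features,
    outputCol="shapValues",
    numSamples=5000,
    model=model,                 # fitted PipelineModel from an earlier cell
    targetCol="probability",
    targetClasses=[1],
    backgroundData=broadcast(training.orderBy(rand()).limit(100).cache()),
)
shaps = shap.transform(explain_instances)
```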
And then, when I want to have a look at shaps, the ERROR "Java heap space" shown above comes out.
Other info / logs
No response
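For anyone debugging this, a hedged mitigation sketch: the SHAP sampler materializes roughly numSamples coalition rows per explained instance, so shrinking the sample count, the background set, and the explained set lowers peak heap use per task. Names are as in the reproduce sketch above; all values are illustrative, not a confirmed fix:

```python
from pyspark.sql.functions import broadcast, rand
from synapse.ml.explainers import TabularSHAP

shap_small = TabularSHAP(
    inputCols=categorical_features + numeric_features,
    outputCol="shapValues",
    numSamples=1000,             # fewer coalition samples per row (was 5000)
    model=model,
    targetCol="probability",
    targetClasses=[1],
    backgroundData=broadcast(training.orderBy(rand()).limit(50).cache()),
)
# Explain fewer rows at a time, spread across more, smaller partitions.
shaps = shap_small.transform(explain_instances.limit(100).repartition(32))
shaps.head()
```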
What component(s) does this bug affect?
area/cognitive: Cognitive project
area/core: Core project
area/deep-learning: DeepLearning project
area/lightgbm: Lightgbm project
area/opencv: Opencv project
area/vw: VW project
area/website: Website
area/build: Project build system
area/notebooks: Samples under notebooks folder
area/docker: Docker usage
area/models: models related issue
What language(s) does this bug affect?
language/scala: Scala source code
language/python: Pyspark APIs
language/r: R APIs
language/csharp: .NET APIs
language/new: Proposals for new client languages
What integration(s) does this bug affect?
integrations/synapse: Azure Synapse integrations
integrations/azureml: Azure ML integrations
integrations/databricks: Databricks integrations