spark-dependencies OutOfMemoryError #106
Comments
I have the same question. Is there any progress so far?
Based on previous experience, it is just a matter of giving Spark enough memory. @kidrocknroll, were you able to solve the problem? How much memory did you allocate?
A possible workaround is to run the job more frequently on smaller chunks. With a ~15Gi daily span file, running the job every 4h works using the below spec.
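For illustration only (the commenter's actual spec is not preserved in this thread): a minimal Kubernetes CronJob sketch of the "run every 4 hours on smaller chunks" idea, assuming the ghcr.io jaegertracing spark-dependencies image, Elasticsearch storage, and the `STORAGE`/`ES_NODES`/`JAVA_OPTS` environment variables described in the project README. The image tag, ES address, schedule, and memory values are placeholders to adapt, not the commenter's settings.

```yaml
# Illustrative sketch only, not the commenter's spec. Verify the image path and env var
# names against the jaegertracing/spark-dependencies README for your version.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: jaeger-spark-dependencies
spec:
  schedule: "0 */4 * * *"          # every 4 hours instead of once per day
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: spark-dependencies
              image: ghcr.io/jaegertracing/spark-dependencies/spark-dependencies:latest  # assumed path
              env:
                - name: STORAGE
                  value: elasticsearch
                - name: ES_NODES
                  value: http://elasticsearch:9200   # placeholder address
                - name: JAVA_OPTS
                  value: "-Xms4g -Xmx4g"             # give the JVM an explicit heap cap
              resources:
                requests:
                  memory: 5Gi
                limits:
                  memory: 6Gi
```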
We encountered the same issue. Using the latest container image (not available on DockerHub, only on ghcr.io) fixed it for us. Maybe because of JRE 11 instead of JRE 8, which uses …
I've encountered the same problem. My environment is an Ubuntu virtual machine with 32 GB of RAM and 250 GB of storage, so I redirected Spark's temp files onto part of the disk. The disk is split into two partitions; the second one, mounted at /data/, holds more than 80% of the 250 GB, and I created a temp directory there. I then assigned that directory to the `spark.local.dir` setting: `spark.local.dir=/data/tmp`. Here is how I solved it: I launch Spark with `pyspark --packages io.delta:delta-core_2.12:2.3.0`, and in my Jupyter notebook configuration I build the Spark configuration starting from `spark_conf = pyspark.SparkConf()`.
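A minimal PySpark sketch of that `spark.local.dir` change, assuming local/standalone mode (some cluster managers override `spark.local.dir` with their own environment variables) and that `/data/tmp` exists and is writable:

```python
import pyspark
from pyspark.sql import SparkSession

spark_conf = pyspark.SparkConf()
# Redirect Spark's scratch space (shuffle spills, temp files) from the default /tmp
# to the large /data partition so heavy jobs do not exhaust the root filesystem.
spark_conf.set("spark.local.dir", "/data/tmp")

spark = SparkSession.builder.config(conf=spark_conf).getOrCreate()
# Confirm the setting took effect.
print(spark.sparkContext.getConf().get("spark.local.dir"))
```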
Problem
How much memory does a spark-dependencies job need to handle a data index of about 12 GB?
I am totally new to the Spark project and have tried several times to run a spark-dependencies job to create the DAG.
It always fails with the error below, even though I have raised the memory limit to about 28Gi.
Sometimes even a `copyOfRange` error occurs.
Environment
spark job configuration
ES data size
Is there a way to solve this problem other than by increasing the memory limit, or is it just a usage problem on my side?
Any suggestions or tips would be greatly appreciated.