Cannot use SparkSession in resource_manager when using LocalDaskExecutor with "processes" scheduler #3374
-
Hi @Sinha-Ujjawal and sincere apologies for the delayed response! That error arises from pickling pyspark's context object (you can find the error message within this source code: https://people.eecs.berkeley.edu/~jegonzal/pyspark/_modules/pyspark/context.html), which is only necessary when sharing the spark objects across processes, and is not necessary when sharing the objects amongst threads. I recommend using the `"threads"` scheduler here instead.

Also, completely unrelated: note that you can change the name of your tasks in one line as follows:

```python
with Flow("LocalDask+Spark") as flow:
    spark_app_name = Parameter(name="spark_app_name", required=True)
    spark_master = Parameter(name="spark_master", default="local[*]")
    with SparkContext(spark_app_name, spark_master) as spark:
        t1 = task_1(spark, "local_dask__spark.csv", task_args={"name": "t1"})
        t2 = task_1(spark, "local_dask__spark2.csv", task_args={"name": "t2"})
```
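The `SparkContext` resource manager and `task_1` task referenced in that snippet aren't shown in this thread; here is a minimal sketch of what they might look like, assuming pyspark is installed. Only the names `SparkContext` and `task_1` come from the snippet above; everything else is illustrative:

```python
from prefect import Flow, Parameter, task, resource_manager
from pyspark.sql import SparkSession

@resource_manager
class SparkContext:
    """Creates a SparkSession on setup and stops it on cleanup."""

    def __init__(self, app_name, master):
        self.app_name = app_name
        self.master = master

    def setup(self):
        # The session is created on the driver; with the "threads"
        # scheduler it is shared in-process and never pickled.
        return (
            SparkSession.builder
            .appName(self.app_name)
            .master(self.master)
            .getOrCreate()
        )

    def cleanup(self, spark):
        spark.stop()

@task
def task_1(spark, path):
    # Hypothetical task body: read a CSV using the shared session.
    spark.read.csv(path, header=True).show()
```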
-
Hey, thanks for the reply! Before this I always had to change the task name with `task.name = "name"`. I didn't know about `task_args`. This is useful 😁
-
Hi,

Cannot use SparkSession in resource_manager when using LocalDaskExecutor with the "processes" scheduler.

Error:

```
Unexpected error: Exception('It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.')
```

When I used `scheduler="threads"`, it worked! Can anyone help me with that?

Regards,
Ujjawal
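For reference, the difference between the failing and working runs comes down to one argument on the executor. A sketch, assuming the flow from the reply above; the import path varies across Prefect 0.x releases:

```python
from prefect.executors import LocalDaskExecutor  # prefect.engine.executors on older 0.x releases

# Fails: each Dask worker process must unpickle the SparkSession,
# which pyspark explicitly forbids (the SPARK-5063 error above).
flow.run(executor=LocalDaskExecutor(scheduler="processes"))

# Works: threads run inside the driver process and share the
# session directly, so nothing is ever pickled.
flow.run(executor=LocalDaskExecutor(scheduler="threads"))
```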