Skip to content

Error running Serverless Spark jobs with Dataproc Batches Codelab #224

@natewh

Description

@natewh

Codelab - Dataproc Serverless
3. Run Serverless Spark jobs with Dataproc Batches](https://codelabs.developers.google.com/dataproc-serverless#2)

When running getting error:

SPARK_EXTRA_CLASSPATH=
:: loading settings :: file = /etc/spark/conf/ivysettings.xml
Traceback (most recent call last):
File "/tmp/srvls-batch-1994fce7-ed4a-4ec9-bb67-a01b7c9bb72d/citibike.py", line 32, in
df = spark.read.format("bigquery").load(table)
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 177, in load
File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in call
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
File "/usr/lib/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o73.load.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider com.google.cloud.spark.bigquery.BigQueryRelationProvider could not be instantiated
at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:586)
at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:813)
at java.base/java.util.ServiceLoader$ProviderImpl.get(ServiceLoader.java:729)
at java.base/java.util.ServiceLoader$3.next(ServiceLoader.java:1403)
at scala.collection.convert.JavaCollectionWrappers$JIteratorWrapper.next(JavaCollectionWrappers.scala:38)
at scala.collection.StrictOptimizedIterableOps.filterImpl(StrictOptimizedIterableOps.scala:226)
at scala.collection.StrictOptimizedIterableOps.filterImpl$(StrictOptimizedIterableOps.scala:222)
at scala.collection.convert.JavaCollectionWrappers$JIterableWrapper.filterImpl(JavaCollectionWrappers.scala:58)
at scala.collection.StrictOptimizedIterableOps.filter(StrictOptimizedIterableOps.scala:218)
at scala.collection.StrictOptimizedIterableOps.filter$(StrictOptimizedIterableOps.scala:218)
at scala.collection.convert.JavaCollectionWrappers$JIterableWrapper.filter(JavaCollectionWrappers.scala:58)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:725)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:185)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.NoClassDefFoundError: scala/Serializable
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1012)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:427)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:421)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:420)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:587)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
at com.google.cloud.spark.bigquery.BigQueryRelationProvider.(BigQueryRelationProvider.scala:49)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:789)
... 25 more
Caused by: java.lang.ClassNotFoundException: scala.Serializable
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:587)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
... 42 more

ERROR: (gcloud.dataproc.batches.submit.pyspark) Batch job is FAILED.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions