Cannot access ceph nano storage from Spark #347

Open
tomkos opened this issue Oct 17, 2021 · 0 comments

Description:

I'm trying to access Ceph storage hosted locally on an OpenShift cluster. I have set:

spark.hadoop.fs.s3a.path.style.access=true

but when the job runs I get:

org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:com.amazonaws.AmazonClientException: Unable to execute HTTP request: test.ceph-nano-0: Name or service not known);

test is the bucket name; I try to access it with "LOCATION 's3a://test/import'".
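
For context, the hostname in the error (test.ceph-nano-0) looks like the bucket name prepended to the endpoint host, which is what virtual-hosted-style addressing would produce. Below is a minimal Python sketch of the two addressing styles. It is only an illustration, not the S3A client's actual code: the function name is hypothetical, and the endpoint http://ceph-nano-0 is an assumption inferred from the error message.

```python
from urllib.parse import urlsplit


def s3_request_target(endpoint, bucket, key, path_style):
    """Return the (host, path) an S3 client would contact for bucket/key.

    Hypothetical helper for illustration only; real clients also handle
    ports, signing, and URL encoding.
    """
    host = urlsplit(endpoint).hostname
    if path_style:
        # Path-style: the bucket is the first path segment, host is unchanged.
        return host, f"/{bucket}/{key}"
    # Virtual-hosted-style: the bucket becomes a DNS label in front of the host.
    return f"{bucket}.{host}", f"/{key}"


# Assumed endpoint, taken from the hostname in the error message.
endpoint = "http://ceph-nano-0"
virtual = s3_request_target(endpoint, "test", "import", path_style=False)
path = s3_request_target(endpoint, "test", "import", path_style=True)
print(virtual)
print(path)
```

With path-style access in effect I would expect requests to go to the endpoint host itself, so the error suggests the setting is not being applied.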

Steps to reproduce:

  1. Create a Spark cluster with a Thrift server and Ceph as storage.
  2. Run a Spark SQL query that creates an external table stored on Ceph S3 storage.
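
The statement in step 2 has roughly the shape sketched below. Only the LOCATION value comes from this report; the table name, columns, and PARQUET storage format are placeholders assumed for illustration.

```python
def external_table_ddl(table, columns, location):
    """Build a Hive-style CREATE EXTERNAL TABLE statement.

    Hypothetical helper: table name, columns, and the PARQUET format
    are illustrative assumptions, not details from the failing job.
    """
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols}) "
        f"STORED AS PARQUET LOCATION '{location}'"
    )


ddl = external_table_ddl(
    "import_data",                       # placeholder table name
    [("id", "INT"), ("name", "STRING")],  # placeholder columns
    "s3a://test/import",                 # location from this report
)
print(ddl)
```

The resulting string can be submitted through any client connected to the Thrift server.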

Spark cluster details:

spec:
  customImage: 'quay.io/opendatahub/spark-cluster-image:2.4.3-h2.7'
  env:
    - name: SPARK_METRICS_ON
      value: prometheus
  master:
    cpuLimit: '2'
    cpuRequest: 200m
    instances: '1'
    memoryLimit: 2Gi
    memoryRequest: 512Mi
  worker:
    cpuLimit: '2'
    cpuRequest: 200m
    instances: '1'
    memoryLimit: 2Gi
    memoryRequest: 512Mi

Thrift server details:

apiVersion: v1
kind: Secret
metadata:
  name: thriftserver-server-conf
stringData:
  thrift-server.conf: |-
    spark.blockManager.port=42100
    spark.cores.max=2
    spark.driver.bindAddress=0.0.0.0
    spark.driver.host=thriftserver.$(namespace).svc
    spark.driver.memory=2G
    spark.driver.port=42000
    spark.executor.memory=2G
    spark.hadoop.datanucleus.rdbms.datastoreAdapterClassName=org.datanucleus.store.rdbms.adapter.PostgreSQLAdapter
    spark.hadoop.datanucleus.schema.autoCreateAll=true
    spark.hadoop.fs.s3a.endpoint=$(s3_endpoint_url)
    spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.EnvironmentVariableCredentialsProvider
    spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
    spark.hadoop.fs.s3a.path.style.access=true
    spark.hadoop.javax.jdo.option.ConnectionDriverName=org.postgresql.Driver
    spark.hadoop.javax.jdo.option.ConnectionPassword=$(database_password)
    spark.hadoop.javax.jdo.option.ConnectionURL=jdbc:postgresql://thriftserver-db.$(namespace).svc:5432/$(database_name)
    spark.hadoop.javax.jdo.option.ConnectionUserName=$(database_user)
    spark.sql.adaptive.enabled=true
    spark.sql.thriftServer.incrementalCollect=true
    spark.sql.warehouse.dir=/spark-warehouse

Spark Operator 1.1.0 is used with Open Data Hub.
