Lineage Displays S3 Path Instead of Hive/Delta Table Name with Spark Pipeline/Plugin #13239

anilreddygollapalli · 2025-04-16T18:36:32Z

Describe the bug
When we use the Spark connector/plugin to generate lineage from Spark applications, the generated lineage shows S3 paths as edges instead of Hive/Delta table names. Please refer to the attached screenshot.

To Reproduce
Steps to reproduce the behavior:

Launch pyspark with the Spark plugin JARs and the properties below:
pyspark --packages io.acryl:acryl-spark-lineage:0.2.16 \
  --conf "spark.datahub.rest.server=http://XXXX:8080" \
  --conf "spark.datahub.rest.token=XXXXXX" \
  --conf "spark.extraListeners=datahub.spark.DatahubSparkListener" \
  --conf "spark.jars.packages=io.acryl:acryl-spark-lineage:0.2.16" \
  --conf "spark.datahub.metadata.dataset.include_schema_metadata=true" \
  --conf "spark.datahub.metadata.dataset.materialize=true" \
  --conf "spark.datahub.disableSymlinkResolution=true" \
  --conf "spark.datahub.metadata.dataset.hivePlatformAlias=hive" \
  --name datahub_lineage_test_20250415
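
For anyone trying to reproduce this outside the interactive shell, the same settings can be sketched as a standalone PySpark script. This is only a sketch under assumptions: the configuration keys and values are copied from the command above (server and token stay redacted), but the issue does not say which statements were run, so the table names `demo_db.src_table` and `demo_db.dst_table` and the helper names `build_session` and `run_repro` are hypothetical.

```python
# Settings copied from the pyspark command in this issue.
# The server URL and token are left redacted, as in the report.
DATAHUB_CONF = {
    "spark.jars.packages": "io.acryl:acryl-spark-lineage:0.2.16",
    "spark.extraListeners": "datahub.spark.DatahubSparkListener",
    "spark.datahub.rest.server": "http://XXXX:8080",
    "spark.datahub.rest.token": "XXXXXX",
    "spark.datahub.metadata.dataset.include_schema_metadata": "true",
    "spark.datahub.metadata.dataset.materialize": "true",
    "spark.datahub.disableSymlinkResolution": "true",
    "spark.datahub.metadata.dataset.hivePlatformAlias": "hive",
}


def build_session(app_name="datahub_lineage_test_20250415"):
    """Create a Hive-enabled SparkSession with the lineage listener attached."""
    # Imported lazily so the config above can be inspected without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).enableHiveSupport()
    for key, value in DATAHUB_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def run_repro(spark, source="demo_db.src_table", target="demo_db.dst_table"):
    """Read one metastore table and write another (hypothetical table names)."""
    df = spark.table(source)
    df.write.mode("overwrite").saveAsTable(target)
```

Calling `run_repro(build_session())` against a cluster that uses the Hive metastore, then checking the lineage graph for the target table, should show whether the edge is keyed by table name or by S3 path.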

Expected behavior
Lineage should be shown as Spark job --> Hive/Delta table name. Instead, the lineage visualization shows the S3 path of the Hive/Delta table as the edge. For context, we use a Hive metastore (RDS MySQL), and Spark uses it as its metastore catalog, so the metastore holds the metadata for the Hive and Delta tables.
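
To make the difference concrete, here is a small sketch of the two dataset URNs involved. It follows the DataHub dataset URN layout, `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`; the bucket, path, and table names are hypothetical placeholders, not values from our environment, and `dataset_urn` is an illustrative helper rather than a DataHub API.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN string in the DataHub layout (illustrative helper)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# What the lineage graph shows today: the table's storage location (hypothetical path).
observed = dataset_urn("s3", "my-bucket/warehouse/demo_db.db/dst_table")

# What we expect: the metastore table name (hypothetical table).
expected = dataset_urn("hive", "demo_db.dst_table")

print("observed:", observed)
print("expected:", expected)
```

Because the two URNs differ, the S3 dataset appears as a separate entity in DataHub and the lineage edge does not attach to the Hive table that is already catalogued.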

Screenshots
Attached

Desktop (please complete the following information):

Version: DataHub 0.14.0

Additional context
The PySpark application log is attached for reference.

anilreddygollapalli · 2025-04-16T18:37:39Z

eventLogs-application_1741977607953_0005.zip
Please find attached pyspark application log as requested

anilreddygollapalli added the bug Bug report label Apr 16, 2025
