Describe the bug
When we use the Spark connector/plugin to generate lineage from Spark applications, the generated lineage shows S3 paths in the lineage flow instead of the Hive/Delta table names. Please see the attached screenshot.
Expected behavior
Ideally, the lineage should show spark job --> Hive/Delta table name. Instead, the lineage visualization shows the S3 path of that Hive/Delta table, which is not expected. FYI, we use a Hive metastore (backed by RDS MySQL), and Spark uses it as its metastore catalog, so the metastore holds the metadata for the Hive and Delta tables.
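To make the expected/observed difference concrete: DataHub identifies datasets by URNs of the form urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>). The minimal Python sketch below assumes the default PROD environment, a hypothetical table `sales_db.orders`, and a hypothetical S3 location for it. It contrasts the Hive-table URN we expect with an S3-path URN of the kind we are seeing. The exact name the plugin emits for an S3 path (for example, whether it keeps the s3:// scheme) is an assumption here.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a DataHub dataset URN:
    urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Hypothetical table name and S3 location, for illustration only.
expected = dataset_urn("hive", "sales_db.orders")
observed = dataset_urn("s3", "my-bucket/warehouse/sales_db.db/orders")

print(expected)  # the Hive table node we expect in the lineage graph
print(observed)  # an S3-path node like the ones that appear instead
```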
Screenshots
Attached
Desktop
Version: DataHub 0.14.0
Additional context
The PySpark application log is attached for reference.
To Reproduce
Steps to reproduce the behavior:
pyspark --packages io.acryl:acryl-spark-lineage:0.2.16 \
  --conf "spark.datahub.rest.server=http://XXXX:8080" \
  --conf "spark.datahub.rest.token=XXXXXX" \
  --conf "spark.extraListeners=datahub.spark.DatahubSparkListener" \
  --conf "spark.jars.packages=io.acryl:acryl-spark-lineage:0.2.16" \
  --conf "spark.datahub.metadata.dataset.include_schema_metadata=true" \
  --conf "spark.datahub.metadata.dataset.materialize=true" \
  --conf "spark.datahub.disableSymlinkResolution=true" \
  --conf "spark.datahub.metadata.dataset.hivePlatformAlias=hive" \
  --name datahub_lineage_test_20250415
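The same reproduction settings can be kept as a Python mapping and expanded into the pyspark command line, which makes it easier to toggle one setting at a time while retesting. This is a minimal sketch: the helper `build_pyspark_args` is hypothetical, and the values are copied from the command, including the redacted server URL and token placeholders.

```python
import shlex

PACKAGE = "io.acryl:acryl-spark-lineage:0.2.16"

# Settings copied from the reproduction command (URL and token are redacted).
SPARK_CONF = {
    "spark.datahub.rest.server": "http://XXXX:8080",
    "spark.datahub.rest.token": "XXXXXX",
    "spark.extraListeners": "datahub.spark.DatahubSparkListener",
    "spark.jars.packages": PACKAGE,
    "spark.datahub.metadata.dataset.include_schema_metadata": "true",
    "spark.datahub.metadata.dataset.materialize": "true",
    "spark.datahub.disableSymlinkResolution": "true",
    "spark.datahub.metadata.dataset.hivePlatformAlias": "hive",
}


def build_pyspark_args(conf: dict, name: str) -> list:
    """Hypothetical helper: expand a conf mapping into pyspark CLI arguments."""
    args = ["pyspark", "--packages", PACKAGE]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args += ["--name", name]
    return args


# Print a shell-safe command line equivalent to the one in the report.
print(shlex.join(build_pyspark_args(SPARK_CONF, "datahub_lineage_test_20250415")))
```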