
[SPARK-52640][SDP] Propagate Python Source Code Location #51344


Open
AnishMahto wants to merge 4 commits into master

Conversation

AnishMahto (Contributor)

What changes were proposed in this pull request?

Propagate source code location details (line number and file path) end to end for declarative pipelines. That is: collect this information from the Python REPL that registers SDP datasets/flows, propagate it through the appropriate Spark Connect handlers, and associate it with the appropriate datasets/flows in pipeline events/exceptions.
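For intuition, a minimal sketch of the kind of origin record being threaded through; this is illustrative only, and the real QueryOrigin in Spark's pipelines module has a different, richer shape:

```scala
// Illustrative sketch only, not Spark's actual QueryOrigin definition:
// the origin record collected at Python registration time and attached
// to each dataset/flow for use in events and exceptions.
case class QueryOrigin(
    filePath: Option[String] = None,  // Python file that defined the dataset/flow
    line: Option[Int] = None,         // line number within that file
    objectName: Option[String] = None // identifier of the dataset/flow
)

// Example: a dataset registered at line 12 of a hypothetical pipeline.py.
val origin = QueryOrigin(filePath = Some("pipeline.py"), line = Some(12))
```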

Why are the changes needed?

Better observability and an improved debugging experience: users can identify the exact lines that caused a particular exception.

Does this PR introduce any user-facing change?

Yes. Pipeline event origins now include source code information, which is user-facing. However, SDP has not yet been released in any Spark version.

How was this patch tested?

Added tests to org.apache.spark.sql.connect.pipelines.PythonPipelineSuite

Was this patch authored or co-authored using generative AI tooling?

No

AnishMahto (Contributor, Author)

@sryza

sryza (Contributor) left a comment:

I left a few comments, but this looks close to ready to merge to me.

CC @gengliangwang @hvanhovell in case either of you are also interested in taking a look.

```
@@ -49,35 +49,54 @@ class GraphRegistrationContext(
    flows += flowDef.copy(sqlConf = defaultSqlConf ++ flowDef.sqlConf)
  }

  /**
```
Contributor:

These changes look independent of what the rest of this PR is doing?

AnishMahto (Contributor, Author):

I just added the scaladoc for the sake of future readers, but the changes to toDataflowGraph are relevant.

toDataflowGraph is where we ensure all identifiers are fully qualified, qualifying them if they are not. It's intentional that once we fully qualify an identifier (or verify that it is already fully qualified), we also update the associated query origin with the fully qualified identifier, as sketched below.
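A hedged sketch of that flow; the identifier and origin shapes below are assumptions for illustration, not the PR's exact code:

```scala
// Sketch: fully qualify a table identifier, then mirror the qualified name
// into the query origin so events/exceptions report the same name the
// dataflow graph uses internally. All shapes here are assumed.
case class QueryOrigin(objectName: Option[String] = None)

def fullyQualify(parts: Seq[String], catalog: String, database: String): Seq[String] =
  parts match {
    case Seq(table)     => Seq(catalog, database, table) // bare table name
    case Seq(db, table) => Seq(catalog, db, table)       // database-qualified
    case qualified      => qualified                     // already fully qualified
  }

def withQualifiedOrigin(origin: QueryOrigin, qualified: Seq[String]): QueryOrigin =
  origin.copy(objectName = Some(qualified.mkString(".")))
```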

AnishMahto requested a review from sryza on July 1, 2025 at 21:53.
```
@@ -156,6 +156,10 @@ private[connect] object PipelinesHandler extends Logging {
        .filter(_.nonEmpty),
      properties = dataset.getTablePropertiesMap.asScala.toMap,
      baseOrigin = QueryOrigin(
        filePath = Option.when(dataset.getSourceCodeLocation.hasFileName)(
```
Member:
nit: we could store filePath and line in variables, or extract a helper method, to avoid duplicating this code.
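For instance, a sketch of the suggested refactor; the SourceCodeLocation and QueryOrigin stand-ins below are assumptions so the example is self-contained, not the actual proto or Spark classes:

```scala
// Hypothetical stand-ins for the proto message and QueryOrigin, so this
// sketch compiles on its own; the real types live in Spark's modules.
final case class SourceCodeLocation(fileName: Option[String], lineNumber: Option[Int]) {
  def hasFileName: Boolean = fileName.isDefined
  def getFileName: String = fileName.get
  def hasLineNumber: Boolean = lineNumber.isDefined
  def getLineNumber: Int = lineNumber.get
}
final case class QueryOrigin(filePath: Option[String] = None, line: Option[Int] = None)

// The suggested helper: extract the optional fields once, instead of
// repeating the Option.when(...) pattern at every call site.
def originFromSourceCodeLocation(loc: SourceCodeLocation): QueryOrigin =
  QueryOrigin(
    filePath = Option.when(loc.hasFileName)(loc.getFileName),
    line = Option.when(loc.hasLineNumber)(loc.getLineNumber)
  )
```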

gengliangwang (Member):

LGTM too.

sryza (Contributor) left a comment:
LGTM!
