Contributor

@Yicong-Huang Yicong-Huang commented Oct 13, 2025

What changes were proposed in this pull request?

To align with the Scala side, this PR adds a transform() method to the Column API, similar to Dataset.transform():

def transform(self, f: Callable[["Column"], "Column"]) -> "Column":

Why are the changes needed?

We want to give users a way to chain column functions as method calls, for example:

 >>> from pyspark.sql.functions import trim, upper
 >>> df = spark.createDataFrame([("  hello  ",), ("  world  ",)], ["text"])
 >>> df.select(df.text.transform(trim).transform(upper)).show()

This pattern is also easier for AI agents to learn and write.
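
As a rough illustration of the chaining described above, here is a minimal sketch that runs without a Spark session. It assumes transform(f) simply returns f(self), by analogy with Dataset.transform(); the PR's actual implementation is not shown in this conversation. The Expr class and the toy trim and upper helpers are hypothetical stand-ins, not pyspark's Column or pyspark.sql.functions.

```python
from typing import Callable


class Expr:
    """Hypothetical stand-in for a column expression (not pyspark's Column)."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    def transform(self, f: Callable[["Expr"], "Expr"]) -> "Expr":
        # Assumed semantics: apply f to this expression and return the result.
        return f(self)


def trim(e: Expr) -> Expr:
    # Toy helper that wraps the expression text in trim(...).
    return Expr(f"trim({e.sql})")


def upper(e: Expr) -> Expr:
    # Toy helper that wraps the expression text in upper(...).
    return Expr(f"upper({e.sql})")


text = Expr("text")
chained = text.transform(trim).transform(upper)  # reads left to right
nested = upper(trim(text))                       # reads inside out
print(chained.sql)  # upper(trim(text))
```

Under that assumption, the chained form and the nested form build the same expression; the chained form just lists the steps in the order they are applied.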

Does this PR introduce any user-facing change?

Yes. A new API, Column.transform(), is introduced.

How was this patch tested?

Unit tests.

Was this patch authored or co-authored using generative AI tooling?

Tests generated by Copilot.

@akashadsare

I would like to work on this issue. Please assign this to me.

@Yicong-Huang
Contributor Author

> I would like to work on this issue. Please assign this to me.

Hi @akashadsare, this is a pull request rather than an issue. Do you want to work on the review instead?
However, I don't have the privilege to assign this to anyone.

@akashadsare

OK, can you tell me how to contribute to this project?

@sarutak
Member

sarutak commented Oct 13, 2025

@akashadsare
Please read this guide to contribute to Apache Spark.
Thanks.

@akashadsare

Thanks

@Yicong-Huang Yicong-Huang changed the title from "[SPARK-53841][Python] Implement transform() in Column API" to "[SPARK-53841][Python][Connect] Implement transform() in Column API" Oct 13, 2025
...

@dispatch_col_method
def transform(self, f: Callable[["Column"], "Column"]) -> "Column":
Member

This would have to be listed in python/docs/source/reference/pyspark.sql

Contributor Author

Done!

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM otherwise

@zhengruifeng zhengruifeng changed the title from "[SPARK-53841][Python][Connect] Implement transform() in Column API" to "[SPARK-53841][PYTHON][CONNECT] Implement transform() in Column API" Oct 14, 2025
Contributor

@zhengruifeng zhengruifeng left a comment

LGTM, pending adding it to column.rst

@github-actions github-actions bot added the DOCS label Oct 14, 2025
@Yicong-Huang
Contributor Author

@zhengruifeng @HyukjinKwon I've added it to column.rst.

5 participants