Add PyCapsule Type Support and Type Hint Enhancements for AggregateUDF in DataFusion Python Bindings #1277
Implement fallback for PyCapsule-backed providers, ensuring type checkers are satisfied without protocol-aware stubs. Update typing imports and cast PyCapsule inputs in AggregateUDF.from_pycapsule for precise constructor typing.
…zation signatures
Introduce a _PyCapsule typing protocol to enable type checkers to recognize PyCapsule-based registrations. Restrict the AggregateUDF udaf overload to the PyCapsule protocol and update from_pycapsule to wrap raw capsule inputs using the internal binding directly.
Introduce a utility to validate PyCapsules and convert them into reusable DataFusion aggregate UDFs. Update PyAggregateUDF.from_pycapsule to handle raw PyCapsule inputs, leverage the new helper, and maintain existing provider fallback and error handling.
r"\b(?:pub\s+)?(?:struct|enum)\s+"
r"(?P<name>[A-Za-z_][A-Za-z0-9_]*)",
Not related to this PR, but this came up as a Ruff error.
Very nice work.
class _PyCapsule:
    """Lightweight typing proxy for CPython ``PyCapsule`` objects."""
Why is this needed?
Which issue does this PR close?
`udaf` function #1237
Rationale for this change
The current `AggregateUDF.udaf` and `AggregateUDF.from_pycapsule` methods in the DataFusion Python API lack proper type hinting and handling for CPython `PyCapsule` objects. This omission causes static type checkers (e.g., mypy) to report errors when users register UDAFs originating from external providers such as `geodatafusion`, even though the code works correctly at runtime.

This PR addresses the gap by explicitly supporting PyCapsule types in both type hints and runtime checks. Doing so improves type safety, developer experience, and code clarity while maintaining full backward compatibility.
example from #1237
Before
After
What changes are included in this PR?
- Added a `TypeGuard` function `_is_pycapsule()` for lightweight PyCapsule type validation.
- Added a `_PyCapsule` proxy class for static typing compatibility in non-type-checking contexts.
- Updated `AggregateUDF.__init__` and `AggregateUDF.udaf()` to include `AggregateUDFExportable | _PyCapsule` argument types.
- Updated `AggregateUDF.from_pycapsule()` to support direct PyCapsule initialization.
- Refactored the `PyAggregateUDF::from_pycapsule()` logic to delegate PyCapsule validation to a new helper function, `aggregate_udf_from_capsule()`, for cleaner handling.
Are these changes tested?
Yes:
Are there any user-facing changes?
Yes, minor improvements:
These changes are fully backward-compatible and non-breaking for existing user code.