RFC: Extension API for Daft #5788
universalmind303 started this conversation in General
I have recently been refactoring the …
Extension API for Daft
We're exploring an extension API to allow third-party libraries to build on top of Daft. Draft PR: #5751
Motivation
External contributors want to build domain-specific functionality without modifying Daft core. An extension API allows the ecosystem to grow independently. Extensions can iterate at their own pace, maintain separate release cycles, and avoid coupling to Daft's core development timeline.
Design Approaches
Approach 1: Dynamic Registration (Polars/Pandas model)
Lightweight, Python-level extensions registered via decorators.
Pros: Simple to implement, flexible, pure Python, easy for extension authors
Cons: No type hints, limited discoverability, doesn't integrate with the query planner
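A minimal sketch of the decorator model, patterned on Polars' `pl.api.register_dataframe_namespace` and pandas' `pd.api.extensions.register_dataframe_accessor`. The `DataFrame` class, the `register_dataframe_namespace` decorator and the `geo` namespace below are hypothetical stand-ins, not the API in the draft PR; the sketch only shows how an extension class can be attached to the DataFrame when the extension package is imported.

```python
from typing import Callable, Type, TypeVar

T = TypeVar("T")


class DataFrame:
    """Hypothetical stand-in for daft.DataFrame; holds columns as a dict of lists."""

    def __init__(self, data: dict[str, list]):
        self._data = data

    def column_names(self) -> list[str]:
        return list(self._data)


class _NamespaceDescriptor:
    """Builds the namespace object on attribute access, bound to the DataFrame instance."""

    def __init__(self, ns_cls: type):
        self._ns_cls = ns_cls

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class itself: return the namespace class for introspection.
            return self._ns_cls
        return self._ns_cls(instance)


def register_dataframe_namespace(name: str) -> Callable[[Type[T]], Type[T]]:
    """Hypothetical decorator: attach a namespace class to DataFrame under `name`."""

    def decorator(ns_cls: Type[T]) -> Type[T]:
        # Refuse to shadow built-in methods or another extension's namespace.
        if hasattr(DataFrame, name):
            raise AttributeError(f"DataFrame already has an attribute named {name!r}")
        setattr(DataFrame, name, _NamespaceDescriptor(ns_cls))
        return ns_cls

    return decorator


# What a third-party extension package might ship:
@register_dataframe_namespace("geo")
class GeoNamespace:
    def __init__(self, df: DataFrame):
        self._df = df

    def has_coordinates(self) -> bool:
        return {"lat", "lon"} <= set(self._df.column_names())


df = DataFrame({"lat": [1.0], "lon": [2.0]})
print(df.geo.has_coordinates())  # True
```

Because `geo` is attached with `setattr` at runtime, a static type checker cannot see `df.geo`, which is the missing-type-hints drawback listed above.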
Approach 2: Global Registry (Spark/DataFusion model)
Pros: Tighter integration, can participate in optimization passes, first-class citizens in query planning
Cons: Much more complex implementation, needs careful API boundaries, harder to extend
We already have much of this in place for sources and sinks, but we don't provide any registry mechanism for them.
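By contrast, a registry-style extension point makes the extension visible at planning time. The sketch below is a toy model under stated assumptions: a name-keyed `FunctionRegistry`, a string return type in place of Daft's real data types, and a one-column `Call` expression. None of these names come from Daft or the draft PR. The point it illustrates is that the planner resolves a function by name and learns its return type before any data is touched, which is what lets registered functions take part in schema validation and optimization.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ScalarFunction:
    """Hypothetical registry entry for a scalar (row-wise) function."""

    name: str
    return_type: str  # stand-in for a real data type, kept as a string for the sketch
    impl: Callable[[list], list]  # column-at-a-time implementation


class FunctionRegistry:
    """Hypothetical global registry mapping function names to entries."""

    def __init__(self) -> None:
        self._functions: dict[str, ScalarFunction] = {}

    def register(self, fn: ScalarFunction) -> None:
        if fn.name in self._functions:
            raise ValueError(f"function {fn.name!r} is already registered")
        self._functions[fn.name] = fn

    def resolve(self, name: str) -> ScalarFunction:
        if name not in self._functions:
            raise ValueError(f"unknown function {name!r}")
        return self._functions[name]


@dataclass(frozen=True)
class Call:
    """Unresolved expression: apply function `name` to column `column`."""

    name: str
    column: str


@dataclass(frozen=True)
class ResolvedCall:
    """Expression after planning: the output type is now known."""

    fn: ScalarFunction
    column: str


def plan(expr: Call, registry: FunctionRegistry) -> ResolvedCall:
    # Name resolution happens before execution, so unknown functions fail early
    # and the output type is available for schema checks and optimizer rules.
    return ResolvedCall(registry.resolve(expr.name), expr.column)


def execute(node: ResolvedCall, table: dict[str, list]) -> list:
    return node.fn.impl(table[node.column])


GLOBAL_REGISTRY = FunctionRegistry()
GLOBAL_REGISTRY.register(ScalarFunction("double", "int64", lambda xs: [x * 2 for x in xs]))

node = plan(Call("double", "a"), GLOBAL_REGISTRY)
print(node.fn.return_type)              # int64
print(execute(node, {"a": [1, 2, 3]}))  # [2, 4, 6]
```

Spark's `spark.udf.register` and DataFusion's `SessionContext::register_udf` follow this general shape, with much richer signatures and type information.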
Key Extension Points
Potential registration targets:
- … (merge_columns, compaction utilities, etc.)
- URL schemes (hdfs://, gvfs://, etc.)
- read_*() implementations
- write_*() implementations
Questions
The draft PR implements the simpler decorator approach. Looking for feedback on whether we need the heavier registry system, and which extension points matter most.
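For the URL-scheme and `read_*()` extension points, one possible shape is a table of readers keyed by scheme. This is a hypothetical sketch: `register_scheme`, `read` and the toy `mem://` scheme are illustrative names, not part of Daft's I/O layer, and a real reader would produce Daft partitions rather than Python dicts.

```python
from typing import Callable, Iterator
from urllib.parse import urlparse

# Hypothetical reader signature: takes a URL, yields rows as dicts.
Reader = Callable[[str], Iterator[dict]]

_SCHEME_READERS: dict[str, Reader] = {}


def register_scheme(scheme: str) -> Callable[[Reader], Reader]:
    """Hypothetical decorator: route URLs with this scheme to the decorated reader."""

    def decorator(fn: Reader) -> Reader:
        _SCHEME_READERS[scheme] = fn
        return fn

    return decorator


def read(url: str) -> list[dict]:
    # Dispatch on the URL scheme, e.g. "hdfs" for hdfs://namenode/path.
    scheme = urlparse(url).scheme
    if scheme not in _SCHEME_READERS:
        raise ValueError(f"no reader registered for scheme {scheme!r}")
    return list(_SCHEME_READERS[scheme](url))


@register_scheme("mem")
def read_mem(url: str) -> Iterator[dict]:
    # Toy backend: mem://N yields N rows.
    n = int(urlparse(url).netloc)
    for i in range(n):
        yield {"i": i}


print(read("mem://3"))  # [{'i': 0}, {'i': 1}, {'i': 2}]
```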