File DataType Roadmap #5046
universalmind303 started this conversation in Roadmaps
The new daft.File datatype will provide first-class support for handling file data across local and remote storage, enabling seamless file operations in distributed environments.
Milestones
Architecture
daft.File serves as the abstract interface users interact with, implemented by concrete classes:
PathFile: handles files stored anywhere with a path (local or remote)
MemoryFile: for in-memory byte arrays
This separation exists because Python file APIs behave differently for filesystem-backed files versus in-memory data (FileIO vs BytesIO). Each Python class implements specific dunder methods to maintain compatibility with standard Python file interfaces.
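To make the FileIO vs BytesIO distinction concrete, here is a minimal sketch using only the standard library. The class names `FileSketch`, `PathFileSketch`, and `MemoryFileSketch` and the helper `has_file_descriptor` are hypothetical stand-ins invented for illustration; they are not the actual daft.File, PathFile, or MemoryFile implementations, and the sketch only covers local paths.

```python
import io
import os
import tempfile
from abc import ABC, abstractmethod


class FileSketch(ABC):
    """Hypothetical stand-in for an abstract file interface (not daft.File)."""

    @abstractmethod
    def open(self) -> io.IOBase:
        """Return a readable binary stream for this file."""

    def read(self) -> bytes:
        # Shared convenience built on top of the backend-specific open().
        with self.open() as f:
            return f.read()


class PathFileSketch(FileSketch):
    """Path-backed variant: hands out an io.FileIO (local paths only here)."""

    def __init__(self, path: str):
        self.path = path

    def open(self) -> io.IOBase:
        return io.FileIO(self.path, "r")


class MemoryFileSketch(FileSketch):
    """In-memory variant: hands out an io.BytesIO over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data

    def open(self) -> io.IOBase:
        return io.BytesIO(self.data)


def has_file_descriptor(stream: io.IOBase) -> bool:
    """FileIO exposes an OS-level descriptor; BytesIO raises instead."""
    try:
        stream.fileno()
        return True
    except io.UnsupportedOperation:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.bin")
        with open(path, "wb") as fh:
            fh.write(b"hello")
        for file in (PathFileSketch(path), MemoryFileSketch(b"hello")):
            with file.open() as stream:
                print(type(stream).__name__, has_file_descriptor(stream))
```

Both variants return the same bytes from `read()`, but only the path-backed stream supports `fileno()`. Differences of this kind are one plausible reason to keep two concrete classes behind a single interface.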
While the Python classes provide the interface, the actual implementation lives in Rust-based PyDaftFile, which maintains optimized backends for different storage types:
This architecture allows us to implement storage-specific optimizations (like network buffering for S3 or HTTP) while presenting a consistent interface.
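As a rough illustration of why buffering matters for remote backends, the sketch below simulates a remote object in memory and counts how many backend reads ("requests") are needed with and without a buffer. Everything here is an assumption made for illustration: `CountingRemoteRaw` and `read_in_small_pieces` are hypothetical names, no real S3 or HTTP calls are made, and this says nothing about how the Rust-based PyDaftFile actually buffers.

```python
import io


class CountingRemoteRaw(io.RawIOBase):
    """Hypothetical raw stream that simulates a remote object in memory.

    Each readinto() call stands in for one network round trip.
    """

    def __init__(self, payload: bytes):
        super().__init__()
        self._payload = payload
        self._pos = 0
        self.requests = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        self.requests += 1
        chunk = self._payload[self._pos:self._pos + len(b)]
        n = len(chunk)
        b[:n] = chunk
        self._pos += n
        return n


def read_in_small_pieces(stream, piece_size: int, total: int) -> bytes:
    """Read `total` bytes from `stream`, asking for `piece_size` at a time."""
    out = bytearray()
    while len(out) < total:
        piece = stream.read(piece_size)
        if not piece:
            break
        out += piece
    return bytes(out)


if __name__ == "__main__":
    payload = bytes(range(256)) * 4  # 1024 bytes of sample data

    # Unbuffered: every small read goes straight to the simulated backend.
    unbuffered = CountingRemoteRaw(payload)
    read_in_small_pieces(unbuffered, 16, len(payload))

    # Buffered: io.BufferedReader fetches larger blocks and serves small reads
    # from its internal buffer.
    raw = CountingRemoteRaw(payload)
    buffered = io.BufferedReader(raw, buffer_size=256)
    read_in_small_pieces(buffered, 16, len(payload))

    print("unbuffered requests:", unbuffered.requests)
    print("buffered requests:", raw.requests)
```

The caller-facing read loop is identical in both cases; only the wrapping differs. That is the general shape of a storage-specific optimization hidden behind a consistent interface.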
Key Design Principles
Example Usage