Currently, DuckDB can read directly from file paths and URLs, but it does not accept in-memory buffers or async readers as input. This makes it hard to build efficient streaming pipelines or to integrate with custom or remote object stores without writing temporary files or loading the full data into memory.
Feature Request
Enable DuckDB to accept BytesIO, Vec<u8>, or AsyncRead-like streams as inputs to read_parquet, read_csv, etc., and to consume them incrementally instead of requiring the entire input to be loaded into memory first.
This is especially useful in contexts like:
- Building data ingestion pipelines from custom sources (e.g., opendal)
- Processing large remote files via chunked downloading / range reads
- Memory-constrained environments or real-time ETL systems
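To illustrate the range-read case, here is a minimal Python sketch of the kind of input such an API might accept: a seekable, read-only file-like object that fetches bytes only when they are read. The `RangeReader` class and the `fetch_range(start, end)` callback are hypothetical (standing in for an HTTP range request or an object-store client) and are not part of DuckDB. Seekability matters because a Parquet reader typically starts at the end of the file, where the footer length and `PAR1` magic live, before reading any row groups.

```python
import io
from typing import Callable


class RangeReader(io.RawIOBase):
    """Seekable, read-only stream that fetches byte ranges on demand.

    Hypothetical sketch: fetch_range(start, end) must return bytes [start, end)
    from some remote source. Nothing is downloaded until it is actually read.
    """

    def __init__(self, fetch_range: Callable[[int, int], bytes], size: int):
        super().__init__()
        self._fetch = fetch_range
        self._size = size
        self._pos = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        # Only update the cursor; no data is fetched on seek.
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return new_pos

    def readinto(self, buf) -> int:
        # Fetch exactly the requested range (clipped to the object size).
        end = min(self._pos + len(buf), self._size)
        if self._pos >= end:
            return 0  # EOF
        data = self._fetch(self._pos, end)
        n = len(data)
        buf[:n] = data
        self._pos += n
        return n
```

Wrapping it in `io.BufferedReader(RangeReader(...), buffer_size=...)` batches many small reads into fewer, larger range requests. This is a design trade-off between request count and memory use, not something DuckDB is known to do internally.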
Allowing DuckDB to ingest data directly from buffers and streams would remove the need for temporary files and full in-memory copies, improving both integration and performance in cloud-native and high-throughput use cases.
Would love to hear thoughts on the feasibility and roadmap for this!
Thanks for building such a powerful engine!