
Support streaming Bytes / in-memory buffers as input for read* without full materialization #485

@chitralverma

Description


Currently, DuckDB supports reading directly from file paths and URLs; however, passing in-memory buffers or async readers as input is not supported. This makes it hard to build efficient streaming pipelines, or to integrate with custom or remote object stores, without writing temporary files or loading the full data into memory.

Feature Request

Enable DuckDB to accept in-memory buffers (e.g. BytesIO, Vec<u8>) or AsyncRead-like streams as input to read_parquet, read_csv, etc., consuming them incrementally instead of requiring the entire input to be loaded into memory first.
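For illustration only, here is a minimal, stdlib-only Rust sketch of what consuming an input "without full materialization" means; it is not DuckDB's API or a proposed implementation. The hypothetical count_csv_rows helper accepts any std::io::Read (a Cursor over a Vec<u8>, a file, a socket) and reads it in fixed-size chunks, so peak memory is bounded by chunk_size rather than by the input size. An async version would take an AsyncRead from a runtime such as Tokio instead.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper (not part of DuckDB): counts newline-terminated
/// records from any reader while holding at most `chunk_size` bytes at once.
/// Deliberately naive: newlines inside quoted CSV fields are also counted.
fn count_csv_rows<R: Read>(mut reader: R, chunk_size: usize) -> io::Result<u64> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut buf = vec![0u8; chunk_size];
    let mut rows = 0u64;
    let mut last_byte: Option<u8> = None;
    loop {
        // Pull the next chunk; only this buffer is ever resident in memory.
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of stream
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        rows += buf[..n].iter().filter(|&&b| b == b'\n').count() as u64;
        last_byte = Some(buf[n - 1]);
    }
    // Count a final record that has no trailing newline.
    if matches!(last_byte, Some(b) if b != b'\n') {
        rows += 1;
    }
    Ok(rows)
}

fn main() -> io::Result<()> {
    // A Vec<u8> wrapped in a Cursor stands in for any streaming source.
    let data = b"id,name\n1,a\n2,b\n3,c".to_vec();
    let rows = count_csv_rows(Cursor::new(data), 4)?;
    println!("{rows}"); // prints 4 (header + 3 data rows)
    Ok(())
}
```

The row counting itself is not the point; the bounded read loop over a generic reader is the shape an entry point would need in order to accept buffers and streams interchangeably.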

This is especially useful in contexts like:

  • Building data ingestion pipelines from custom sources (e.g., opendal)
  • Processing large remote files via chunked downloading / range reads
  • Memory-constrained environments or real-time ETL systems
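The range-read case is where Parquet stands to gain the most, because the format stores its metadata at the end of the file: the last 8 bytes are a 4-byte little-endian metadata length followed by the magic bytes PAR1. A reader can therefore locate the schema and row groups with two small ranged reads instead of downloading the whole object. The sketch below is a stdlib-only illustration under stated assumptions: RangeSource and MemSource are hypothetical stand-ins for an object-store ranged GET (not DuckDB or opendal APIs), and the returned metadata bytes are not decoded.

```rust
use std::io;

/// Hypothetical abstraction over a source supporting positional (range)
/// reads, e.g. an HTTP range request or an object-store ranged GET.
trait RangeSource {
    fn size(&self) -> u64;
    fn read_range(&self, offset: u64, len: usize) -> io::Result<Vec<u8>>;
}

/// In-memory implementation, used here only to demonstrate the idea.
struct MemSource(Vec<u8>);

impl RangeSource for MemSource {
    fn size(&self) -> u64 {
        self.0.len() as u64
    }
    fn read_range(&self, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        let start = usize::try_from(offset)
            .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "offset too large"))?;
        let end = start
            .checked_add(len)
            .filter(|&e| e <= self.0.len())
            .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "range out of bounds"))?;
        Ok(self.0[start..end].to_vec())
    }
}

/// Fetches only the raw footer metadata bytes of a Parquet file.
fn read_parquet_footer<S: RangeSource>(src: &S) -> io::Result<Vec<u8>> {
    let size = src.size();
    // Minimum layout: "PAR1" header + 4-byte length + "PAR1" footer.
    if size < 12 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "too small for Parquet"));
    }
    // First ranged read: the trailing 8 bytes (length + magic).
    let tail = src.read_range(size - 8, 8)?;
    if tail[4..8] != b"PAR1"[..] {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "missing PAR1 magic"));
    }
    let meta_len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]) as u64;
    if meta_len + 12 > size {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "metadata length exceeds file size"));
    }
    // Second ranged read: just the metadata, no column data is fetched.
    src.read_range(size - 8 - meta_len, meta_len as usize)
}

fn main() -> io::Result<()> {
    // Synthetic bytes shaped like a Parquet file: header magic, fake column
    // data, fake metadata, metadata length, footer magic.
    let mut bytes = b"PAR1".to_vec();
    bytes.extend_from_slice(b"xyz");
    bytes.extend_from_slice(b"META!");
    bytes.extend_from_slice(&5u32.to_le_bytes());
    bytes.extend_from_slice(b"PAR1");
    let footer = read_parquet_footer(&MemSource(bytes))?;
    println!("{}", String::from_utf8_lossy(&footer)); // prints META!
    Ok(())
}
```

Keeping the source behind a small positional-read trait means the same footer logic would work whether the bytes live in memory, on disk, or behind a remote store.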

Allowing streaming ingestion directly from buffers and readers would remove the need for temporary files or full in-memory copies, which would benefit DuckDB integrations in both cloud-native and high-throughput use cases.

Would love to hear thoughts on the feasibility and roadmap for this!
Thanks for building such a powerful engine!

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
