Add support for registering files in the Arrow IPC stream format as tables using register_arrow or similar #16688

@corasaurus-hex

Is your feature request related to a problem or challenge?

DataFusion currently supports registering files in the Arrow IPC file format as tables:

    ctx.register_arrow("my_table", "file.arrow", ArrowReadOptions::default())
        .await
        .unwrap();

    ctx.sql("SELECT * FROM my_table LIMIT 10")
        .await
        .unwrap()
        .show()
        .await
        .unwrap();

You can also just reference the file path from SQL in datafusion-cli:

> SELECT * FROM 'file.arrow' LIMIT 10;

You cannot, however, do the same with files in the Arrow IPC stream format. You get the error:

called `Result::unwrap()` on an `Err` value: ArrowError(ParseError("Arrow file does not contain correct footer"), None)
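For context on that error: the Arrow IPC file format begins and ends with the magic bytes `ARROW1` and carries a footer just before the trailing magic, while the stream format has neither. The sketch below is only an illustration of how a caller might tell the two apart before choosing a reader. It uses the standard library alone, and the names `IpcKind`, `classify_ipc_bytes`, and `classify_ipc_path` are hypothetical; they are not part of DataFusion or arrow-rs, and this is not a claim about how `register_arrow` works internally.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Magic bytes that open and close a file in the Arrow IPC *file* format.
const ARROW_MAGIC: &[u8] = b"ARROW1";

/// Hypothetical classification used only for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum IpcKind {
    /// Leading and trailing magic are both present.
    File,
    /// No magic found: possibly the stream format, possibly not Arrow at all.
    StreamOrOther,
}

/// Classify an in-memory buffer by checking for the leading and trailing magic.
pub fn classify_ipc_bytes(bytes: &[u8]) -> IpcKind {
    let n = ARROW_MAGIC.len();
    if bytes.len() >= 2 * n && bytes.starts_with(ARROW_MAGIC) && bytes.ends_with(ARROW_MAGIC) {
        IpcKind::File
    } else {
        IpcKind::StreamOrOther
    }
}

/// Classify a file on disk by reading only its first and last six bytes.
pub fn classify_ipc_path(path: &Path) -> io::Result<IpcKind> {
    let mut file = File::open(path)?;
    let n = ARROW_MAGIC.len();
    if file.metadata()?.len() < (2 * n) as u64 {
        return Ok(IpcKind::StreamOrOther);
    }
    let mut head = [0u8; 6];
    let mut tail = [0u8; 6];
    file.read_exact(&mut head)?;
    file.seek(SeekFrom::End(-(n as i64)))?;
    file.read_exact(&mut tail)?;
    if &head[..] == ARROW_MAGIC && &tail[..] == ARROW_MAGIC {
        Ok(IpcKind::File)
    } else {
        Ok(IpcKind::StreamOrOther)
    }
}
```

A missing magic does not prove that the input is a valid Arrow stream, only that it is not in the file format, so a real implementation would still need to attempt a stream read and surface any error.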

Describe the solution you'd like

I would love it if register_arrow supported files in the Arrow IPC stream format, or if an equivalent function were added for that purpose. It would also be great if datafusion-cli could reference such files by path, the same way it can for files in the Arrow IPC file format.

Describe alternatives you've considered

  1. Convert from the stream format to the file format and then query as shown above.
  2. Read all the record batches into memory and register them as a MemTable.
  3. Add a new StreamProvider impl and use a StreamTable.

There are probably others, too, but none as simple as registering the Arrow file with register_arrow or referencing it directly in datafusion-cli.

Additional context

I'm interested in taking a crack at this feature but, assuming y'all are interested in it, I would love some implementation guidance.

Thanks for your time!


Labels: enhancement
