What are the ultimate aims of arrow-zarr? #36

@JackKelly

Hi! Please forgive my very naive question! But please may I ask: What are the ultimate aims of arrow-zarr?

I ask because I'm toying with the idea of helping to speed up Python's xarray package. Perhaps by writing some of xarray in Rust. The end-goal would be much faster execution on a single machine by optimising queries, using all CPU cores for processing, and overlapping IO with compute. (None of these ideas are original to me, of course! And some of these ideas are already implemented in packages like dask, cubed, zarrs, and IceChunk.)

It occurred to me that, perhaps, it'd be possible to use datafusion to do this! Specifically, maybe datafusion could be taught how to process labelled multi-dimensional arrays (with the appropriate custom code, of course). Within datafusion, we'd pass around arrow::Tensor structs. The ultimate ambition might be to re-implement a sizable portion of xarray in datafusion, and then expose an xarray-like API to Python.

And then I remembered that @maximedion2 had mentioned arrow-zarr to me a year ago, back when I was playing with io_uring! (And, in all likelihood, it was @maximedion2's comment that had primed me to think about datafusion for multi-dimensional arrays!)

So, I was wondering: Have you already built what I've been dreaming about?! Is arrow-zarr already an attempt to build something like xarray in datafusion?! (Absolutely no worries either way! I don't want to add pressure!)
