Implementing TPC-H #271

Open
@amueller

Has anyone thought about implementing TPC-H using the dataframe API?
I think this would be a very useful way to test the scope of the API, and also to draw attention to it.
It would mean that anyone implementing the dataframe API could immediately get an apples-to-apples performance benchmark.

Whether TPC-H is a good benchmark for dataframes is maybe not entirely clear, but it's the best there is right now AFAIK.

If we can make it so that Polars, Modin, and DuckDB run their comparisons via the dataframe API, I think that would be pretty sweet.

You can see the Polars implementation of TPC-H here:
https://github.com/pola-rs/tpch
and the results here:
https://www.pola.rs/benchmarks.html
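
To make the idea concrete, here is a rough sketch of what a shared harness might look like. It is not written against the dataframe API, and it is not taken from the Polars repo. It uses only the Python standard library, a tiny hand-made stand-in for the `lineitem` table, and TPC-H Q6 with its default parameters. The names `q6_loop`, `q6_comprehension`, `BACKENDS`, and `run_benchmark` are hypothetical; the two functions stand in for per-library implementations that would, ideally, become a single implementation written once against the dataframe API.

```python
import time
from datetime import date

# Tiny hand-made stand-in for the TPC-H `lineitem` table (NOT generated benchmark data).
LINEITEM = [
    {"l_shipdate": date(1994, 3, 1), "l_discount": 0.06, "l_quantity": 10, "l_extendedprice": 1000.0},
    {"l_shipdate": date(1994, 7, 15), "l_discount": 0.06, "l_quantity": 30, "l_extendedprice": 2000.0},  # quantity too high
    {"l_shipdate": date(1995, 2, 1), "l_discount": 0.06, "l_quantity": 5, "l_extendedprice": 3000.0},  # outside date range
    {"l_shipdate": date(1994, 11, 30), "l_discount": 0.02, "l_quantity": 5, "l_extendedprice": 4000.0},  # discount too low
    {"l_shipdate": date(1994, 12, 31), "l_discount": 0.06, "l_quantity": 23, "l_extendedprice": 500.0},
]


def _q6_filter(row):
    """Q6 predicate with the default substitution parameters."""
    return (
        date(1994, 1, 1) <= row["l_shipdate"] < date(1995, 1, 1)
        and 0.05 <= row["l_discount"] <= 0.07
        and row["l_quantity"] < 24
    )


def q6_loop(rows):
    """Hypothetical backend 1: Q6 as an explicit loop."""
    revenue = 0.0
    for row in rows:
        if _q6_filter(row):
            revenue += row["l_extendedprice"] * row["l_discount"]
    return revenue


def q6_comprehension(rows):
    """Hypothetical backend 2: Q6 as a generator expression."""
    return sum(r["l_extendedprice"] * r["l_discount"] for r in rows if _q6_filter(r))


# Stand-ins for real libraries; each maps a backend name to its Q6 implementation.
BACKENDS = {"loop": q6_loop, "comprehension": q6_comprehension}


def run_benchmark(backends, rows, repeats=3):
    """Run every backend on the same rows; return {name: (result, best_seconds)}."""
    results = {}
    for name, query in backends.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            value = query(rows)
            best = min(best, time.perf_counter() - start)
        results[name] = (value, best)
    return results


if __name__ == "__main__":
    for name, (value, seconds) in run_benchmark(BACKENDS, LINEITEM).items():
        print(f"{name}: revenue={value:.2f} best={seconds:.6f}s")
```

The point of the sketch is the shape: every backend gets the same input and the same timing loop, and the results can be checked against each other before any timing is compared.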
