The Data Lakehouse Readiness Score is a quantitative measure of how well a database, query engine, or analytics engine supports Apache Iceberg, Apache Hudi, and Delta Lake through a data catalog such as Hive Metastore (HMS), Glue, Snowflake Catalog, or Unity Catalog.
Engine | Table Format | Catalog | Score |
---|---|---|---|
Clickhouse | R* | - | 0 |
StarRocks / CelerData | R*W* | HMS, Glue | 3 |
Apache Druid / Imply | R* | - | 0 |
PrestoDB | R*W* | HMS, Glue | 3 |
TrinoDB / StarBurst | R*W* | HMS, Glue | 3 |
DuckDB / Motherduck | R* | - | 0 |
AWS Redshift | R | Glue | 1 |
GCP BigQuery | R* | - | 0 |
Snowflake | R*W* | Snowflake | 3 |
Polars | R*W* | HMS, Glue | 3 |
Daft | R*W* | HMS, Glue | 3 |
Apache Spark | RW | HMS, Glue | 5 |
AWS Athena | R*W* | Glue | 3 |
AWS Redshift Spectrum | R*W* | Glue | 3 |
Dremio | R*W* | HMS, Glue | 3 |
Databend | R | HMS | 1 |
SingleStore | R* | Glue, Snowflake | 0 |
Umbra DB / CedarDB | No Data | - | 0 |
Scoring:
- RW = 5 points, R*W* = 3 points, R = 1 point, R* or No Data = 0 points
Key:
- R* = Can read at least one of the open table formats
- R = Can read at least one of the open table formats using a data catalog
- R*W* = Can read and write at least one of the open table formats using a data catalog
- RW = Can read and write all three major open table formats using a data catalog
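
The scoring rule above is a lookup from capability code to points. Here is a minimal sketch of that lookup in Python. The names `POINTS` and `readiness_score` are hypothetical and are not part of any published tool. The zero values for `R*` and `No Data` are inferred from the scores shown in the table, not from the Scoring list itself.

```python
# Points per capability code. RW, R*W* and R come from the Scoring list;
# R* and "No Data" are assumed to be 0 because every such row in the
# table has a score of 0.
POINTS = {"RW": 5, "R*W*": 3, "R": 1, "R*": 0, "No Data": 0}


def readiness_score(capability: str) -> int:
    """Return the readiness score for a capability code from the Key."""
    code = capability.strip()
    if code not in POINTS:
        raise ValueError(f"unknown capability code: {capability!r}")
    return POINTS[code]


# A few rows copied from the table: (engine, capability code, catalogs).
rows = [
    ("Apache Spark", "RW", "HMS, Glue"),
    ("Snowflake", "R*W*", "Snowflake"),
    ("Databend", "R", "HMS"),
    ("Clickhouse", "R*", "-"),
]

# Map each engine to its score using only the capability code.
scores = {engine: readiness_score(code) for engine, code, _catalogs in rows}
print(scores)
```

Note that the score depends only on the capability code. The Catalog column is informational and does not change the points.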
- Apache Hudi vs Delta Lake vs Apache Iceberg - Data Lakehouse Feature Comparison
- 2024 Lakehouse Format Rundown: Engines & Gorillas
- Some good stuff in there, but is it biased? Read the comments on the article. Also, he doesn't accept PRs (e.g. see my PR, which has been waiting for months), and some of the data is out of date.
- Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
- Why some OLAP databases are faster than others
- Why are there benchmarks for some databases but not for others? You can thank Oracle and where are we now?
- Open Source and Closed Source OLAP databases that can run TPC-H and TPC-DS benchmarks