🆕 [FEAT] Backend and processed data in duckdb, parquet formats #133

@jananiravi

Description

Load the BLAST/IPR backend data dependencies (or our generated data) into Parquet/DuckDB formats for more efficient storage and faster read/write times.

Discussion w/ @d33bs in molevol_scripts (from Spring 2023):

In addition to Arrow, DuckDB may be worth considering for the lab's work (perhaps alongside Arrow, though it is also capable on its own). They published some recent benchmarking which may demonstrate why: https://duckdblabs.github.io/db-benchmark. In my experience, Arrow and DuckDB work nicely with one another, and both are R-compatible. DuckDB allows for direct interchange over Arrow, so one may gain advantages by leveraging both together (using zero-copy techniques). data.table also performs well alongside the others in the new benchmark; I don't know as much about it, but wonder if it too could play into things.

Interested parties: @falquaddoomi @the-mayer @epbrenner?

Metadata

Labels

enhancement: New feature or request
