🆕 [FEAT] Backend and processed data in duckdb, parquet formats #133

Load the BLAST/IPR backend data dependencies (and the data we generate ourselves) into Parquet/DuckDB formats for more efficient storage and faster reads and writes.

Discussion w/ @d33bs [here](https://github.com/JRaviLab/molevol_scripts/issues/33#issuecomment-2784479960) in `molevol_scripts` (from Spring 2023):

>In addition to Arrow, DuckDB may be worth considering for the lab's work (perhaps in addition to, but also potent on its own). They published some recent benchmarking which may demonstrate why: https://duckdblabs.github.io/db-benchmark. In my experience, Arrow and DuckDB work nicely with one another, and both are R-compatible. DuckDB allows for direct interchange over Arrow, so it means one may possibly gain advantages by leveraging both together (using zero-copy techniques). data.table performs well alongside the others in the new benchmark, and I don't know as much about it, but wonder if it too could play into things.

Interested parties: @falquaddoomi @the-mayer @epbrenner?
