Description
Load the BLAST/IPR backend data dependencies (or our generated data) into Parquet/DuckDB formats for more efficient storage and faster read/write times.
Discussion with @d33bs in molevol_scripts (Spring 2023):
In addition to Arrow, DuckDB may be worth considering for the lab's work, either alongside Arrow or on its own. The DuckDB developers recently published benchmarks that may show why: https://duckdblabs.github.io/db-benchmark. In my experience, Arrow and DuckDB work nicely with one another, and both are R-compatible. DuckDB can exchange data directly over Arrow, so using the two together may bring further gains (via zero-copy techniques). data.table also performs well alongside the others in the new benchmark; I don't know as much about it, but wonder if it too could play into things.
Interested parties: @falquaddoomi @the-mayer @epbrenner?