Large datasets, mostly CSV files, are currently fetched directly from Git LFS, which incurs significant Git LFS bandwidth costs.
Fetching these datasets as pre-compressed release assets will reduce download time and eliminate most GitHub Git LFS bandwidth costs. Thanks to @jvanulde for the idea and @DamonU2 for the pioneering work.
I think this is easier to implement and maintain, and thus more robust and less error-prone, than my previous unimplemented "XZ-compressed copies of repos" idea.
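As a rough illustration of the proposed approach, a fetch could look something like the sketch below. The release tag and asset name are hypothetical placeholders for illustration only, not actual assets published on the OpenDRR repos:

```bash
# Hypothetical sketch: download a pre-compressed release asset and decompress it.
# "v1.0.0-data" and "economic-loss.csv.xz" are placeholder names, not real assets.
curl -L -o economic-loss.csv.xz \
  "https://github.com/OpenDRR/openquake-inputs/releases/download/v1.0.0-data/economic-loss.csv.xz"
xz -d economic-loss.csv.xz   # produces economic-loss.csv
```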
Data source repos:
- OpenDRR/openquake-inputs
- OpenDRR/model-inputs
- OpenDRR/canada-srm2
- OpenDRR/earthquake-scenarios
Scripts that fetch from these repos include (but may not be limited to):
- python/add_data.sh (OpenDRR/opendrr-api)
- scripts/DSRA_outputs2postgres_lfs.py (OpenDRR/model-factory)
See, for example, these commands from add_data.sh:
```bash
fetch_csv openquake-inputs ...
fetch_csv model-inputs ...
curl -L https://api.github.com/repos/OpenDRR/canada-srm2/contents/cDamage/output?ref=tieg_natmodel2021
curl -L https://api.github.com/repos/OpenDRR/earthquake-scenarios/contents/FINISHED
python3 DSRA_outputs2postgres_lfs.py --dsraModelDir=$DSRA_REPOSITORY --columnsINI=DSRA_outputs2postgres.ini --eqScenario="$eqscenario"
```
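One possible migration path for these fetches, shown as a minimal sketch only: try a pre-compressed release asset first and fall back to the current LFS-backed download. The function name, release tag, default branch ("master"), and file paths below are assumptions, and the real fetch_csv interface in add_data.sh is not reproduced here:

```bash
# Hypothetical sketch of a release-asset-first fetch with Git LFS fallback.
# Repo layout, release tag, and default branch ("master") are assumptions.
fetch_csv_from_release() {
  local repo="$1" path="$2" tag="$3"
  local asset="$(basename "$path").xz"
  if curl -sfL -o "$asset" \
      "https://github.com/OpenDRR/${repo}/releases/download/${tag}/${asset}"; then
    xz -d "$asset"                      # leaves the decompressed CSV in place
  else
    # Fall back to the existing LFS-backed raw download
    curl -sfL -o "$(basename "$path")" \
      "https://github.com/OpenDRR/${repo}/raw/master/${path}"
  fi
}

# Example usage (hypothetical path and tag):
# fetch_csv_from_release openquake-inputs exposure/general-building-stock.csv v1.0.0-data
```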
Open question: XZ or Zstd compression? (trade-off between compressed file size and decompression speed)
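A quick way to compare the two on a representative dataset, assuming xz and zstd are installed (the file name below is a placeholder):

```bash
# Illustrative comparison on one large CSV (placeholder file name).
xz   -k -9  dsra_output.csv     # typically smallest output, slower compression
zstd -k -19 dsra_output.csv     # usually slightly larger, much faster decompression

ls -l dsra_output.csv*          # compare resulting file sizes

# Decompression speed check (write to /dev/null to avoid clobbering the original):
time xz   -dc dsra_output.csv.xz  > /dev/null
time zstd -dc dsra_output.csv.zst > /dev/null
```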