[Meta-issue] Optimize pipeline (python/add_data.sh etc.) #76

@anthonyfok

Goals include:

  • Reduce download time, build time, disk usage...
  • Increase robustness / resilience (e.g. recovering from interrupted download)
  • ... (to be continued)
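
For the interrupted-download case, a rough sketch of what a resumable fetch could look like. This assumes the downloads go through curl and that the server supports range requests; `fetch_resumable` is a made-up helper name, not something from add_data.sh, and the retry counts are arbitrary.

```shell
# Hypothetical helper (not from add_data.sh): fetch_resumable URL DEST
# --continue-at -  lets curl work out the resume offset from the partial DEST file
# --retry          retries transient failures a few times before giving up
# --fail           makes HTTP errors produce a non-zero exit status
fetch_resumable() {
  curl --fail --location \
       --retry 5 --retry-delay 10 \
       --continue-at - \
       --output "$2" "$1"
}

# Usage (placeholder URL):
#   fetch_resumable "https://example.com/some_file.gpkg" some_file.gpkg
```

Re-running the same command after an interruption should pick up from the partial file rather than start over, provided the partial file is left in place.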

Future tasks (that have yet to be turned into GitHub issues):

  • Use e.g. /usr/bin/time -v for profiling
  • Use docker-compose logs -f -t to get logs with timestamps
  • Some kind of DEBUG variable? e.g. pass the psql flag -a (--echo-all) only in DEBUG mode, for a more concise log otherwise.
  • Add an option to delete downloaded *.gpkg and *.csv files as soon as they have been imported, to save space
  • etc.
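
A minimal sketch of how the DEBUG switch and the delete-after-import option might look in shell. All names here (`psql_flags`, `run_sql`, `import_then_delete`, `DELETE_AFTER_IMPORT`) are made up for illustration and are not taken from add_data.sh; psql connection options are left out.

```shell
# Hypothetical helpers, for discussion only.

# Print --echo-all only when DEBUG is set to a non-empty value.
psql_flags() {
  if [ -n "${DEBUG:-}" ]; then
    printf '%s\n' '--echo-all'
  fi
}

# Run one SQL file. $(psql_flags) is deliberately unquoted so that
# an empty result adds no argument at all.
run_sql() {
  psql $(psql_flags) -f "$1"
}

# import_then_delete FILE COMMAND [ARGS...]
# Runs COMMAND [ARGS...] FILE; if it succeeds and DELETE_AFTER_IMPORT is
# non-empty, removes FILE. A failed import leaves FILE in place and
# returns the command's exit status.
import_then_delete() {
  file=$1
  shift
  "$@" "$file" || return
  if [ -n "${DELETE_AFTER_IMPORT:-}" ]; then
    rm -f -- "$file"
  fi
}

# Usage (my_import_command is a placeholder):
#   DEBUG=1 run_sql some_script.sql
#   DELETE_AFTER_IMPORT=1 import_then_delete some_file.csv my_import_command
```

Deleting only after a successful import keeps the file around for a retry when something goes wrong, at the cost of not freeing space in the failure case.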

Maybe in Round 2 of refactoring? Or this round? Need to discuss with Drew first:

  • Leave the model-factory/scripts/* files where they are instead of copying them?
  • Use e.g. _build and _data directories to separate our code from downloaded data and temporary build files?

Random ideas, questions, etc.
