Account for potential evolving/changing results with reproducible builds #191

Open
@anthonyfok

As discussed during our GSC Cloud Tech meeting on Wed 2022-04-20:

  • As our datasets and models evolve, slightly different charts may result over time. Such apparent inconsistencies may perplex our users.
  • Internally, we recently had a case where two supposedly identical scenario calculation runs, done 11 months apart, gave two different max PGA values, and it wasn't immediately obvious whether the difference was due to different OpenQuake versions (v3.11.0 vs v3.11.5) or to data/model refinement over time.

Suggested remedies include:

  • Maintain multiple major versions of API so that end users can compare the changes over time
  • Record all relevant build information in the database and export it to Elasticsearch, pygeoapi, etc., to allow for reproducible builds
  • Proactively tell the end users about changes in the Release Notes, and in particular explain any discrepancies in the results between versions, so that the end users know what's coming (and are not surprised or confused).
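
To illustrate how recorded build information could help explain a discrepancy between two runs, here is a minimal sketch. It assumes each build's information is stored as a nested dictionary; that layout, the key names, the function name `diff_build_info`, and the commit value `abc1234` are all hypothetical placeholders and not part of any existing OpenDRR code. Only the two OpenQuake versions come from the case described above.

```python
def diff_build_info(old, new, prefix=""):
    """Return {dotted.key: (old_value, new_value)} for every leaf that differs."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}{key}"
        a, b = old.get(key), new.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            # Recurse into nested sections such as per-repository Git info.
            changes.update(diff_build_info(a, b, prefix=f"{path}."))
        elif a != b:
            changes[path] = (a, b)
    return changes


# Hypothetical records for two runs; "abc1234" is a placeholder, not a real commit.
run_a = {"openquake": "v3.11.0", "git": {"opendrr-api": "abc1234"}}
run_b = {"openquake": "v3.11.5", "git": {"opendrr-api": "abc1234"}}

print(diff_build_info(run_a, run_b))
# → {'openquake': ('v3.11.0', 'v3.11.5')}
```

With records like these, a difference in results could be checked against the list of changed components before anyone starts suspecting the data or the model.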

It is probably easiest to do this, at least initially, from add_data.sh in OpenDRR/opendrr-api, because it aggregates all the data sources anyway. Some ideas of what to record (not sure if we can get all of these, haha!):

  • Git commit references (release tag, commit hash, git describe) of all the Git repositories (opendrr-api, model-factory, earthquake-scenarios, canada-srm2, etc.) that are used for a certain stack build.
  • Exact versions of Docker images (pygeoapi, and especially python-env where the underlying Debian OS and Python versions may change)
  • dpkg -l (installed Debian packages)
  • pip3 list
  • Versions of Docker and Docker Compose
  • Host OS and version? CPU (model and no. of cores), RAM, etc.
  • OpenQuake version (already listed in CSV file or in logs?)
  • IP address of build machine (?)
  • Stack build date/time and duration
  • (Optionally): Build user and email
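
As a rough sketch of how some of the items above might be gathered in one place, the following Python snippet shells out to the same tools mentioned in the list. The function names (`run`, `collect_build_info`, `write_build_info`), the output layout, and the repository paths passed in are assumptions for illustration only; `docker compose version` assumes the Compose v2 plugin, and any command that is missing or fails is simply recorded as `None`. OpenQuake version, RAM, IP address, and build duration are left out here.

```python
import json
import os
import platform
import subprocess
from datetime import datetime, timezone


def run(cmd, cwd=None):
    """Return a command's stdout as stripped text, or None if it cannot be run."""
    try:
        result = subprocess.run(
            cmd, cwd=cwd, capture_output=True, text=True, check=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip()


def collect_build_info(repo_dirs):
    """Gather a JSON-serializable snapshot of the build environment.

    repo_dirs maps a repository name to the path of its local checkout.
    """
    return {
        "build_time_utc": datetime.now(timezone.utc).isoformat(),
        "host": {
            "os": platform.platform(),
            "machine": platform.machine(),
            "cpu_count": os.cpu_count(),
            "python": platform.python_version(),
        },
        "git": {
            name: {
                "commit": run(["git", "rev-parse", "HEAD"], cwd=path),
                "describe": run(
                    ["git", "describe", "--tags", "--always", "--dirty"], cwd=path
                ),
            }
            for name, path in repo_dirs.items()
        },
        "docker": run(["docker", "--version"]),
        "docker_compose": run(["docker", "compose", "version"]),
        "pip_list": run(["pip3", "list", "--format=freeze"]),
        "dpkg_list": run(["dpkg", "-l"]),
    }


def write_build_info(path, repo_dirs):
    """Write the snapshot to a JSON file so it can be loaded into a database later."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(collect_build_info(repo_dirs), fh, indent=2)
```

Something like `write_build_info("build_info.json", {"opendrr-api": "."})` could then be called from the build script, with the resulting file loaded alongside the data. The file name and path here are placeholders.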

See also https://reproducible-builds.org/ and related reproducibility projects for ideas and inspiration.
