As discussed during our GSC Cloud Tech meeting on Wed 2022-04-20:
As our datasets and models evolve, slightly different charts may result over time. Such apparent inconsistencies may perplex our users.
Internally, we recently had a case in which two supposedly identical scenario calculation runs, done 11 months apart, gave two different max PGA values, and it wasn't immediately obvious whether the cause was the different OpenQuake versions (v3.11.0 vs v3.11.5) or data/model refinement over time.
Suggested remedies include:
- Maintain multiple major versions of the API so that end users can compare changes over time
- Record all relevant build information in the database, and export it to Elasticsearch, pygeoapi, etc., to allow for reproducible builds
- Proactively tell end users about changes in the Release Notes, and especially explain any discrepancies in results between versions, so that they know what's coming and are not surprised or confused
It is probably easiest to do this, at least initially, from add_data.sh in OpenDRR/opendrr-api, because it is aggregating all the data sources anyway. Some ideas of what to record (not sure if we can get all of these, haha!):
- Git commit references (release tag, commit hash, `git describe`) of all the Git repositories (opendrr-api, model-factory, earthquake-scenarios, canada-srm2, etc.) that are used for a certain stack build.
- Exact versions of Docker images (pygeoapi, and especially python-env where the underlying Debian OS and Python versions may change)
- `dpkg -l` (installed Debian packages)
- `pip3 list`
- Versions of Docker and Docker Compose
- Host OS and version? CPU (model and no. of cores), RAM, etc.
- OpenQuake version (already listed in CSV file or in logs?)
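To make the list above more concrete, here is a rough sketch of how some of these items could be gathered from a shell script such as add_data.sh. It is only an illustration under assumptions: the function names (`try_cmd`, `record_git_ref`, `record_image`, `record_build_info`), the plain-text output layout, and the idea of writing to a single file are all hypothetical and not part of the existing script. It also assumes the Compose v2 plugin (`docker compose`); a setup that still uses the standalone `docker-compose` binary would need that line adjusted.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for recording build information.
# None of these function names exist in add_data.sh; they are illustrative only.

# Run a command and print its output, or "unavailable" if the tool
# is not installed or the command fails.
try_cmd() {
  local out
  if command -v "$1" >/dev/null 2>&1 && out=$("$@" 2>/dev/null); then
    printf '%s\n' "$out"
  else
    echo "unavailable"
  fi
}

# Print the commit hash and `git describe` output for one checked-out repository.
record_git_ref() {
  local repo_dir=$1
  echo "repo: $(basename "$repo_dir")"
  echo "  commit: $(try_cmd git -C "$repo_dir" rev-parse HEAD)"
  echo "  describe: $(try_cmd git -C "$repo_dir" describe --tags --always --dirty)"
}

# Print the content-addressed ID of a local Docker image.
record_image() {
  echo "image: $1"
  echo "  id: $(try_cmd docker image inspect --format '{{.Id}}' "$1")"
}

# Write a plain-text build record to a file.
# Usage: record_build_info OUT_FILE [REPO_DIR...]
record_build_info() {
  local out_file=$1
  shift
  {
    echo "recorded_at: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    for repo in "$@"; do
      record_git_ref "$repo"
    done
    echo "docker: $(try_cmd docker --version)"
    echo "docker_compose: $(try_cmd docker compose version)"
    echo "kernel: $(uname -sr)"
    echo "cpu_cores: $(try_cmd nproc)"
    echo "--- dpkg -l ---"
    try_cmd dpkg -l
    echo "--- pip3 list ---"
    try_cmd pip3 list --format=freeze
  } > "$out_file"
}
```

A call might look like `record_build_info build-info.txt ./opendrr-api ./model-factory`, with `record_image` invoked separately for each image of interest. Keeping one such file per stack build would let two builds be compared with a plain `diff`, which may help separate software version changes from data/model changes.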
See also https://reproducible-builds.org/ and related reproducibility projects for ideas and inspiration.