Description
To make further progress on automated testing, we need to tackle the problem of input and output data. The proposed solution is to replace data with code for maximum flexibility:
- Add an `improver gen-data template-cube [options] --output FILE` command. Options should allow us to generate various synthetic fields and to choose the type of data (e.g. land-sea mask, orography, generic), coordinate system, size/resolution, probability representation (taken from the `template-cube`, see below) and any other features that we may need to test our code. We don’t need to start fully featured; we can keep adding and changing things there as needed. Generation of metadata (attributes and coords) could be split off into a separate `gen-meta-cube` command, which would create a `template-cube` for `gen-data` to fill in.
- Use a metadata dump plus some sort of data fingerprint/hash or stats as KGOs in a diff-friendly text format, and keep them in a public repo. One option is to use cube.xml(checksum=True) the way Iris does, but the downside of a checksum is that it’s sensitive to small numerical differences and environment setup, so running locally would require locally generated KGOs. Another option could be to convert data to images and use perceptual hashing (having an image to eyeball could be useful in its own right). Whatever we choose, we can easily change it down the road as long as it stays small, readable text.
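To make the two ideas above concrete, here is a minimal sketch using only the Python standard library. It is not an existing IMPROVER or Iris API: `generate_field` and `summarise`, their parameters, the JSON layout and the choice of rounding before hashing are all hypothetical, chosen just for illustration. A real `gen-data` would presumably fill in an Iris cube rather than nested lists.

```python
import hashlib
import json
import random
import statistics


def generate_field(kind="generic", shape=(4, 5), seed=0):
    """Return a synthetic 2D field as a list of rows (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed: same arguments give the same field
    rows, cols = shape
    if kind == "land-sea-mask":
        # Binary mask: 1 = land, 0 = sea.
        return [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]
    if kind == "orography":
        # Non-negative heights in metres, to one decimal place.
        return [
            [round(rng.uniform(0.0, 1000.0), 1) for _ in range(cols)]
            for _ in range(rows)
        ]
    # Generic field: values in [0, 1).
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


def summarise(field, metadata, decimals=4):
    """Build a small text KGO: metadata, rounded stats and a hash of rounded data.

    Rounding before hashing is an assumption made here to reduce sensitivity
    to tiny floating-point differences; values that sit near a rounding
    boundary can still change the hash.
    """
    flat = [value for row in field for value in row]
    rounded = [round(value, decimals) for value in flat]
    digest = hashlib.sha256(json.dumps(rounded).encode("utf-8")).hexdigest()
    summary = {
        "metadata": metadata,
        "shape": [len(field), len(field[0])],
        "stats": {
            "min": round(min(flat), decimals),
            "max": round(max(flat), decimals),
            "mean": round(statistics.fmean(flat), decimals),
        },
        "sha256_of_rounded": digest,
    }
    # Sorted keys and indentation keep the text stable and diff-friendly.
    return json.dumps(summary, indent=2, sort_keys=True)


field = generate_field("orography", shape=(3, 4), seed=42)
print(summarise(field, {"name": "surface_altitude", "units": "m"}))
```

The stats give a tolerant, human-readable comparison, while the hash gives a strict one. Keeping both in the same text file would let a test choose how strict to be without changing the KGO format.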
Solving the above two problems will open up our testing options significantly. It should then be straightforward to set this up on CI for acceptance tests. We could also use it for performance testing on nightly master with full-size inputs, and eventually to feed the suite reference run, also on CI.