A repository to generate and distribute datasets used in unit tests of SwarmPAL. This repository uses git-lfs to efficiently store NetCDF files generated by SwarmPAL, whilst SwarmPAL uses Pooch to cache files locally for testing.
Each NetCDF file is generated from a corresponding YAML configuration file with the SwarmPAL CLI.
Install the latest version of SwarmPAL in a virtual environment. The most portable way to do this is by running the following commands in a terminal:
python -m venv venv
. venv/bin/activate
pip install swarmpal
The swarmpal CLI should now be available in the terminal.
The download.sh script can be used to generate the NetCDF files and update the registry.txt files with filenames and hashes.
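To illustrate what a registry entry holds, the sketch below hashes a file and formats the result as a "filename hash" line, the layout Pooch registry files use. It is not the logic of download.sh: the SHA-256 algorithm and the helper name registry_line are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def registry_line(path, algorithm="sha256", chunk_size=65536):
    """Return a 'filename hash' line for the file at `path`.

    Hypothetical helper: the algorithm used by download.sh is not
    stated in this README, so SHA-256 is assumed here.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large NetCDF files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return f"{Path(path).name} {digest.hexdigest()}"
```

Calling registry_line on a generated .nc4 file returns one line that can be compared by eye with the matching entry in registry.txt.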
Special care has to be taken when adding new datasets and unit tests in SwarmPAL to ensure that unit tests will use their corresponding datasets.
The following has to be done when a new dataset is needed for unit tests in SwarmPAL:
- Create a new configuration file for the dataset in config/
- Generate the NetCDF file with the download.sh script by running
  ./download.sh config/<dataset>.yaml
  Keep note of the new hash value of registry.txt printed when the script finishes.
- Add the new .yaml, .nc4 and updated registry.txt files to this git repository's main branch and push the changes to GitHub.
- While developing unit tests, it is useful to have Pooch download datasets from the main branch of this repository. To do this, in the SwarmPAL repository, change SWARMPAL_TEST_DATA_VERSION to a PEP 440 local version by adding +dev to the end of the string. For example, if the current value of SWARMPAL_TEST_DATA_VERSION is vA.B.C, change it to vA.B.C+dev.
- Update the hash of registry.txt in the SwarmPAL repository.
- Add unit tests to SwarmPAL using the new dataset.
- If you are happy with the new tests, create a tag in this repository and push it to GitHub:
  git tag <new_tag>
  git push origin <new_tag>
- In the SwarmPAL repository, update SWARMPAL_TEST_DATA_VERSION to the same value as the new tag. Commit the new unit tests and the change to SWARMPAL_TEST_DATA_VERSION together. This ensures that Pooch always fetches the dataset corresponding to the new unit tests, even if tests or datasets are updated in the future.
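The version switch used in the steps above can be pictured with the following sketch. It mimics, in simplified form, how a PEP 440 local version (one containing a +) is mapped to a development branch, while any other value is used as the tag name. The function name resolve_data_ref, the example.org URL template and the version numbers are hypothetical, and the exact rule applied by Pooch and SwarmPAL may differ.

```python
def resolve_data_ref(version, dev_branch="main"):
    """Return the git ref that test data would be fetched from.

    Simplifying assumption: a '+' marks a PEP 440 local version,
    which is treated as a development version.
    """
    if "+" in version:
        return dev_branch
    return version


# Hypothetical URL template; the real repository URL is not given here.
BASE_URL = "https://example.org/test-data/{ref}/"

# Arbitrary example versions: a tagged release and its +dev variant.
for version in ("v1.2.3", "v1.2.3+dev"):
    ref = resolve_data_ref(version)
    print(version, "->", BASE_URL.format(ref=ref))
```

Under this assumption a tagged value pins tests to an immutable snapshot of the datasets, while the +dev suffix follows whatever is currently on the main branch, which is why it suits development only.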