Description
We are having a lot of spurious failures in both test CI jobs and RTD builds, all related to netcdf/hdf I/O or http issues.
Locally, remote datasets are not much of an issue: they get downloaded and stored in the arviz_data folder, so unless that folder is removed they behave like local datasets after the first download. In CI, however, they need to be downloaded every time CI runs, and for every CI job. I suspect that multiple CI jobs trying to download the same dataset at the same time (or very close to each other) is triggering either a genuine failure or a figshare protection against excessive traffic.
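One possible stopgap on the CI side would be to pre-fetch the needed datasets once, serially and with retries, before the test suite starts, so transient HTTP/figshare errors surface as a retried download rather than as random test failures. This is only a sketch: the dataset names and retry settings below are placeholders, and it assumes `arviz.load_arviz_data` keeps caching to the local arviz_data folder on first use, as it does locally.

```python
# conftest.py -- sketch only; dataset names and retry settings are placeholders
import time

import pytest
import arviz as az

# Hypothetical list of remote datasets the test suite actually touches.
REMOTE_DATASETS = ["radon", "rugby"]


@pytest.fixture(scope="session", autouse=True)
def prefetch_remote_datasets():
    """Download each remote dataset once before any test runs.

    After the first successful call, load_arviz_data reads from the local
    arviz_data folder, so the tests themselves never hit figshare.
    """
    for name in REMOTE_DATASETS:
        for attempt in range(3):  # retry transient HTTP failures
            try:
                az.load_arviz_data(name)
                break
            except Exception:
                if attempt == 2:
                    raise
                time.sleep(5 * (attempt + 1))  # simple backoff before retrying
```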
Testing-wise, I think ideally we should use mock data, with minimal data stored locally in the repo for the cases where it is difficult to mock data properly. For documentation, I think we can choose 2-3 remote datasets to use (plus the centered/non-centered ones); if we want to use others, we should do so in notebooks that are executed locally and only rendered when building the docs.
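As a rough illustration of the mock-data idea, something like the following could live in conftest.py: build a tiny InferenceData from random arrays with `az.from_dict` and, where a test currently relies on `load_arviz_data`, monkeypatch it to return the mock so nothing is downloaded. The fixture names, variable names, and shapes here are made up for the example.

```python
# conftest.py -- sketch of the mock-data idea; names and shapes are made up
import numpy as np
import pytest
import arviz as az


@pytest.fixture
def mock_idata():
    """Small random InferenceData standing in for a remote dataset."""
    rng = np.random.default_rng(0)
    return az.from_dict(
        posterior={
            "mu": rng.normal(size=(4, 100)),
            "theta": rng.normal(size=(4, 100, 8)),
        },
        observed_data={"y": rng.normal(size=8)},
    )


@pytest.fixture
def no_remote_data(monkeypatch, mock_idata):
    """Make load_arviz_data return the mock instead of downloading anything."""
    monkeypatch.setattr(az, "load_arviz_data", lambda *args, **kwargs: mock_idata)
```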