This repository demonstrates how to manage climate Atlas NetCDF data using:
- DataLad for version control and lightweight data management
- CF/IOOS compliance checks for metadata validation
- STAC catalogs for structured metadata and discovery
All workflows are reproducible locally and run automatically on GitHub Actions when metadata updates are detected.
The Zenodo badge above is for showcase purposes.
When you create a release and link the GitHub repository to Zenodo, a DOI will be automatically minted for each version.
Before running the workflow, ensure you have DataLad and git-annex installed.
👉 See the DataLad Handbook – Installation Guide for detailed instructions and platform-specific notes.
```bash
# Using Homebrew (recommended)
brew install datalad git-annex

# Verify installation
datalad --version
git annex version
```

Alternatively, use Conda:

```bash
conda install -c conda-forge datalad git-annex
```

On Debian/Ubuntu:

```bash
sudo apt update
sudo apt install datalad git-annex
```

On Fedora:

```bash
sudo dnf install datalad git-annex
```

Once installed, clone the dataset and you're ready to go:

```bash
datalad clone https://github.com/cehbrecht/atlas-demo.git
cd atlas-demo
```

Create and activate the Conda environment:

```bash
conda env create -f environment.yml
conda activate atlas-demo
```

Optional: install DataLad via Conda if it is not installed system-wide:

```bash
conda install -c conda-forge datalad git-annex
```
The file atlas/atlas_urls.csv lists available datasets from an external data source.
Each row defines a remote URL and a local storage path inside atlas/data/.
Example snippet:
```csv
url,path
https://data.example.org/cmip6/cd_CMIP6_ssp126_yr_2015-2100_v02.nc,atlas/data/v02/CMIP6/ssp126/cd_CMIP6_ssp126_yr_2015-2100_v02.nc
https://data.example.org/cerra/cd_CERRA_yr_1985-2021_v02.nc,atlas/data/v02/CERRA/cd_CERRA_yr_1985-2021_v02.nc
```
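For scripting against this file, the standard-library `csv` module is enough; a minimal sketch, assuming the two-column `url,path` layout shown above (the sample string stands in for reading `atlas/atlas_urls.csv` from disk):

```python
import csv
import io

# Sample rows in the same format as atlas/atlas_urls.csv
sample = """url,path
https://data.example.org/cmip6/cd_CMIP6_ssp126_yr_2015-2100_v02.nc,atlas/data/v02/CMIP6/ssp126/cd_CMIP6_ssp126_yr_2015-2100_v02.nc
"""

# Each row maps a remote URL to its local storage path under atlas/data/
rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    print(row["path"], "<-", row["url"])
```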
To register these datasets in your local DataLad dataset (without downloading the actual files):

```bash
make addurls
```

This creates lightweight references in `atlas/data/` that can be retrieved later on demand:

```bash
datalad get atlas/data/<file>.nc
```

To add NetCDF files that are already available locally:

- Copy the files into the appropriate folder under `atlas/data/`.
- Extract metadata and validate the files by running the workflow:

  ```bash
  make update
  ```

  Or step-by-step:

  ```bash
  make metadata   # extract STAC metadata for all available NetCDF files
  make checks     # run CF/IOOS compliance checks
  make catalogs   # generate STAC catalog
  ```

- Save the new files and generated metadata to DataLad:

  ```bash
  datalad save -m "Add new NetCDF data and metadata"
  ```

- Push your changes to GitHub:

  ```bash
  git push
  ```

Note: Only STAC catalogs are rebuilt automatically on GitHub via Actions. Metadata extraction and CF checks must be run locally before committing.
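The actual metadata extraction lives in the repository's scripts, but the filename convention visible in `atlas_urls.csv` already encodes several fields. As an illustrative sketch only (the field names here are guesses, not the repo's real STAC schema), such a name could be parsed like this:

```python
def parse_atlas_filename(name: str) -> dict:
    """Split a name like 'cd_CMIP6_ssp126_yr_2015-2100_v02.nc' into fields.

    Field names are illustrative, not the repository's actual STAC schema.
    """
    parts = name.removesuffix(".nc").split("_")
    start, end = parts[-2].split("-")
    return {
        "variable": parts[0],
        "project": parts[1],
        # CMIP6 names carry an experiment (e.g. ssp126); CERRA names do not
        "experiment": parts[2] if len(parts) == 6 else None,
        "frequency": parts[-3],
        "start_year": int(start),
        "end_year": int(end),
        "version": parts[-1],
    }

meta = parse_atlas_filename("cd_CMIP6_ssp126_yr_2015-2100_v02.nc")
print(meta)
```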
To remove generated metadata and catalogs:

```bash
make clean
```

The GitHub Actions workflow:

- Runs automatically on push or pull request affecting `atlas/metadata/**`
- Builds and commits updated STAC catalogs under `catalogs/stac/`
- Skips execution if no metadata changes are detected
- Official Handbook – complete guide
- Quick Guide – get started quickly
- Cheat Sheet – handy commands reference
- Get file content: `datalad get <file_or_dir>`
- Unlock a file for editing: `datalad unlock <file>`
- Drop local content: `datalad drop <file_or_dir>`
- Add new files: `datalad add <file_or_dir>`
- Add files from URLs: `datalad addurls -d . --fast atlas/atlas_urls.csv '{url}' '{path}'`
- Check dataset status: `datalad status`
- Save changes: `datalad save -m "commit message"`
Useful for working with large datasets without downloading all content.
```
atlas-demo/
├── atlas/               # NetCDF data + metadata
├── catalogs/            # STAC catalogs
├── scripts/             # workflow scripts
├── .github/workflows/   # GitHub Actions definitions
├── environment.yml      # Conda environment
├── Makefile             # local workflow automation
└── README.md
```
```bash
make help      # show help
make update    # run full local workflow
make metadata  # extract STAC metadata
make checks    # run CF compliance checks
make catalogs  # generate STAC catalog
make clean     # remove generated files
make lint      # lint Python scripts with Ruff
```

Each STAC Item in the catalog now includes:

- a `datalad` asset – points to the local DataLad-managed file
- an `http` asset – a direct HTTP download link (for the demo, using a fixed prefix URL)
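As a sketch of what an Item with those two assets looks like (the id, hrefs, and dates below are placeholders, not values from the actual catalog under `catalogs/stac/`):

```python
import json

# Illustrative STAC Item skeleton with the two asset types described above;
# all concrete values are placeholders, not the repository's real metadata.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "cd_CMIP6_ssp126_yr_2015-2100_v02",
    "geometry": None,
    "properties": {
        "datetime": None,
        "start_datetime": "2015-01-01T00:00:00Z",
        "end_datetime": "2100-12-31T23:59:59Z",
    },
    "assets": {
        "datalad": {
            "href": "atlas/data/v02/CMIP6/ssp126/cd_CMIP6_ssp126_yr_2015-2100_v02.nc",
            "type": "application/x-netcdf",
            "title": "Local DataLad-managed file",
        },
        "http": {
            "href": "https://data.example.org/cmip6/cd_CMIP6_ssp126_yr_2015-2100_v02.nc",
            "type": "application/x-netcdf",
            "title": "Direct HTTP download",
        },
    },
    "links": [],
}
print(json.dumps(item, indent=2))
```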
You can browse the catalog directly in a STAC Browser:
To download a file via HTTP:

- Click on an Item in the STAC Browser.
- Select the `http` asset.
- Copy the URL, or download directly in your browser or via `wget`/`curl`.

```bash
# Example using curl
curl -O https://data.mips.climate.copernicus.eu/thredds/fileServer/esg_c3s-cica-atlas/v02/CMIP6/historical/cdbals_CMIP6_historical_yr_1850-2014_v02.nc
```
---
## License
This project is licensed under the terms of the [MIT License](LICENSE).