Reformat weather datasets into zarr.
See the dataset integration guide to add a new dataset to be reformatted.
We use

- `uv` to manage dependencies and python environments
- `ruff` for linting and formatting
- `mypy` for type checking
- `pytest` for testing
- `pre-commit` to automatically lint and format as you git commit
- Install uv
- Run `uv run pre-commit install` to set up the git hooks
- If you use VSCode, you may want to install the extensions (ruff, mypy) it will recommend when you open this folder
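Putting those steps together, a first-time setup might look like the sketch below (the installer one-liner is uv's documented standalone installer; adjust for your platform):

```bash
# Install uv (see https://docs.astral.sh/uv/ for other install methods)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create the project virtual environment and install dependencies
uv sync

# Install the git hooks so linting and formatting run on each commit
uv run pre-commit install
```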
Run the reformatter CLI with `uv run main`:

```bash
uv run main --help
uv run main <DATASET_ID> update-template
uv run main <DATASET_ID> backfill-local <INIT_TIME_END>
```
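For example, to regenerate a dataset's template and backfill a small time range locally (the dataset ID and end time below are hypothetical placeholders; use a dataset ID registered in this repo):

```bash
uv run main noaa-gefs-forecast update-template
uv run main noaa-gefs-forecast backfill-local 2024-01-01T00:00
```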
Additional development commands:

- Add dependency: `uv add <package> [--dev]`. Use `--dev` to add a development only dependency.
- Lint: `uv run ruff check`
- Type check: `uv run mypy`
- Format: `uv run ruff format`
- Test: `uv run pytest`
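Before pushing, it can be convenient to chain all of the checks in one pass (this simply combines the commands above; it is not a project-provided script):

```bash
uv run ruff format && uv run ruff check && uv run mypy && uv run pytest
```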
To reformat a large archive we parallelize work across multiple cloud servers.

We use

- `docker` to package the code and dependencies
- kubernetes indexed jobs to run work in parallel
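Kubernetes indexed jobs set a distinct `JOB_COMPLETION_INDEX` environment variable in each pod, which is how many workers can split a single backfill between them. A minimal illustration of the mechanism (not this project's actual entrypoint):

```bash
# In an indexed job, Kubernetes sets JOB_COMPLETION_INDEX to 0, 1, 2, ...
# so each pod can claim a disjoint slice of the work.
echo "This pod is worker ${JOB_COMPLETION_INDEX}"
```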
- Install `docker` and `kubectl`. Make sure `docker` can be found at /usr/bin/docker and `kubectl` at /usr/bin/kubectl.
- Set up a docker image repository and export the `DOCKER_REPOSITORY` environment variable in your local shell, e.g. `export DOCKER_REPOSITORY=us-central1-docker.pkg.dev/<project-id>/reformatters/main`
- Set up a kubernetes cluster and configure kubectl to point to your cluster, e.g. `gcloud container clusters get-credentials <cluster-name> --region <region> --project <project>`
- Create a kubectl secret containing your Source Coop S3 credentials, `kubectl create secret generic source-coop-key --from-literal='AWS_ACCESS_KEY_ID=xxx' --from-literal='AWS_SECRET_ACCESS_KEY=xxx'`, and set these environment variables in your local shell: `export AWS_ACCESS_KEY_ID=xxx; export AWS_SECRET_ACCESS_KEY=xxx`.
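Before launching a large backfill it can help to sanity-check the setup. These verification commands are a suggestion, not part of the project's tooling:

```bash
docker info                          # the docker daemon is reachable
kubectl get nodes                    # kubectl points at the intended cluster
kubectl get secret source-coop-key   # the Source Coop credentials secret exists
echo "$DOCKER_REPOSITORY"            # the image repository is set in this shell
```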
Launch the backfill across the cluster with:

```bash
DYNAMICAL_ENV=prod uv run main <DATASET_ID> backfill-kubernetes <INIT_TIME_END> [--jobs-per-pod <int>] [--max-parallelism <int>]
```
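For example, with placeholder dataset ID, end time, and parallelism values (tune these to your dataset and cluster size):

```bash
DYNAMICAL_ENV=prod uv run main noaa-gefs-forecast backfill-kubernetes 2024-01-01T00:00 --jobs-per-pod 4 --max-parallelism 32
```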