Airflow pipelines for Open Climate Fix's production systems
Note: This repo is a migration of the `dags` folder from the ocf-infrastructure repository. Commit history and authorship have not been preserved, so for previous iterations of the DAGs, see the original repository's history.
Many of OCF's production services run as batch pipelines managed by an Airflow deployment. This repo defines the Airflow DAGs that configure, version-control, and test these pipelines, and handles the deployment process.
- Add new API check for national/forecast
- Update UK PVnet app from 2.5.22 to 2.6.0 - new model for new GSP areas
- Add new API checks for UK GSP and National
- Update blend service from 1.1.3 to 1.1.4 - improved logging
- UK PVnet app updated from 2.5.18 to 2.5.22 - don't regrid ECMWF for the DA model, and get ready for new GSPs
- New NL forecasts
- Metrics upgrade from 1.2.23 to 1.3.0, major speed upgrade for ME
- Scale UK GSP and National API to 2 EC2 instances
- Add new NL consumer for Ned-NL forecast, using version 1.1.12
- Add new NL NWP consumer for ECMWF
- Pull both old and new GSPs from PVLive
- PVnet app updated from 2.5.16 to 2.5.18 - fixes git version
- Upgrade blend service to 1.1.3 - fixes version issue. Note that a small data migration is needed, where we need to set created_utc times for the ML models. The API should also be upgraded to 1.5.93
- Update Slack warning message for PVnet app
- Upgrade PVsite database clean-up to 1.0.30
- Add a new NL consumer
- Update PVnet Slack error/warning message logic
- Update Slack error messages/links for UK and India satellite consumers
- Cloudcasting inputs on the intraday forecaster in dev
- Update forecast_blend from 1.0.8 to 1.1.1
- Update metrics from 1.2.22 to 1.2.23
- Add DAG to calculate ME
- Update PVLive consumer to use on-prem server - from 1.2.5 to 1.2.6
- Trigger blend service, even if PVnet fails
- Tidy PVnet app docs - 2.5.15 to 2.5.16
- India forecast app to save probabilistic values - 1.1.34 to 1.1.39
- Upgrade Cloudcasting app - 0.0.7 to 0.0.8

Initial release
Releases to development are made automatically when a PR is merged to `main`.
For production releases, we try to bundle a few changes together in a minor version release. Once we are ready to release to production, we follow these steps:
- Create a new branch called `X.Y-release`.
- Update the README with the changes made in this new release. This can be done by comparing tags, for example.
- Create a PR from `X.Y-release` to `main` and get it approved.
- When merging this PR, add `#minor` to the PR's Extended description under Commit message.
- Merge the PR to `main` and delete the branch; this will create the tag `X.Y`.
- Under Actions, go to Deploy DAGs, click on Run workflow, and select the `X.Y` tag. This will then need to be approved.
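As a sketch, the branching steps above look something like the following. The commands run in a throwaway repository so they work anywhere; in the real repo you would start from the existing `main` branch and push to GitHub, and version 1.2 is purely illustrative:

```shell
# Throwaway repo so the sketch is runnable end-to-end
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main .
git config user.email "you@example.com"
git config user.name "Example"
git commit -q --allow-empty -m "initial commit"

# 1. Create the release branch
git checkout -q -b 1.2-release

# 2. Update the README with this release's changes, then commit
echo "- 1.2: example change" >> README.md
git add README.md
git commit -q -m "Docs: changelog for 1.2 release"

# 3. In the real flow: push, open a PR to main, add '#minor' under the
#    extended description, and merge; the X.Y tag is then created.
#    Here we merge and tag locally just to illustrate the end state:
git checkout -q main
git merge -q --no-ff -m "Merge 1.2-release" 1.2-release
git tag 1.2
git tag --list
```

The `--no-ff` merge mirrors how a GitHub merge commit lands on `main`; the tag is what the Deploy DAGs workflow is later run against.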
Copy the `airflow_dags` folder into your `dags` location:
```
$ cp -r airflow_dags /path/to/airflow/dags
```
Or use the prebuilt webserver image in your containerized Airflow deployment:
```
$ docker pull ghcr.io/openclimatefix/airflow-dags
```
See the docker-compose file in the ocf-infrastructure repository.
DAGs are defined in the `dags` folder, split into modules according to domain. Each domain corresponds to a separate deployment of Airflow, and as such, a distinct set of DAGs, hence some similarity or even duplication is expected. Functions, or custom operators, are found in the `plugins` folder.
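As a sketch of that layout (the specific domain names here are illustrative, inferred from the changelog rather than the actual module list):

```
airflow_dags/
├── dags/            # one module per domain, one Airflow deployment each
│   ├── uk/          # e.g. UK national and GSP forecast pipelines
│   ├── india/
│   └── nl/
└── plugins/         # shared functions and custom operators
```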
Try to avoid it! The DAG name is how Airflow identifies the DAG in the database. If you change the name of a DAG, Airflow will treat it as a new DAG: the old DAG will still be in the database, but it will not be updated or run.
Because service-running configuration isn't Terraform configuration! Terraform is usually used for setting up infrastructure: platform-level resources like databases, networks, VMs, and Airflow itself. The DAGs that Airflow runs, and the versions of the services those DAGs run, are implementation details, and so should be stored in the config-as-code repository for Airflow.
Furthermore, as a mostly-Python organisation, having a top-level Python-only repo for Airflow increases its accessibility to the wider team.
This project uses MyPy for static type checking and Ruff for linting. Installing the development dependencies makes them available in your virtual environment.
Use them via:
```
$ python -m mypy .
$ python -m ruff check .
```
Be sure to do this periodically while developing to catch any errors early and prevent headaches with the CI pipeline. It may seem like a hassle at first, but it prevents accidental creation of a whole suite of bugs.
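Both tools are typically configured in `pyproject.toml`. A hypothetical fragment is shown below; the specific rule selections and settings are assumptions for illustration, not this repo's actual configuration:

```toml
[tool.mypy]
strict = true                 # enable mypy's full strictness bundle

[tool.ruff]
line-length = 100

[tool.ruff.lint]
select = ["E", "F", "I"]      # pycodestyle errors, pyflakes, import sorting
```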
There are some additional dependencies to be installed for running the tests, so be sure to pass `--extra=dev` to the `pip install -e .` command when creating your virtualenv. (Or use uv and let it do it for you!)
Run the unit tests with:
```
$ python -m unittest discover -s tests -p "test_*.py"
```
On the directory structure:
- The official PyPA discussion on "source" and "flat" layouts.
- PRs are welcome! See the Organisation Profile for details on contributing
- Find out about our other projects here
- Check out the OCF blog for updates
- Follow OCF on LinkedIn
- devsjc
- Peter Dudfield
- James Fulton
- Sukhil Patel
- Yuvraaj Narula
- Megawattz
- Erics
Part of the Open Climate Fix community.