Conversation
@ividito will you be able to demo this for the team?
sm2a/airflow_worker/Dockerfile (Outdated)

```dockerfile
    && pip install "apache-airflow[celery,amazon]==${AIRFLOW_VERSION}" \
    && pip install --no-cache-dir -r requirements.txt -c "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-3.11.txt" \
    && pip install --no-cache-dir -r requirements-in.txt
```
Why are we installing libraries without constraints?
I hit a barrier installing stactools, which requires a more secure version of httpx. According to the Airflow docs, once the Airflow installation itself is complete, it's recommended that additional dependencies be added as a second step without constraints.
What do you think about creating our own constraints.txt? That way we won't have to track two requirements files.
I didn't want to spend time untangling it in this PR, but I think if we put the time in, we could end up with something like this:
```shell
pip install "apache-airflow==${AIRFLOW_VERSION}" -r airflow-providers.txt -c airflow-constraints.txt
# -c takes a file, so the airflow pin would live in a small constraints file
pip install -r dag-requirements.txt -c airflow-pin.txt
```
The missing step is splitting up our current requirements file, where it's unclear which dependencies are copy-pasted constraints and which are needed to maintain DAG functionality. This work can be incremental for a while: new DAG dependencies should be added to `requirements.in`, and we can start to prune the old requirements file when we work on #318. Using `uv pip tree` while we do that can help us validate some of the links between dependencies in that file.
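As a rough illustration of that incremental split (a sketch only; the `==`-pin heuristic is an assumption, not project tooling), a small helper could flag exact pins as candidates to move into a constraints file, keeping looser specifiers as direct DAG dependencies:

```python
def split_requirements(lines):
    """Split requirement lines into exact pins (likely copy-pasted
    constraints) and looser direct dependencies."""
    pins, direct = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        # '==' pins are probably inherited constraints; anything with a
        # looser (or no) specifier is probably a real DAG dependency.
        (pins if "==" in line else direct).append(line)
    return pins, direct

pins, direct = split_requirements([
    "httpx==0.27.0",
    "stactools",
    "# comment",
    "apache-airflow-providers-amazon>=8.0",
])
print(pins)    # ['httpx==0.27.0']
print(direct)  # ['stactools', 'apache-airflow-providers-amazon>=8.0']
```

The output of a pass like this would still need manual review, but it gives a starting point for pruning.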
Summary:

Addresses #277

- stactools-pipeline DAGs
- A `noaa-hrrr` pipeline to illustrate how more customized stactools packages can be implemented
- Tested with the `sentinel2`, `cop-dem`, and `landsat` datasets, with varied results:
  - `sentinel2` doesn't implement `create_collection()`, and so it fails with an `AttributeError`
  - `landsat` requires a metadata file href as a granule, and works without additional parameters (although I have questions about how it will handle s3-hosted data)
  - `cop-dem` works with additional parameters (this test case is included as the default input for the DAG)
  - The `noaa-hrrr` package requires unique inputs, and would not work when used in the generic pipeline.

How to test
- Add stactools packages to the `worker` requirements. If creating a new custom DAG (like `noaa-hrrr`), either add the package to the `services` requirements, or avoid importing the package in the same file that defines a DAG.
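The "avoid importing the package in the DAG file" option can be sketched as a task callable that defers the stactools import until execution time, so only the worker needs the package installed. This is a sketch, not the actual DAG code; the `cop-dem` import path and function names are illustrative assumptions:

```python
def make_item(granule_href: str) -> dict:
    """Task callable for a stactools pipeline DAG."""
    # Deferred import: parsing this file (and therefore registering the
    # DAG on the scheduler/services side) never imports stactools; the
    # import only runs on the worker when the task actually executes.
    from stactools.cop_dem import stac  # illustrative import path

    item = stac.create_item(granule_href)
    return item.to_dict()

# The module defining make_item imports cleanly even when stactools is
# absent; a missing package only surfaces at task run time, on the worker.
```

This keeps the `services` requirements small while still allowing custom packages in worker tasks.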