feat: Add stactool DAGs #307

Merged
ividito merged 8 commits into dev from demo/stactools
Apr 16, 2025

@ividito
Contributor

@ividito ividito commented Mar 5, 2025

Summary:

Addresses #277

stactools-pipeline DAGs

  • Adds a generic (dynamic) pipeline, as well as a noaa-hrrr pipeline that illustrates how more customized stactools packages can be implemented.
  • The generic pipeline has been tested on sentinel2, cop-dem, and landsat datasets, with varied results:
    • sentinel2 doesn't implement create_collection(), so it fails with an AttributeError.
    • landsat requires a metadata file href as the granule and works without additional parameters (although I have questions about how it will handle S3-hosted data).
    • cop-dem works when given additional parameters (this test case is the DAG's default input).
  • The noaa-hrrr package requires package-specific inputs and would not work in the generic pipeline.
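A minimal sketch of the dispatch a generic pipeline like this could use, assuming each package follows the stactools-template layout (a stac module exposing create_collection() and create_item()). The helpers resolve_stac_function and build_stac are hypothetical, not the DAG's actual code, and the step that posts results to the ingest API is left out. Resolving both functions up front makes a package like sentinel2 fail fast with an AttributeError before any granule is processed.

```python
import importlib
from typing import Any, Callable, Dict, List, Optional


def resolve_stac_function(package: str, func_name: str) -> Callable[..., Any]:
    """Look up stactools.<package>.stac.<func_name> at runtime (assumed layout)."""
    # Distribution names use hyphens (cop-dem); module names use underscores (cop_dem).
    module_path = f"stactools.{package.replace('-', '_')}.stac"
    module = importlib.import_module(module_path)
    func = getattr(module, func_name, None)
    if not callable(func):
        # e.g. sentinel2, which has no create_collection()
        raise AttributeError(f"{module_path} does not implement {func_name}()")
    return func


def build_stac(
    package: str,
    granule_hrefs: List[str],
    collection_kwargs: Optional[Dict[str, Any]] = None,
    item_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build one collection plus one item per granule href (hypothetical helper)."""
    # Resolve both functions before doing any work, so unsupported packages fail fast.
    create_collection = resolve_stac_function(package, "create_collection")
    create_item = resolve_stac_function(package, "create_item")
    collection = create_collection(**(collection_kwargs or {}))
    items = [create_item(href, **(item_kwargs or {})) for href in granule_hrefs]
    return {"collection": collection, "items": items}
```

Passing per-package keyword arguments through, rather than hard-coding them, is what would let one DAG cover packages like cop-dem that need extra parameters; packages with unique inputs, like noaa-hrrr, still need their own DAG.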

How to test

  • Both DAGs have been pre-loaded with working, valid inputs. Run sm2a locally, point it at a working ingest API (local or remote; I used a local one to avoid setting up auth), and trigger the DAGs with the default config.
  • To try other stactools packages, add them to the worker requirements. If you are creating a new custom DAG (like noaa-hrrr), either add the package to the services requirements or avoid importing the package in the same file that defines the DAG.
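A minimal sketch of the second option, assuming a PythonOperator-style callable and that stactools.noaa_hrrr exposes a stac module whose create_item() returns a pystac.Item. The function name and its item_kwargs parameter are hypothetical. Because the import sits inside the callable, it only runs when the task executes on a worker, so a scheduler or DAG processor without the package can still parse the DAG file.

```python
from typing import Any, Dict


def create_hrrr_item(item_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Task callable for a custom noaa-hrrr DAG (hypothetical)."""
    # Deferred import: resolved only when the task runs on a worker, so a
    # scheduler or DAG processor without the package can still parse this file.
    from stactools.noaa_hrrr import stac

    # Assumed: create_item() returns a pystac.Item; to_dict() gives a
    # JSON-serializable dict, suitable for XCom or an ingest API request.
    item = stac.create_item(**item_kwargs)
    return item.to_dict()
```

In the DAG file this would be wired up with something like PythonOperator(task_id=..., python_callable=create_hrrr_item, op_kwargs={"item_kwargs": {...}}) and no top-level stactools import. An explicit dict parameter is used instead of **kwargs so that Airflow does not pass the whole task context through to create_item().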

@ividito ividito force-pushed the demo/stactools branch 2 times, most recently from 355aa7d to 3f32a3e March 5, 2025 21:42
@amarouane-ABDELHAK
Contributor

@ividito will you be able to demo this for the team?

&& pip install --no-cache-dir -r requirements.txt -c "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-3.11.txt"
&& pip install "apache-airflow[celery,amazon]==${AIRFLOW_VERSION}"\
&& pip install --no-cache-dir -r requirements.txt -c "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-3.11.txt"\
&& pip install --no-cache-dir -r requirements-in.txt
Contributor

Why are we installing libraries without constraints?

Contributor Author

I hit a problem installing stactools, which requires a newer, more secure version of httpx than the Airflow constraints allow. According to the Airflow docs, the recommended approach is to finish installing Airflow first and then add any additional dependencies in a second step, without constraints.
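A minimal sketch of a post-install sanity check for that two-step approach, assuming it runs in the image right after the unconstrained install (for example from a small script or python -c). check_pinned is a hypothetical helper; it fails the build if the second step changed a version that the first step pinned.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable


def check_pinned(dist: str, expected: str, get_version: Callable[[str], str] = version) -> str:
    """Raise if dist is missing or no longer at expected; return the installed version."""
    try:
        installed = get_version(dist)
    except PackageNotFoundError as exc:
        raise RuntimeError(f"{dist} is not installed") from exc
    if installed != expected:
        # An unconstrained pip install may have upgraded or downgraded it.
        raise RuntimeError(f"{dist} is {installed}, expected {expected}")
    return installed
```

Assuming AIRFLOW_VERSION is also exported as an environment variable in the image, the check would be check_pinned("apache-airflow", os.environ["AIRFLOW_VERSION"]).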

Contributor

What do you think about creating our own constraints.txt? That way we won't have to track two requirements files.

Contributor Author

I didn't want to spend time untangling it in this PR, but I think if we put the time in, we could end up with something like this:

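# Step 1: install Airflow and its providers, pinned to the Airflow constraints
pip install "apache-airflow==${AIRFLOW_VERSION}" -r airflow-providers.txt -c airflow-constraints.txt
# Step 2: install DAG dependencies unconstrained, re-pinning Airflow so pip cannot change it
pip install -r dag-requirements.txt "apache-airflow==${AIRFLOW_VERSION}"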

The missing step is splitting up our current requirements file, where it's unclear which dependencies are copy-pasted constraints and which are needed to maintain DAG functionality. This work can be incremental for a while: new DAG dependencies should be added to requirements.in, and we can start to prune the old requirements file when we work on #318. Running uv pip tree while we do that can help us validate some of the links between dependencies in that file.
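A rough sketch of how that split could start, assuming both files use plain name==version lines (as Airflow's constraints files do). parse_requirements and split_requirements are hypothetical helpers; pip options, URLs, and direct references are simply skipped. Pins that exactly match the constraints file are likely copy-pasted, and everything else is a candidate DAG dependency to confirm with uv pip tree.

```python
import re
from typing import Dict, List, Optional, Tuple

# Project name, optional [extras], optional ==version pin.
_REQ = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*(?:==\s*([^\s;,]+))?")


def parse_requirements(text: str) -> Dict[str, Optional[str]]:
    """Map normalized project name -> pinned version (None if not pinned with ==)."""
    reqs: Dict[str, Optional[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-") or "://" in line:
            continue  # skip blanks, pip options (-r, -c), and URLs
        match = _REQ.match(line)
        if match:
            # PEP 503 normalization so Foo_Bar and foo-bar compare equal.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            reqs[name] = match.group(2)
    return reqs


def split_requirements(requirements: str, constraints: str) -> Tuple[List[str], List[str]]:
    """Return (pins identical to the constraints, everything else)."""
    reqs = parse_requirements(requirements)
    pins = parse_requirements(constraints)
    copied = sorted(n for n, v in reqs.items() if v is not None and pins.get(n) == v)
    other = sorted(n for n in reqs if n not in copied)
    return copied, other
```

The inputs would be the text of the existing requirements.txt and of the constraints file at the URL the image build already uses; the second list is where DAG-only dependencies such as stactools packages should end up.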

@ividito ividito merged commit 4cf61de into dev Apr 16, 2025
3 checks passed
@ividito ividito deleted the demo/stactools branch April 16, 2025 00:42