
Trace dataset configs using Airflow Datasets/Assets #383

Merged
ividito merged 8 commits into dev from feat/persist-dataset-configs on Aug 21, 2025


@ividito (Contributor) commented Jun 23, 2025

Summary:

Addresses #370

Changes

  • This PR builds on "Update Airflow to 2.10.5" (#318) so that it can use Dataset Aliases.
  • All ingests using the veda-datasets DAG will produce a Dataset event, which is correlated by collection ID.
  • These events also write a manifest to S3 under the `event_bucket`.
  • In Airflow 3, the "Datasets" terminology changes to "Assets". This is no less confusing for us, but it explains the deprecation warnings introduced by this PR. We can revisit this if/when we upgrade to Airflow 3.
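To make the collection-ID correlation and the per-event manifest described above concrete, here is a minimal, stdlib-only sketch. It is not the code from this PR: the URI layout (`s3://<event_bucket>/collections/<collection_id>`), the manifest key format, and the manifest fields are all hypothetical assumptions chosen for illustration. The idea it shows is that reusing one URI per collection makes every ingest of that collection land on the same Dataset, so its events line up by collection ID.

```python
import json
from datetime import datetime, timezone


def collection_dataset_uri(event_bucket: str, collection_id: str) -> str:
    """Return one Dataset URI per collection (hypothetical naming scheme).

    Reusing the same URI for every ingest of a collection is what lets the
    resulting Dataset events be correlated by collection ID.
    """
    return f"s3://{event_bucket}/collections/{collection_id}"


def manifest_key(collection_id: str, ingested_at: datetime) -> str:
    """Return a hypothetical S3 key for one ingest's manifest (UTC timestamp)."""
    stamp = ingested_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"collections/{collection_id}/{stamp}.json"


def build_manifest(collection_id: str, dataset_config: dict, ingested_at: datetime) -> str:
    """Serialize the dataset config plus event metadata as a JSON manifest.

    The field names here are assumptions, not the PR's actual manifest schema.
    """
    manifest = {
        "collection_id": collection_id,
        "ingested_at": ingested_at.astimezone(timezone.utc).isoformat(),
        "dataset_config": dataset_config,
    }
    return json.dumps(manifest, sort_keys=True)


if __name__ == "__main__":
    when = datetime(2025, 6, 23, 19, 59, tzinfo=timezone.utc)
    config = {"collection": "example-collection", "title": "Example"}
    print(collection_dataset_uri("my-event-bucket", "example-collection"))
    print(manifest_key("example-collection", when))
    print(build_manifest("example-collection", config, when))
```

In Airflow 2.10, a task that declares a `DatasetAlias` in its `outlets` can attach a concrete `Dataset(uri)` at runtime through its `outlet_events` accessor, which is one way a per-collection URI like this could be emitted without listing every collection in the DAG definition.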

PR Checklist

  • Unit tests
  • Ad-hoc testing: deploy changes and test manually (currently live on SIT)

@ividito ividito force-pushed the feat/persist-dataset-configs branch from c28a14f to c775898 on June 23, 2025 19:59
@ividito ividito force-pushed the feat/update-airflow-2-10 branch 2 times, most recently from b1bc512 to 037e416 on June 24, 2025 19:57
Base automatically changed from feat/update-airflow-2-10 to dev June 24, 2025 23:56
@ividito ividito force-pushed the feat/persist-dataset-configs branch from c775898 to 2c8e2f5 on June 25, 2025 15:07
@ividito ividito marked this pull request as ready for review June 25, 2025 19:05
@botanical (Member) left a comment

Had a couple of small comments. Overall, I think using Airflow Datasets to trace our dataset configs is such a good idea 💡 👏 Thanks for this PR!

@ividito ividito force-pushed the feat/persist-dataset-configs branch from 2c8e2f5 to 4d80287 on August 21, 2025 20:13
@ividito ividito requested review from aliziel and botanical August 21, 2025 20:23
@ividito ividito merged commit c3831e5 into dev Aug 21, 2025
4 checks passed
@ividito ividito deleted the feat/persist-dataset-configs branch August 21, 2025 22:44

Labels: None yet

Projects: None yet

3 participants