AIP-72: Task SDK support for on_task_instance_* listeners, make OpenLineage compatible #45294
@mobuchowski - can you please rebase that one? We found an issue (together with @jscheffl) with the new caching scheme - fixed in #45347 - that would run the "main" version of the tests. I am asking for a rebase in all affected PRs.
Actually - I rebased it now.
I will rebase very soon as I'm working on some of the test failures anyway 🙂
@@ -181,6 +182,7 @@ class DagRun(BaseModel):
    data_interval_end: UtcDateTime | None
    start_date: UtcDateTime
    end_date: UtcDateTime | None
    clear_number: int
It's odd to have clear_number in the DagRun datamodel! What do we need this for? Can we get it from somewhere else?
It is a part of the current DagRun model :)
airflow/airflow/models/dagrun.py
Line 160 in 02d83b0
clear_number = Column(Integer, default=0, nullable=False, server_default="0")
We need it to properly generate the DR uuid, so that events from different physical executions of a dag run aren't mixed up:
data=f"{conf.namespace()}.{dag_id}.{clear_number}".encode(),
I should have clarified better: I meant that clear_number on the DagRun runtime model feels odd, i.e. it isn't required for it.
Since logical_date can now be None too, based on the link below, I think you might need to refactor the logic for generating the DR uuid.
https://lists.apache.org/thread/cknldkl9pmmzr1q7ot67wborzznlwrtv
Conceptually (in OpenLineage) if you clear and rerun a DAG run, the two runs (before and after the clear) are treated as entirely different objects. I believe this is kind of what we want to do in Airflow in the long run (similar to how we added TaskInstanceHistory), but before that happens, OL needs clear_number to distinguish logically different runs that reuse the same DR row and have the exact same identity otherwise (run_id, and even the UUID pk).
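To illustrate the point about clear_number, here is a minimal sketch of a deterministic run identifier. It assumes nothing about the real generate_static_uuid: build_dag_run_id_sketch is a hypothetical helper that swaps in the standard library's uuid5, and the namespace and DAG names are made up.

```python
import uuid
from datetime import datetime, timezone


def build_dag_run_id_sketch(
    namespace: str, dag_id: str, logical_date: datetime, clear_number: int
) -> str:
    """Hypothetical stand-in for the provider's DR uuid logic.

    Uses uuid5 (name-based, deterministic), NOT the provider's
    generate_static_uuid, purely to show the role of clear_number.
    """
    name = f"{logical_date.isoformat()}/{namespace}.{dag_id}.{clear_number}"
    return str(uuid.uuid5(uuid.NAMESPACE_OID, name))


logical_date = datetime(2025, 1, 1, tzinfo=timezone.utc)

# Same DagRun row, before and after a clear: only clear_number differs.
first = build_dag_run_id_sketch("default", "example_dag", logical_date, clear_number=0)
rerun = build_dag_run_id_sketch("default", "example_dag", logical_date, clear_number=1)

# Recomputing with identical inputs gives the same id, so events emitted
# from different processes for the same physical execution still line up.
first_again = build_dag_run_id_sketch("default", "example_dag", logical_date, clear_number=0)
```

Under these assumptions, first and rerun differ while first and first_again match, which is the property OL relies on to keep the two executions apart.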
Correct me if I am wrong -- but (dag_id, clear_number & logical_date) won't be unique anymore in AF 3.0 -- since logical_date would accept Null values, no? @uranusjr
airflow/providers/src/airflow/providers/openlineage/plugins/adapter.py
Lines 117 to 124 in 90eae56
@staticmethod
def build_dag_run_id(dag_id: str, logical_date: datetime, clear_number: int) -> str:
    return str(
        generate_static_uuid(
            instant=logical_date,
            data=f"{conf.namespace()}.{dag_id}.{clear_number}".encode(),
        )
    )
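The uniqueness concern can be sketched in a few lines. This assumes, as in the linked thread, that logical_date may be None; the run dictionaries and the identity_key helper are hypothetical and only model the three inputs used above.

```python
from datetime import datetime
from typing import Optional


def identity_key(
    dag_id: str, logical_date: Optional[datetime], clear_number: int
) -> tuple:
    """Hypothetical helper: the three inputs the current id is derived from."""
    return (dag_id, logical_date, clear_number)


# Two distinct runs of the same DAG, neither with a logical date (illustrative data).
run_a = {"run_id": "manual__a", "dag_id": "example_dag", "logical_date": None, "clear_number": 0}
run_b = {"run_id": "manual__b", "dag_id": "example_dag", "logical_date": None, "clear_number": 0}

key_a = identity_key(run_a["dag_id"], run_a["logical_date"], run_a["clear_number"])
key_b = identity_key(run_b["dag_id"], run_b["logical_date"], run_b["clear_number"])

# The runs are different, but the inputs to the id are identical.
keys_collide = key_a == key_b
```

Any id derived only from these inputs would be the same for both runs, and instant=logical_date would also receive None, so the derivation needs another distinguishing input once logical_date is nullable.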
I agree that this needs to change, but I would leave that for a follow-up PR after the actual change in Airflow behavior gets merged: #45732
… TaskSDK, make OpenLineage provider support Airflow 3's listener interface Signed-off-by: Maciej Obuchowski <[email protected]>
With AIP-72, there is no access to the database session from the worker process, and the runtime objects differ in some ways from the db models. This PR contains three commits that deal with that situation:
on_task_instance_* listeners interface to AIP-72: drops the session argument and makes the task_instance argument an instance of the RuntimeTaskInstance class, not the database model
Some followup work:
Activity to make logging more visible from the UI, and distinct from task logs
RuntimeTaskInstance)
closes #45423
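As a rough illustration of the listener change described above, the sketch below shows a listener method that takes no session argument and works only with what the runtime object carries. RuntimeTaskInstanceSketch is a hypothetical stand-in with made-up fields, not the real RuntimeTaskInstance, and the Airflow hook registration is omitted so the block runs without Airflow installed.

```python
from dataclasses import dataclass


@dataclass
class RuntimeTaskInstanceSketch:
    """Hypothetical stand-in for RuntimeTaskInstance; fields are illustrative."""
    dag_id: str
    task_id: str
    run_id: str
    try_number: int = 1


class ExampleListener:
    """Plain class modelling a listener; real listeners are registered with Airflow."""

    def __init__(self) -> None:
        self.events: list[str] = []

    # No session parameter: the worker process has no database access,
    # so the listener can only read attributes of the runtime object.
    def on_task_instance_running(self, previous_state, task_instance) -> None:
        self.events.append(
            f"{task_instance.dag_id}.{task_instance.task_id} running (was {previous_state})"
        )


listener = ExampleListener()
ti = RuntimeTaskInstanceSketch(dag_id="example_dag", task_id="extract", run_id="manual__1")
listener.on_task_instance_running("queued", ti)
```

A listener written this way cannot query the database for extra fields, which is why values such as clear_number have to be carried on the runtime models if the listener needs them.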