Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIP-66: Make DAG callbacks bundle aware #45860

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

ephraimbuddy
Copy link
Contributor

@ephraimbuddy ephraimbuddy commented Jan 21, 2025

This involves using relative paths in the callbacks, resolving the full path and using it to queue the callback in the file processor process.

Closes: #45496

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Jan 21, 2025
Comment on lines 400 to 401
# Since we are already going to use that filepath to run callback,
# there is no need to have same file path again in the queue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a reason now - we are running the callback on an old bundle version now, so we should leave the existing entry in the queue so the latest file is still parsed when it is its turn.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a reason now - we are running the callback on an old bundle version now, so we should leave the existing entry in the queue so the latest file is still parsed when it is its turn.

I have added bundle_version to DagFileInfo, which means versions will be differentiated in the queue. We can keep it this way. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding bundle_version to DagFileInfo sounds good. Good way to differentiate them. But, still don't think we should remove the entries. Left another comment below about this too.

# callback_paths_to_del = [x for x in self._callback_to_execute if x not in new_file_paths]
# for path_to_del in callback_paths_to_del:
# del self._callback_to_execute[path_to_del]
callback_paths_to_del = [x for x in self._callback_to_execute if x not in new_file_paths]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to be a little smarter about this - the file could still exist in the old bundle version, but not the latest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this code was added cause of SLA callbacks: #30076. I also removed filtering of _file_path_queue in the method since there's a possibility we are just working in a different version of the bundle

airflow/dag_processing/processor.py Show resolved Hide resolved
@@ -2212,15 +2213,14 @@ def safe_dag_id(self):
return self.dag_id.replace(".", "__dot__")

@property
def relative_fileloc(self) -> pathlib.Path | None:
def relative_fileloc(self) -> pathlib.Path:
"""File location of the importable dag 'file' relative to the configured DAGs folder."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstring needs updating. Longer term fileloc should just be relative from the start: #45623

@ephraimbuddy ephraimbuddy force-pushed the make-callbacks-bundle-aware branch from 3ab1310 to 1bd1551 Compare January 23, 2025 00:46
from airflow.dag_processing.bundles.manager import DagBundlesManager

DagBundlesManager().sync_bundles_to_db()
self._bundles_manager = DagBundlesManager().sync_bundles_to_db()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self._bundles_manager = DagBundlesManager().sync_bundles_to_db()
self._bundles_manager.sync_bundles_to_db()

Doesn't the factory create it for us?

@@ -263,7 +269,9 @@ def deactivate_stale_dags(
# last_parsed_time is the processor_timeout. Longer than that indicates that the DAG is
# no longer present in the file. We have a stale_dag_threshold configured to prevent a
# significant delay in deactivation of stale dags when a large timeout is configured
dag_file_path = DagFileInfo(path=dag.fileloc, bundle_name=dag.bundle_name)
dag_file_path = DagFileInfo(
path=dag.fileloc, bundle_name=dag.bundle_name, bundle_version=dag.bundle_version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to have the version here - we want to see if the dag is still present in the "latest" (a.k.a none) version.

Comment on lines 415 to 421
if file_info in self._file_path_queue:
# Remove DagFileInfo matching DagFileInfo from self._file_path_queue
# Since we are already going to use that DagFileInfo to run callback,
# there is no need to have same DagFileInfo again in the queue
self._file_path_queue = deque(
file_path for file_path in self._file_path_queue if file_path != request.full_filepath
file_path for file_path in self._file_path_queue if file_path != file_info
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to do this anymore. Leave the existing entry, if it's in there, in there. Just add the versioned one to run the callback.

@@ -477,7 +491,8 @@ def _refresh_dag_bundles(self):

new_file_paths = [f for f in self._file_paths if f.bundle_name != bundle.name]
new_file_paths.extend(
DagFileInfo(path=path, bundle_name=bundle.name) for path in bundle_file_paths
DagFileInfo(path=path, bundle_name=bundle.name, bundle_version=bundle_model.version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. We want _file_stat to be "versionless", which is easier if we just leave these all as None.

@ephraimbuddy ephraimbuddy force-pushed the make-callbacks-bundle-aware branch from 1bd1551 to 0520626 Compare January 25, 2025 00:07
@ephraimbuddy ephraimbuddy force-pushed the make-callbacks-bundle-aware branch from 0520626 to e6a6b78 Compare January 26, 2025 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Scheduler including HA (high availability) scheduler
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AIP-66: Make callbacks bundle aware
2 participants