Skip to content

Dag processor consumes DagPriorityParsingRequest when relative file l… #48659

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

shubham-pyc
Copy link

Dag Processing consumes DagPriorityParsingRequest

Needed For: #48174

#47844

@shubham-pyc
Copy link
Author

Still work in progress test cases are yet to come

@shubham-pyc shubham-pyc marked this pull request as draft April 2, 2025 07:21
@rawwar rawwar self-requested a review April 2, 2025 07:30
@shubham-pyc shubham-pyc force-pushed the feature/dags_processor_consumes_reserialize_api branch 2 times, most recently from da33e5e to 2230fda Compare April 2, 2025 08:04
@@ -407,20 +407,35 @@ def _queue_requested_files_for_parsing(self) -> None:

@provide_session
def _get_priority_files(self, session: Session = NEW_SESSION) -> list[DagFileInfo]:
files: list[DagFileInfo] = []
files: set[DagFileInfo] = set()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this from a list to a set to ensure uniqueness. There may be cases where a DagPriorityParsingRequest is triggered for both a specific file and a bundle, potentially causing duplicates. Using a set guarantees that each file appears only once.

@shubham-pyc shubham-pyc marked this pull request as ready for review April 3, 2025 08:43
@shubham-pyc
Copy link
Author

@jedcunningham please take a look at this one.

Copy link
Member

@jason810496 jason810496 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @shubham-pyc, it looks like there are conflicts in the migration files.
You can refer to the following guide for examples on how to resolve them easily during a rebase:

https://github.com/apache/airflow/blob/main/contributing-docs/14_metadata_database_updates.rst#how-to-resolve-conflicts-when-rebasing

@shubham-pyc shubham-pyc force-pushed the feature/dags_processor_consumes_reserialize_api branch from 63040a3 to a3d64dd Compare April 10, 2025 04:43
Copy link
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to fix CI and conflicts. Overall the direction looks good.

@shubham-pyc shubham-pyc force-pushed the feature/dags_processor_consumes_reserialize_api branch from cd4d42c to acb0fde Compare April 23, 2025 02:47
@shubham-pyc
Copy link
Author

@pierrejeambrun resolved conflicts

Copy link
Contributor

@bugraoz93 bugraoz93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks! There are a couple of tests failing in the CI.

@shubham-pyc shubham-pyc force-pushed the feature/dags_processor_consumes_reserialize_api branch from acb0fde to 2c23a78 Compare April 29, 2025 05:55
@shubham-pyc shubham-pyc force-pushed the feature/dags_processor_consumes_reserialize_api branch from 2c23a78 to b1e9a7f Compare April 29, 2025 06:42
@shubham-pyc
Copy link
Author

@pierrejeambrun fixed the CI

Copy link
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall that looks like the good direction.

@jedcunningham I know you've been working on related issues, any comment?

@pierrejeambrun
Copy link
Member

up.

# under the License.

"""
make relative_fileloc nullable for reserialized all bundles.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
make relative_fileloc nullable for reserialized all bundles.
Make DagPriorityParsingRuquest.relative_fileloc nullable.

nit


def __init__(self, bundle_name: str, relative_fileloc: str) -> None:
def __init__(self, bundle_name: str, relative_fileloc: str = "") -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def __init__(self, bundle_name: str, relative_fileloc: str = "") -> None:
def __init__(self, bundle_name: str, relative_fileloc: str | None = None) -> None:

If we are allowing nullable, lets actually use nulls then - otherwise we don't really need to alter the table in the first place.

bool: True if relative_fileloc is None, indicating the whole folder should be parsed,
False otherwise.
"""
return self.relative_fileloc == ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return self.relative_fileloc == ""
return self.relative_fileloc is None

rel_path=Path(request.relative_fileloc), bundle_name=bundle.name, bundle_path=bundle.path
if request.parse_whole_folder():
# If relative_fileloc is null, get all files from DagModel
dag_files = session.scalars(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should refresh the bundle and relist the dags from disk instead of querying the db.

I think we can just refresh here, relist, and add, but it'd be worth giving it a bit more thought to make sure we aren't impacting other stuff by doing so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants