Skip to content

fix: reuse system metrics in nodes on mlflow.start_run #647

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Calychas
Copy link
Contributor

@Calychas Calychas commented Apr 12, 2025

Description

With the current measures to make kedro-mlflow work with thread safety in MLFlow, technically a call to mlflow.start_run() is happening on every node with the same run_id. While having logging of system metrics enabled this means, a new SystemMetricMonitor will be started for every node with the same run_id (source code). The pipeline will run slower and slower, until it stops progressing (I couldn't run a pipeline with ~200 nodes, stopped ~50)

Development notes

One line change - set parameter log_system_metrics=False in mlflow.start_run() for before_node_run hook.

Added test that checks whether the SystemMetricMonitor is the same after running a node.

Checklist

  • Read the contributing guidelines
  • Open this PR as a 'Draft Pull Request' if it is work-in-progress
  • Update the documentation to reflect the code changes
  • Add a description of this change and add your name to the list of supporting contributions in the CHANGELOG.md file. Please respect Keep a Changelog guidelines.
  • Add tests to cover your changes

Notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

  • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.

  • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.

  • I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

@Galileo-Galilei
Copy link
Owner

Thanks for the bug report PR and sorry for the late reply! Is it good to go? Can i merge it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

2 participants