Mlflow auth looks for incorrect response #266

@b0ws3r

Description

What happened?

anemoi-training==0.9.0
anemoi-utils==0.4.43

The health_check method does not properly check for a successful web response.

The MLflow code that raises the exception:

    if response.text == "OK":  # <-- should be response.status_code == 200
        return

This throws the following:

2026-01-30 02:34:26 INFO ✅ Successfully logged in to MLflow. Happy logging!
Apptainer> anemoi-training mlflow sync --source {mlflow_logs} --destination https://mlflow.ecmwf.int/ --run-id {run_id} --experiment-name {my_experiment} --verbose
30-Jan-26 02:34:37 - INFO - Using default logging config without output log file
Traceback (most recent call last):
  File "/usr/local/bin/anemoi-training", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/anemoi/training/__main__.py", line 23, in main
    cli_main(__version__, __doc__, COMMANDS)
  File "/usr/local/lib/python3.12/dist-packages/anemoi/utils/cli.py", line 266, in cli_main
    cmd.run(args)
  File "/usr/local/lib/python3.12/dist-packages/anemoi/training/commands/mlflow.py", line 261, in run
    health_check(args.destination)
  File "/usr/local/lib/python3.12/dist-packages/anemoi/utils/mlflow/utils.py", line 44, in health_check
    raise ConnectionError(error_msg)
ConnectionError: Could not connect to MLflow server at https://mlflow.ecmwf.int/. The server may require authentication, did you forget to turn it on?

This occurs because the multiurl library's response.text field is the entire HTML document in the response body, which does not equal the literal string "OK".

Proposed Solution

To check for a successful response, we should just check response.status_code == 200.
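
A minimal sketch of the difference between the two checks, not the actual anemoi-utils code. `FakeResponse`, `is_healthy_by_text` and `is_healthy_by_status` are hypothetical names introduced for illustration, and the stub assumes the real response object exposes `status_code` and `text`:

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response (status_code and text only)."""

    status_code: int
    text: str


def is_healthy_by_text(response) -> bool:
    # Behaviour described in this issue: compare the body to the literal "OK".
    return response.text == "OK"


def is_healthy_by_status(response) -> bool:
    # Proposed behaviour: rely on the HTTP status code instead of the body.
    return response.status_code == 200


# A server that answers 200 with an HTML page, as reported above.
html_reply = FakeResponse(status_code=200, text="<!DOCTYPE html><html>...</html>")

print(is_healthy_by_text(html_reply))    # False
print(is_healthy_by_status(html_reply))  # True
```

If the server can legitimately answer with other 2xx codes, a range check such as `200 <= response.status_code < 300` might be preferable, but that goes beyond what is proposed here.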

What are the steps to reproduce the bug?

  • anemoi-training==0.9.0
  • successful authentication at mlflow.ecmwf.int via anemoi-training mlflow login --url https://mlflow.ecmwf.int/
  • anemoi-training mlflow sync --source {mlflow_logs} --destination https://mlflow.ecmwf.int/ --run-id {run_id} --experiment-name {my_experiment}

Version

anemoi-training==0.9.0
anemoi-utils==0.4.43

Platform (OS and architecture)

SUSE Linux Enterprise Server 15 SP6

Relevant log output

Traceback (most recent call last):
  File "/usr/local/bin/anemoi-training", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/anemoi/training/__main__.py", line 23, in main
    cli_main(__version__, __doc__, COMMANDS)
  File "/usr/local/lib/python3.12/dist-packages/anemoi/utils/cli.py", line 266, in cli_main
    cmd.run(args)
  File "/usr/local/lib/python3.12/dist-packages/anemoi/training/commands/mlflow.py", line 261, in run
    health_check(args.destination)
  File "/usr/local/lib/python3.12/dist-packages/anemoi/utils/mlflow/utils.py", line 44, in health_check
    raise ConnectionError(error_msg)
ConnectionError: Could not connect to MLflow server at https://mlflow.ecmwf.int/. The server may require authentication, did you forget to turn it on?

Accompanying data

No response

Organisation

No response
