Description
I was running a model trained with mlflow 2.6.0 on mlserver 1.6.1. After upgrading mlserver to 1.7.0, I get the compatibility errors outlined below. I'm not sure exactly where the root of the issue is, so this issue contains a lot of information from different angles. I apologize for the scattered thoughts and hope you can answer my compatibility questions below. Thank you!
In mlserver v1.7.0, specifically this PR (#1951), mlserver bumped the protobuf version its proto gencode is built with from 4.25.1 to 5.27.2. This is a breaking change: the new gencode is not backwards compatible with any protobuf runtime <5.27.2. The protobuf cross-version runtime guarantees state that "New Gencode + Old Runtime = Never Allowed".
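To make the rule concrete, here is a minimal sketch of the "New Gencode + Old Runtime" check as I understand it. It is deliberately simplified: it only tests that the runtime is at least as new as the gencode and ignores the upper bound protobuf places on how much newer the runtime may be. The function names are hypothetical, and the runtime value 4.25.1 is just an illustrative 4.x version.

```python
def parse_version(text: str) -> tuple:
    """Turn a plain dotted version such as '5.27.2' into a tuple of ints.

    Assumption: no pre-release suffixes (e.g. 'rc1'); those would raise ValueError.
    """
    return tuple(int(part) for part in text.split("."))


def runtime_can_load(gencode: str, runtime: str) -> bool:
    """Simplified rule: the runtime must not be older than the gencode."""
    return parse_version(runtime) >= parse_version(gencode)


# mlserver 1.7.0 gencode (5.27.2) against an illustrative 4.x runtime
print(runtime_can_load(gencode="5.27.2", runtime="4.25.1"))
# the same gencode against a matching runtime
print(runtime_can_load(gencode="5.27.2", runtime="5.27.2"))
```

The first call models my situation (new gencode, old runtime) and the second models the setup that the ">=5.27.2" constraint would enforce.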
This breaking change is causing issues with older protobuf runtimes as already reported: #2130
As outlined in that issue, I think the protobuf requirements should be changed/restricted from "*" to ">=5.27.2".
This also implies an increase of the minimum supported mlflow version. Mlflow 2.6.0 requires protobuf "<5", so it pulls in the latest 4.x.x runtime. That runtime cannot work with the newer proto gencode of mlserver v1.7.0, which was built with protobuf 5.27.2. The problem persists up to mlflow v2.15.0, in which they rebuilt their own proto gencode and relaxed the constraint to "<6" (see PR mlflow/mlflow#12360). (As you can see in their PR, they did not have to increase the minimum protobuf version, because they build their protos in such a way that the gencode switches based on the protobuf version that is currently imported. mlserver does not do this, which results in a hard increase of the minimum protobuf version.)
Therefore, I think the mlflow dependency should also be changed/restricted from "*" to ">=2.15.0". Thanks to another compatibility bug found with mlflow (#2113), this has already been changed to ">=2.19.0", but not released.
I am running models trained with mlflow v2.6.0 and want to upgrade mlserver to 1.7. I expected this to work without a problem because the environments should be separated and their dependencies should not interfere. However, I get the protobuf error as described in #1951. It seems I would need to re-train the model using a newer mlflow version (specifically >=2.19.0) in order for it to run on mlserver 1.7.0.
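A rough sketch of what "switching based on the imported protobuf version" could look like. This is not mlflow's actual code: the variant names and the exact threshold (major version 5) are assumptions for illustration only.

```python
from importlib import metadata
from typing import Optional


def installed_protobuf_version() -> Optional[str]:
    """Return the installed protobuf version, or None if protobuf is absent."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


def pick_gencode_variant(runtime_version: str) -> str:
    """Choose which pre-built gencode to load for a given runtime version.

    'gencode_for_v5' and 'gencode_for_older' are hypothetical names that
    stand in for two copies of the generated *_pb2 code.
    """
    major = int(runtime_version.split(".")[0])
    return "gencode_for_v5" if major >= 5 else "gencode_for_older"


version = installed_protobuf_version()
if version is not None:
    print(version, "->", pick_gencode_variant(version))
```

With a scheme like this, a package can ship gencode for more than one protobuf major version and keep its lower bound on the runtime unchanged.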
This situation leads me to some questions:
- In general, is it supported to run models trained with older mlflow versions on newer mlserver versions (given no major version changes in either)?
- Is it correct that models trained with mlflow <2.19.0 need to be re-trained with >=2.19.0 to run with mlserver 1.7.0?
- Are there any compatibility guarantees within a major like 1.x.x (only minor bumps) regarding trained/existing mlflow models?
- Why/how do the dependencies of the outside mlserver environment interact with the isolated environment of the mlflow model? Is this expected? (see below)
Some additional info because I fear that I'm chasing the wrong thing:
I noticed in the stacktrace of the protobuf error that it starts inside the isolated mlflow environment but ends in the outer mlserver environment. It looks like a runpy call crosses the border of the model (mlflow) environment and ends up in the mlserver environment, and I'm quite confused by that. Is this expected? If this "sandbox escape" did not happen and the runpy call only used the dependencies in the model (mlflow) environment, it could not import the new mlserver proto gencode (the one built with protobuf 5.27.2), which is where it currently fails. I'm not sure what it would import instead, because mlserver is only specified as an extra dependency and has no upper limit in mlflow 2.6.0 (source), so mlserver wouldn't be there (?). Unfortunately, I don't understand the interaction of mlflow and mlserver in this case well enough.
Here's that stacktrace (see env info below):
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/mnt/environments/3014daa3513bab3ce03cda028baa54034e3dff5de7e96c2e46ba23ee3154b0ff/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/environments/3014daa3513bab3ce03cda028baa54034e3dff5de7e96c2e46ba23ee3154b0ff/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
    prepare(preparation_data)
  File "/mnt/environments/3014daa3513bab3ce03cda028baa54034e3dff5de7e96c2e46ba23ee3154b0ff/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/mnt/environments/3014daa3513bab3ce03cda028baa54034e3dff5de7e96c2e46ba23ee3154b0ff/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/app-root/.conda/envs/env/bin/mlserver", line 5, in <module>
    from mlserver.cli import main
  File "/opt/app-root/.conda/envs/env/lib/python3.11/site-packages/mlserver/__init__.py", line 2, in <module>
    from .server import MLServer
  File "/opt/app-root/.conda/envs/env/lib/python3.11/site-packages/mlserver/server.py", line 17, in <module>
    from .grpc import GRPCServer
  File "/opt/app-root/.conda/envs/env/lib/python3.11/site-packages/mlserver/grpc/__init__.py", line 1, in <module>
    from .server import GRPCServer
  File "/opt/app-root/.conda/envs/env/lib/python3.11/site-packages/mlserver/grpc/server.py", line 9, in <module>
    from .servicers import InferenceServicer
  File "/opt/app-root/.conda/envs/env/lib/python3.11/site-packages/mlserver/grpc/servicers.py", line 3, in <module>
    from . import dataplane_pb2 as pb
  File "/opt/app-root/.conda/envs/env/lib/python3.11/site-packages/mlserver/grpc/dataplane_pb2.py", line 9, in <module>
    from google.protobuf import runtime_version as _runtime_version
ImportError: cannot import name 'runtime_version' from 'google.protobuf' (/mnt/environments/3014daa3513bab3ce03cda028baa54034e3dff5de7e96c2e46ba23ee3154b0ff/lib/python3.11/site-packages/google/protobuf/__init__.py)
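The traceback shows a spawned multiprocessing child running the interpreter from the model environment (spawn.py lives under /mnt/environments/...), which then re-runs the parent's main script (/opt/app-root/.conda/envs/env/bin/mlserver) through runpy.run_path. To narrow down which environment each package is resolved from inside such a worker, a small diagnostic like the following could help. It is only a sketch using the standard library; the helper names are made up, and where to call it from inside mlserver is left open.

```python
import importlib.util
import sys
from typing import Dict, Iterable, Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # A missing parent package or a module without a spec ends up here.
        return None
    return spec.origin if spec is not None else None


def environment_report(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Collect the interpreter location and the origin of each named module."""
    report: Dict[str, Optional[str]] = {
        "sys.executable": sys.executable,
        "sys.prefix": sys.prefix,
    }
    for name in names:
        report[name] = module_origin(name)
    return report


for key, value in environment_report(["mlserver", "mlflow", "google.protobuf"]).items():
    print(f"{key}: {value}")
```

If mlserver resolves under /opt/app-root/.conda/envs/env while google.protobuf resolves under /mnt/environments/..., that would match the mixed import paths seen in the traceback.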
Environment info:
mlserver env (/opt/app-root/.conda/envs/env/)
Created manually with conda inside a container which then runs mlserver.
- python3.11
- mlflow==2.22.0
- mlserver==1.7.0 -> contains gencode that requires protobuf runtime >=5.27.2
model (mlflow) env (/mnt/environments/3014daa3513)
Created/managed by mlserver or mlflow (?) according to the conda.yaml file in the mlflow artifact
- python3.11
- mlflow==2.6.0
- implicitly protobuf<5