Description
When no models are loaded, the /ready endpoint answers True, because MLServer uses the following logic in its dataplane:

```python
return all([model.ready for model in models])
```

and in Python, `all()` over an empty list is vacuously True.
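A minimal, self-contained sketch of the edge case (the `ModelStub` class and `dataplane_ready` helper below are illustrative stand-ins, not MLServer's actual API):

```python
from dataclasses import dataclass

@dataclass
class ModelStub:
    # Illustrative stand-in for a loaded model in the dataplane.
    ready: bool

def dataplane_ready(models):
    # Mirrors the logic quoted above: vacuously True when models is empty.
    return all([model.ready for model in models])

print(dataplane_ready([ModelStub(ready=True)]))  # True -- genuinely ready
print(dataplane_ready([]))                       # True -- no models loaded, still "ready"
```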
The thing is that whenever you have a faulty model, say an MLflow runtime one that fails while loading, MLServer gracefully handles the failure and unloads the model. At that point there is a short period during which MLServer keeps running with an empty model list (assuming there was only one model), after which it shuts down.

This small window can be confusing: if the KServe readiness probe checks the /v2/health/ready endpoint during it, KServe will consider the inference service READY.

This behaviour can be devastating, because KServe will then consider the new inference service ready and promote that revision as the new healthy one. We end up with a CrashLoopBackOff inference service elected as the live revision, while the previous healthy revision is terminated.

We faced this behaviour in production. It may not be a bug per se, but rather a question of whether we correctly interfaced KServe and MLServer readiness. Still, is it worth considering that /ready should not answer True when no models are loaded? That would solve the problem.
To reproduce the behaviour, run the following script to poll the readiness of your MLServer instance:
```python
import requests
import time

url = "http://127.0.0.1:8080/v2/health/ready"

while True:
    ready = False
    start_time = time.time()
    # Check the URL 10 times per second for one second
    while time.time() - start_time < 1:
        try:
            response = requests.get(url, timeout=0.5)
            if response.status_code == 200:
                ready = True
        except requests.RequestException:
            # Ignore exceptions (e.g., connection errors) and continue checking
            pass
        time.sleep(0.1)
    if ready:
        print("READY")
    else:
        print("NOT READY")
```
Then try to load a faulty model, such as the following dummy one:
```python
import mlflow

class DummyModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Fails at load time, which makes MLServer unload the model
        import NONEXISTINGMODULE
    ...
```
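For completeness, one way to package that model so MLServer's MLflow runtime will attempt (and fail) to load it; the output path is arbitrary and this is just a sketch of the reproduction setup, reusing the `DummyModel` class above:

```python
import mlflow

# Save the faulty model to a local path so an MLServer model-settings
# entry can point at it; loading it will then fail and trigger the unload.
mlflow.pyfunc.save_model(path="dummy-model", python_model=DummyModel())
```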
You'll notice a short period of time during which your probe tells you that the server is ready.
I can open a PR if you agree that we should not answer True if len(models) == 0 (or handle this situation in another, cleverer way if that makes sense).
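For illustration, the guard could look something like this (names are approximations, not MLServer's exact internals):

```python
def is_ready(models) -> bool:
    # Guard against the vacuous-truth case: all([]) is True,
    # so an empty model list must be reported as not ready.
    return len(models) > 0 and all(model.ready for model in models)
```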
Let me know