[RFC] Check services before deleting MLModel

## Background 

Models in ML-commons are widely used, particularly by pipelines and the agent framework. However, we lack a mechanism to detect whether a model is being used by other services. To address this, we propose a new feature that checks whether a model is in use by another service before it is deleted. If it is, the delete request should raise an error, similar to what happens when attempting to delete a connector that is still in use (see [here](https://github.com/opensearch-project/ml-commons/blob/cc99c9dc3be8bfe2342d51cf21d29232cdc44a7d/plugin/src/main/java/org/opensearch/ml/action/connector/DeleteConnectorTransportAction.java#L83C29-L83C31)).

## All related services
The following services use MLModel:

- Pipeline 
   - rag-pipeline
   - neural-sparse search pipeline
   - neural-sparse ingest pipeline
   - neural-sparse two phase pipeline
   - ml inference pipeline
   - text-embedding
- Tools
   - MLModel Tool
   - PPLTool
   - RAGTool
   - .....

## Solution
We can apply the same solution used for [connectors and models](https://github.com/opensearch-project/ml-commons/blob/cc99c9dc3be8bfe2342d51cf21d29232cdc44a7d/plugin/src/main/java/org/opensearch/ml/action/connector/DeleteConnectorTransportAction.java#L76). Generally, when attempting to delete a specific model ID, we can search both kinds of service (pipelines and agents) to check whether they use that model ID. Because the two store model references differently, we propose a separate solution for each.

### Pipeline:
All the pipelines we developed use the key **model_id** to record the model they rely on. Therefore, we can use **GetPipelineAction** or **GetSearchPipelineAction** to retrieve all pipelines and then check the key “model_id” to detect whether they reference the model ID being deleted.
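
The check on a retrieved pipeline could look roughly like the sketch below. It assumes the pipeline configuration has already been parsed into nested `Map`/`List` values; the class and method names (`PipelineModelReferenceChecker`, `referencesModel`) are hypothetical and not part of ML-commons, and the retrieval through **GetPipelineAction** or **GetSearchPipelineAction** is left out.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper: walks a parsed pipeline configuration looking for a model_id reference. */
final class PipelineModelReferenceChecker {

    // Key that, per this RFC, every pipeline processor uses to store its model ID.
    private static final String MODEL_ID_KEY = "model_id";

    private PipelineModelReferenceChecker() {}

    /** Returns true if any "model_id" entry at any nesting depth equals modelId. */
    static boolean referencesModel(Object node, String modelId) {
        if (node instanceof Map<?, ?>) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                if (MODEL_ID_KEY.equals(entry.getKey()) && modelId.equals(entry.getValue())) {
                    return true;
                }
                // Processors nest their settings, so keep descending.
                if (referencesModel(entry.getValue(), modelId)) {
                    return true;
                }
            }
        } else if (node instanceof List<?>) {
            for (Object item : (List<?>) node) {
                if (referencesModel(item, modelId)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

The recursive walk avoids hard-coding where each processor type places **model_id**, at the cost of also matching any unrelated nested key that happens to have the same name.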

### Agent and Tools
Tools are more complex than pipelines because they use different keys to store the model IDs they reference. Furthermore, some tools use more than one model ID. For example, **RAGTool** requires both **embedding_model_id** and **inference_model_id**, while **PPLTool** uses **model_id** to document its model. The target field we need to check varies by tool, making it challenging to detect relevant models with a single template. Therefore, we propose a few options to address this complexity.


#### Option 1 (preferred)
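We add an abstract function **GETAllModelKeys** to the base interface `Factory<T extends Tool>`.
```
import java.util.List;

interface Factory<T extends Tool> {
    // Returns the parameter names under which this tool stores model IDs.
    List<String> GETAllModelKeys();
}
```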
Each tool’s factory needs to implement this function and return a list of strings containing every field that can hold a model ID. For example, PPLTool should return [“model_id”], while RAGTool should return [“embedding_model_id”, “inference_model_id”].
When registering an agent, we call these functions to retrieve the relevant field names, parse the input parameters, and then write the model IDs found to a fixed key, **relatedModelIds**. For example, the registration body of a flow agent containing PPLTool is:
```
{
  "name": "Test_Agent_For_RAG_2",
  "type": "flow",
  "tools": [
      {
      "type": "PPLTool",
      "parameters": {
        "model_id": "test_model_id",
        "model_type": "FINETUNE"
      }
    }
  ]
}
```
After parsing the request during registration, we write the following body into the index:
```
{
  "name": "Test_Agent_For_RAG_2",
  "type": "flow",
  "relatedModelIds": ["test_model_id"],
  "tools": [
      {
      "type": "PPLTool",
      "parameters": {
        "model_id": "test_model_id"
      }
    }
  ]
}
```
Then we can use the following DSL template to query:
```
{
  "query": {
    "terms": {
      "relatedModelIds": ["delete_model_id"]
    }
  }
}
```
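
The registration-time step described above (look up each tool's model keys, read those parameters, collect the IDs) can be sketched as follows. This is only an illustration under stated assumptions: `RelatedModelIdExtractor` and `ToolSpec` are hypothetical names, and the `modelKeysByToolType` map stands in for asking each tool's factory for its **GETAllModelKeys** result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of deriving relatedModelIds while registering an agent. */
final class RelatedModelIdExtractor {

    private RelatedModelIdExtractor() {}

    /** Minimal stand-in for one entry of the "tools" array in the registration body. */
    static final class ToolSpec {
        final String type;
        final Map<String, String> parameters;

        ToolSpec(String type, Map<String, String> parameters) {
            this.type = type;
            this.parameters = parameters;
        }
    }

    /**
     * Collects the distinct model IDs referenced by the given tools.
     * modelKeysByToolType is an assumption: it replaces the per-factory lookup.
     */
    static List<String> extract(List<ToolSpec> tools, Map<String, List<String>> modelKeysByToolType) {
        Set<String> ids = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (ToolSpec tool : tools) {
            List<String> keys = modelKeysByToolType.getOrDefault(tool.type, Collections.emptyList());
            for (String key : keys) {
                String id = tool.parameters.get(key);
                if (id != null && !id.isEmpty()) {
                    ids.add(id);
                }
            }
        }
        return new ArrayList<>(ids);
    }
}
```

Collecting into a set means a model shared by several tools in the same agent appears only once in **relatedModelIds**.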

Pros: We don’t change any API. Developing new tools adds no extra maintenance cost.
Cons: When a customer calls the Get Agent API, the “relatedModelIds” field is shown in the response and may confuse them.

#### Option 2
We add an optional attribute “relatedModelIds” to the base class Tool, so any tool can be registered with this parameter. For example:
```
{
  "name": "Test_Agent_For_RAG_2",
  "type": "flow",
  "tools": [
      {
      "type": "PPLTool",
      "parameters": {
        "model_id": "test_model_id",
        "model_type": "FINETUNE",
        "relatedModelIds": ["test_model_id"]
      }
    }
  ]
}
```
Then we can use the following DSL template to query the agent index:
```
{
  "query": {
    "terms": {
      "tools.parameters.relatedModelIds": ["delete_model_id"]
    }
  }
}
```
Pros: Relatively small code change.
Cons: If the customer doesn’t provide this field in the request body, we cannot know which models the tool uses, and the check is ineffective. It also requires redundant information in the request body, which may confuse users.

#### Option 3
We maintain two separate mapping objects, one in ML-commons and one in skills. The key is the tool type, and the value is the list of field names that store related model IDs. When searching for relevant agents, we iterate through all mappings and construct a DSL query for each tool.
For example, we have:
```
{
   "PPLTool": ["model_id"],
   "....."
}
```
Then we would create a DSL query for PPLTool like:
```
{
  "query": {
    "terms": {
      "tools.parameters.model_id": ["delete_model_id"]
    }
  }
}
```
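
Turning such a mapping into the field names to query, one group per tool type, might look like the sketch below. `ToolModelFieldPaths` is a hypothetical name, and the prefix under which tool parameters are indexed is passed in by the caller instead of being assumed here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: derive the query field names for each tool type from the mapping. */
final class ToolModelFieldPaths {

    private ToolModelFieldPaths() {}

    /**
     * For every tool type, prefixes each model key with the path under which
     * tool parameters are stored in the agent index (supplied by the caller).
     */
    static Map<String, List<String>> build(String parametersPrefix,
                                           Map<String, List<String>> modelKeysByToolType) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : modelKeysByToolType.entrySet()) {
            List<String> paths = new ArrayList<>();
            for (String key : entry.getValue()) {
                paths.add(parametersPrefix + "." + key);
            }
            result.put(entry.getKey(), paths);
        }
        return result;
    }
}
```

Each resulting field name would then go into one terms query against the model ID being deleted.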

Pros: It will not impact the user interaction experience.
Cons: High maintenance cost, as each new tool requires adding a corresponding value to the map.





