Multiple models support for LLM TGI #835
Conversation
I am confused by this PR. Why do we want the user to pass a model_config to support different models? Each OPEA microservice instance serves only one model during deployment, and the model_id does not change. The endpoint is also not configurable; it is predefined in the OPEA API spec, which is OpenAI API compatible. I don't think switching between models during an inference request is the right requirement.
After discussion, we think having such model-switch capability in the software layer is the right way to go. The model_config format may still be revised for simplicity; that can be done in a future PR.
@lvliang-intel - resolved the merge conflicts.
* Update gateway and docarray from mega and proto services to have model field for ChatQnAGateway and LLMParams respectively
* Add load_model_configs method in utils.py to validate and load the model_configs
* Update llms text-generation tgi file (llm.py) to support multiple models. Uses load_model_configs method from utils
* Update llms text-generation tgi template to add different templates for different models
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Fixed llm_endpoint empty string issue on error scenario
* Function to get llm_endpoint and keep the code clean
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci

Signed-off-by: sgurunat <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
## Description
To support multiple LLM models for ChatQnA, the changes are incorporated into the llms TGI text-generation microservice. Multiple models can be listed in a model_configs.json file, whose contents are loaded through the MODEL_CONFIGS environment variable.
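As a rough illustration only (field names follow the required fields listed under Changes below; endpoints and token limits are invented), the configuration might be loaded along these lines:

```python
import json
import os

# Example entries; endpoints and token limits are made up for illustration.
example_model_configs = [
    {
        "model_name": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "displayName": "Llama 3.1 8B Instruct",
        "endpoint": "http://tgi-service-8b:80",
        "minToken": 1,
        "maxToken": 2048,
    },
    {
        "model_name": "meta-llama/Meta-Llama-3.1-70B-Instruct",
        "displayName": "Llama 3.1 70B Instruct",
        "endpoint": "http://tgi-service-70b:80",
        "minToken": 1,
        "maxToken": 4096,
    },
]

# A deployment would typically export the file contents, e.g.
#   export MODEL_CONFIGS=$(cat model_configs.json)
os.environ.setdefault("MODEL_CONFIGS", json.dumps(example_model_configs))
configs = json.loads(os.environ["MODEL_CONFIGS"])
print([c["model_name"] for c in configs])
```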
## Type of change
New feature (non-breaking change which adds new functionality)
## Changes
To support this, a model parameter has been added to ChatQnAGateway in gateway.py and to LLMParams in docarray.py.
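A minimal sketch of what the new field could look like, assuming LLMParams is a docarray BaseDoc; the surrounding fields are placeholders, not the actual class definition:

```python
from typing import Optional
from docarray import BaseDoc

class LLMParams(BaseDoc):
    # New in this PR: lets a request pick one of the configured models.
    model: Optional[str] = None
    # Placeholder fields; the real class carries the full set of
    # generation parameters used by the microservice.
    query: str = ""
    max_new_tokens: int = 1024
    streaming: bool = True
```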
Added a load_model_configs method in utils.py to validate all the required fields ('model_name', 'displayName', 'endpoint', 'minToken', 'maxToken') and then load the configurations. This lives in utils so that it can be reused.
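The validation can be sketched roughly as below; the helper is reconstructed from this description, not copied from the PR, so the exact implementation in utils.py may differ:

```python
import json
import os

REQUIRED_FIELDS = {"model_name", "displayName", "endpoint", "minToken", "maxToken"}

def load_model_configs() -> dict:
    """Validate MODEL_CONFIGS entries and index them by model name."""
    configs = json.loads(os.getenv("MODEL_CONFIGS", "[]"))
    for cfg in configs:
        missing = REQUIRED_FIELDS - cfg.keys()
        if missing:
            raise ValueError(
                f"Model config {cfg.get('model_name', '<unnamed>')} is missing fields: {sorted(missing)}"
            )
    # Index by model name so callers can look up the endpoint for a request.
    return {cfg["model_name"]: cfg for cfg in configs}
```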
Updated llm.py in llms text-generation tgi to support multiple models and route each request to the right endpoint.
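Conceptually, the routing boils down to something like the sketch below. The get_llm_endpoint name mirrors the helper mentioned in the commit list, but the body is an assumption, and the TGI_LLM_ENDPOINT fallback variable may not match the actual code:

```python
import os
from typing import Optional

def get_llm_endpoint(model: Optional[str], model_configs: dict) -> str:
    """Return the TGI endpoint for the requested model, if one is configured."""
    if model and model in model_configs:
        return model_configs[model]["endpoint"]
    # Fall back to single-endpoint behaviour; raising instead of returning ""
    # avoids the empty llm_endpoint issue noted in the commit history.
    endpoint = os.getenv("TGI_LLM_ENDPOINT", "")
    if not endpoint:
        raise RuntimeError(f"No TGI endpoint configured for model: {model}")
    return endpoint
```

Keeping the lookup in a small helper keeps llm.py clean, which matches the "Function to get llm_endpoint and keep the code clean" commit above.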
Updated the template.py file in llms text-generation tgi to add a new template for the models "meta-llama/Meta-Llama-3.1-70B-Instruct" and "meta-llama/Meta-Llama-3.1-8B-Instruct".
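A hypothetical illustration of a per-model template switch; the actual prompt text added in template.py is not reproduced here, and the Llama-style tags below are a simplified stand-in:

```python
from typing import Optional

LLAMA_3_1_MODELS = {
    "meta-llama/Meta-Llama-3.1-70B-Instruct",
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
}

def build_prompt(question: str, context: str, model: Optional[str] = None) -> str:
    if model in LLAMA_3_1_MODELS:
        # Simplified Llama 3.1 chat-style prompt; the real template differs.
        return (
            "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
            "Answer the question based on the context.\n"
            f"Context: {context}\nQuestion: {question}<|eot_id|>"
            "<|start_header_id|>assistant<|end_header_id|>\n\n"
        )
    # Default template used for other models.
    return (
        "Answer the question based on the context.\n"
        f"Context: {context}\nQuestion: {question}\nAnswer:"
    )
```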