Feature request
When using the vLLM engine in Xinference, it would be helpful to support the commonly used vLLM inference parameters directly, for example the maximum model length (`max_model_len`).
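For reference, a minimal sketch of the kind of engine parameters vLLM's own `LLM` constructor accepts; the model name and values are placeholders:

```python
# Minimal sketch: vLLM exposes these engine parameters at construction time.
# Model name and numeric values are placeholders for illustration.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen-7B-Chat",      # placeholder model
    max_model_len=4096,             # cap the context length to fit GPU memory
    gpu_memory_utilization=0.90,    # fraction of GPU memory vLLM may use
)
```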
Motivation
When loading a model that Xinference already supports, loading can fail because the model's default maximum length does not fit within the available GPU memory. This can currently be worked around by registering a custom model, but that is cumbersome. Exposing the commonly used vLLM inference parameters would make such cases much easier to handle.
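A hypothetical sketch of how this could look from the Xinference Python client, with extra keyword arguments forwarded to the vLLM engine; the passthrough parameters shown are the proposal, not the current API:

```python
# Hypothetical usage: extra kwargs forwarded from Xinference to vLLM.
# The passthrough parameters below are the feature being requested.
from xinference.client import Client

client = Client("http://localhost:9997")  # assumed local Xinference endpoint
model_uid = client.launch_model(
    model_name="qwen-chat",         # a model Xinference already supports
    max_model_len=4096,             # requested vLLM passthrough parameter
    gpu_memory_utilization=0.85,    # requested vLLM passthrough parameter
)
```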
Your contribution