Releases: xorbitsai/inference
v0.9.1
What's new in 0.9.1 (2024-03-01)
These are the changes in inference v0.9.1.
New features
- FEAT: Docker for cpu only by @ChengjieLi28 in #1068
Enhancements
- ENH: Support downloading gemma from modelscope by @aresnow1 in #1035
- ENH: [UI] Setting quantization when registering LLM by @ChengjieLi28 in #1040
- ENH: Restful client supports multiple system prompts for chat by @ChengjieLi28 in #1056
- ENH: supports disabling worker reporting status by @ChengjieLi28 in #1057
- ENH: Extra params for xinference launch command line by @ChengjieLi28 in #1048
Bug fixes
- BUG: Fix some models that cannot download from modelscope by @ChengjieLi28 in #1066
- BUG: Fix early truncation due to max_token being default to 16 instead of 1024 by @ZhangTianrong in #1061
Documentation
- DOC: Update readme by @qinxuye in #1045
- DOC: Fix readme by @qinxuye in #1054
- DOC: Fix wechat links by @qinxuye in #1055
New Contributors
- @ZhangTianrong made their first contribution in #1061
Full Changelog: v0.9.0...v0.9.1
v0.9.0
What's new in 0.9.0 (2024-02-22)
These are the changes in inference v0.9.0.
New features
- FEAT: Refactor device related code and add initial Intel GPU support by @notsyncing in #968
- FEAT: Support gemma series model by @aresnow1 in #1024
Enhancements
- ENH: [UI] Supports replica when launching LLM models by @ChengjieLi28 in #1011
- ENH: [UI] Show cluster resource information by @ChengjieLi28 in #1015
Bug fixes
- BUG: fix chat completion error when indexing body.messages by @fffonion in #1008
- BUG: Fix cache sd 1.5 error by @codingl2k1 in #1013
- BUG: fix typo in modelscope llama-2-13b-chat-GGUF by @qinxuye in #1026
- BUG: Fix missing qwen 1.5 7b gguf by @codingl2k1 in #1027
Documentation
- DOC: Polish model operation command doc by @onesuper in #1000
- DOC: Fix note on secret_key generation and algorithm selection for OAuth2 by @ChengjieLi28 in #1012
New Contributors
- @fffonion made their first contribution in #1008
- @notsyncing made their first contribution in #968
Full Changelog: v0.8.5...v0.9.0
v0.8.5
What's new in 0.8.5 (2024-02-06)
These are the changes in inference v0.8.5.
New features
- FEAT: Implemented web UI for launching the text2image model. by @hainaweiben in #985
- FEAT: Support qwen-1.5 series by @aresnow1 in #994
Enhancements
- ENH: Download stable diffusion model from modelscope by @codingl2k1 in #980
- REF: Supports pydantic v2 by @ChengjieLi28 in #983
Bug fixes
- BUG: Fix load yi vl model to multiple cards by @codingl2k1 in #992
- BUG: client compatible with old version of xinference by @ChengjieLi28 in #987
New Contributors
- @hainaweiben made their first contribution in #985
Full Changelog: v0.8.4...v0.8.5
v0.8.4
What's new in 0.8.4 (2024-02-04)
These are the changes in inference v0.8.4.
Enhancements
- ENH: [UI] Fix too long LLM model name by @ChengjieLi28 in #979
- ENH: Add gguf models of llama-2-chat by @aresnow1 in #981
Bug fixes
- BUG: Fix custom model tool calls by @codingl2k1 in #978
- BUG: Fix chat template by @aresnow1 in #977
Documentation
- DOC: Translate model docs by @onesuper in #965
- DOC: Auto gen metrics doc by @codingl2k1 in #967
- DOC: Update README.md by @codingl2k1 in #969
Full Changelog: v0.8.3.1...v0.8.4
v0.8.3.1
What's new in 0.8.3.1 (2024-02-02)
These are the changes in inference v0.8.3.1.
Bug fixes
- BUG: Remove flash-attn dependency by @codingl2k1 in #970
Full Changelog: v0.8.3...v0.8.3.1
v0.8.3
What's new in 0.8.3 (2024-02-02)
These are the changes in inference v0.8.3.
New features
- FEAT: add whisper.small and belle distilwhisper model, fix parameter in rerank by @zhanghx0905 in #944
- FEAT: Support jina-embeddings-v2-base-zh by @aresnow1 in #948
- FEAT: Support Yi VL by @codingl2k1 in #946
- FEAT: Support more embedding and rerank models by @aresnow1 in #959
Enhancements
- ENH: Record gpu mem status in workers by @ChengjieLi28 in #941
- ENH: Allow chat max_tokens is None by @codingl2k1 in #960
- ENH: chatglm ggml format supports system_prompt by @ChengjieLi28 in #962
Bug fixes
- BUG: Fix roles in chat UI by @aresnow1 in #949
- BUG: Fix heartbeat by @codingl2k1 in #957
- BUG: Fix model's content length by @aresnow1 in #955
Documentation
- DOC: Update readme by @aresnow1 in #938
- DOC: Add image model doc by @codingl2k1 in #947
- DOC: Add audio model doc by @codingl2k1 in #954
- DOC: Reorganize model related docs by @onesuper in #961
New Contributors
- @zhanghx0905 made their first contribution in #944
Full Changelog: v0.8.2...v0.8.3
v0.8.2
What's new in 0.8.2 (2024-01-26)
These are the changes in inference v0.8.2.
New features
- FEAT: Support events by @codingl2k1 in #916
- FEAT: Support audio model by @codingl2k1 in #929
- FEAT: Support orion series models by @aresnow1 in #933
- FEAT: Support Mixtral-8x7B-Instruct-v0.1-AWQ by @aresnow1 in #936
Enhancements
- ENH: Launch model by version by @ChengjieLi28 in #896
- ENH: Move multimodal to LLM by @codingl2k1 in #917
- ENH: InternLM2 chat template by @aresnow1 in #919
- ENH: Support use_fp16 for rerank model by @aresnow1 in #927
- ENH: record instance count and version count when detailed listing model registrations by @ChengjieLi28 in #920
- BLD: Resolve conflicts during installation by @aresnow1 in #924
- REF: Move auth code to service for better scalability by @ChengjieLi28 in #925
Documentation
- DOC: Update readme by @aresnow1 in #914
- DOC: Display contributors in readme by @onesuper in #915
- DOC: Merge multimodal to LLM by @codingl2k1 in #923
- DOC: Model usage guide by @onesuper in #926
- DOC: Audio doc by @codingl2k1 in #937
Full Changelog: v0.8.1...v0.8.2
v0.8.1
What's new in 0.8.1 (2024-01-19)
These are the changes in inference v0.8.1.
New features
- FEAT: Auto recover limit by @codingl2k1 in #893
- FEAT: Prometheus metrics exporter by @codingl2k1 in #906
- FEAT: Add internlm2-chat support by @aresnow1 in #913
Enhancements
- ENH: Launch model asynchronously by @ChengjieLi28 in #879
- ENH: qwen vl modelscope by @codingl2k1 in #902
- ENH: Add "tools" in model ability by @aresnow1 in #904
- ENH: Add quantization support for qwen chat by @aresnow1 in #910
Bug fixes
- BUG: Fix prompt template of chatglm3-32k by @aresnow1 in #889
- BUG: invalid volume in docker compose yml by @ChengjieLi28 in #890
- BUG: Revert #883 by @aresnow1 in #903
- BUG: Fix chatglm backend by @codingl2k1 in #898
- BUG: Fix tool calls on custom model by @codingl2k1 in #899
- BUG: Fix is_valid_model_name by @aresnow1 in #907
Documentation
- DOC: Update the documentation about use of docker by @aresnow1 in #901
- DOC: Add FAQ in troubleshooting.rst by @sisuad in #911
Full Changelog: v0.8.0...v0.8.1
v0.8.0
What's new in 0.8.0 (2024-01-11)
These are the changes in inference v0.8.0.
New features
- FEAT: qwen 1.8b gptq by @codingl2k1 in #869
- FEAT: docker compose support by @Minamiyama in #868
- FEAT: Simple OAuth2 system by @ChengjieLi28 in #793
- FEAT: Chat vl web UI by @codingl2k1 in #882
- FEAT: Yi chat gptq by @codingl2k1 in #876
Enhancements
- ENH: Stream use xoscar generator by @codingl2k1 in #859
- ENH: UI supports registering custom gptq models by @ChengjieLi28 in #875
- ENH: make the size param of *_to_image more compatible by @liunux4odoo in #881
- BLD: Update package-lock.json by @aresnow1 in #886
- REF: Add model_hub property in EmbeddingModelSpec by @aresnow1 in #877
Bug fixes
- BUG: Fix image model b64_json output by @codingl2k1 in #874
- BUG: fix libcuda.so.1: cannot open shared object file by @superhuahua in #883
- BUG: Fix auto recover kwargs by @codingl2k1 in #885
Documentation
- DOC: docker image translation by @aresnow1 in #865
- DOC: register model with model_family by @ChengjieLi28 in #863
- DOC: Add OpenAI Client API doc by @codingl2k1 in #864
- DOC: add docker instructions by @onesuper in #878
New Contributors
- @superhuahua made their first contribution in #883
Full Changelog: v0.7.5...v0.8.0
v0.7.5
What's new in 0.7.5 (2024-01-05)
These are the changes in inference v0.7.5.
New features
- FEAT: text2vec by @ChengjieLi28 in #857
Enhancements
- ENH: Offload all response serialization to ModelActor by @codingl2k1 in #837
- ENH: Custom model uses vLLM by @ChengjieLi28 in #861
- BLD: Docker image by @ChengjieLi28 in #855
Bug fixes
- BUG: Fix typing_extension version problem in notebook by @onesuper in #856
- BUG: Fix multimodal cmdline by @codingl2k1 in #850
- BUG: Fix generate of chatglm3 by @aresnow1 in #858
Documentation
- DOC: CUDA Version recommendation by @ChengjieLi28 in #841
- DOC: new doc cover by @onesuper in #843
- DOC: Autogen modelhub info by @onesuper in #845
- DOC: Add multimodal feature in README by @onesuper in #846
- DOC: Chinese doc for user guide by @aresnow1 in #847
- DOC: add notebook for quickstart by @onesuper in #854
- DOC: Add docs about environments by @aresnow1 in #853
- DOC: Add jupyter notebook quick start tutorial by @onesuper in #851
Others
- CHORE: Add docker image with latest tag by @ChengjieLi28 in #862
Full Changelog: v0.7.4.1...v0.7.5