Releases: xorbitsai/inference
v0.13.1
What's new in 0.13.1 (2024-07-12)
These are the changes in inference v0.13.1.
New features
- FEAT: Support choosing the download hub by @amumu96 in #1841
- FEAT: [UI] Specify download hub. by @yiboyasss in #1840
- FEAT: Add support for Flexible Model by @shellc in #1671
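The download-hub selection added above can be pictured as a small resolver that prefers an explicit choice and falls back to an environment variable. A minimal sketch, assuming a hypothetical `resolve_download_hub` helper and a `huggingface` default (the `XINFERENCE_MODEL_SRC` variable does appear elsewhere in these notes, but this function is illustrative, not Xinference's actual code):

```python
import os

# Hypothetical helper: pick a model download hub, preferring an explicit
# choice and falling back to the XINFERENCE_MODEL_SRC environment variable.
def resolve_download_hub(explicit_hub=None):
    if explicit_hub:
        return explicit_hub
    return os.environ.get("XINFERENCE_MODEL_SRC", "huggingface")
```

An explicit hub always wins; the environment variable only applies when no hub is passed.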
Enhancements
- ENH: Update ChatTTS by @codingl2k1 in #1776
- ENH: Added the parameter 'worker_ip' to the 'register' model. by @hainaweiben in #1773
- REF: Remove `chatglm-cpp` and fix latest `llama-cpp-python` issue by @ChengjieLi28 in #1844
Bug fixes
- FIX: [UI] Historical parameter echo bugs by @yiboyasss in #1810
- FIX: [UI] Fix download_hub bugs by @yiboyasss in #1846
Documentation
Others
- CHORE: Close issue when it is stale by @ChengjieLi28 in #1827
- CHORE: Update issue template by @ChengjieLi28 in #1833
New Contributors
Full Changelog: v0.13.0...v0.13.1
v0.13.0
What's new in 0.13.0 (2024-07-05)
These are the changes in inference v0.13.0.
New features
Enhancements
- ENH: added gguf files for qwen2 by @qinxuye in #1745
- ENH: Add more log modules by @ChengjieLi28 in #1771
- ENH: Continuous batching supports vision model ability by @ChengjieLi28 in #1724
- ENH: Add guard for model launching by @frostyplanet in #1680
- BLD: Supports Aliyun docker image by @ChengjieLi28 in #1753
- BLD: GPU docker uses `vllm` image as base by @ChengjieLi28 in #1759
- BLD: Pin `llama-cpp-python` to `v0.2.77` in Docker for stability by @ChengjieLi28 in #1767
Bug fixes
- BUG: Fix glm4 tool call by @codingl2k1 in #1747
- BUG: [UI] Fix authentication mode related bugs by @yiboyasss in #1772
- BUG: Fix python client returns documents for rerank task by default by @ChengjieLi28 in #1780
- BUG: Fix LLM based reranker may raise a TypeError by @codingl2k1 in #1794
- BUG: fix deepseek-vl-chat by @qinxuye in #1795
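The rerank fix above (#1780) concerns whether document text is echoed back to the client by default. A hypothetical post-processing sketch of that behavior, assuming an illustrative `format_rerank_results` helper and result shape (not the actual client code):

```python
# Hypothetical post-processing for rerank results: keep only the index and
# relevance score unless the caller explicitly asks for the document text.
def format_rerank_results(results, return_documents=False):
    formatted = []
    for r in results:
        item = {"index": r["index"], "relevance_score": r["relevance_score"]}
        if return_documents:
            item["document"] = r.get("document")
        formatted.append(item)
    return formatted
```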
Tests
- TST: Fix `llama-cpp-python` issue in CI by @ChengjieLi28 in #1763
Documentation
- DOC: Update continuous batching and docker usage by @ChengjieLi28 in #1785
Full Changelog: v0.12.3...v0.13.0
v0.12.3
What's new in 0.12.3 (2024-06-28)
These are the changes in inference v0.12.3.
New features
- FEAT: [UI] Add favorite function. by @yiboyasss in #1714
- FEAT: add SD3 support by @qinxuye in #1723
- FEAT: [UI] Add the function of automatically obtaining the last configuration information. by @yiboyasss in #1730
- FEAT: support jina-rerank-v2 by @qinxuye in #1733
- FEAT: `tensorizer` integration by @Zihann73 in #1579
- FEAT: Delete cluster by @hainaweiben in #1719
Enhancements
- ENH: Set the CSG Hub endpoint as an environment variable. by @hainaweiben in #1666
- BLD: Pin `chatglm-cpp` version `v0.3.x` by @ChengjieLi28 in #1692
Bug fixes
- BUG: [UI] Fix deleting prompt_style when Model Family is other. by @yiboyasss in #1707
- BUG: GGUF models cannot use GPU in docker by @ChengjieLi28 in #1710
- BUG: Fix tool call observation by @codingl2k1 in #1648
- BUG: [UI] Fix favorite bug by @yiboyasss in #1728
- BUG: curl with stream returns unicode chars rather than Chinese characters by @ChengjieLi28 in #1732
- BUG: Cluster info can be accessed without authorization in the auth mode by @ChengjieLi28 in #1731
Others
New Contributors
Full Changelog: v0.12.2...v0.12.3
v0.12.2.post1
What's new in 0.12.2.post1 (2024-06-22)
These are the changes in inference v0.12.2.post1.
Enhancements
- BLD: Pin `chatglm-cpp` version `v0.3.x` by @ChengjieLi28 in #1692
Full Changelog: v0.12.2...v0.12.2.post1
v0.12.2
What's new in 0.12.2 (2024-06-21)
These are the changes in inference v0.12.2.
New features
- FEAT: Add Tools Support for Qwen Series MOE Models by @zhanghx0905 in #1642
- FEAT: [UI] Modify the deletion function of a custom model by @yiboyasss in #1656
- FEAT: [UI] Custom model presents JSON data and modifies it by @yiboyasss in #1670
- FEAT: Add Rerank model token input/output usage by @wxiwnd in #1657
Enhancements
- ENH: Continuous batching supports all the models with `transformers` backend by @ChengjieLi28 in #1659
Bug fixes
- BUG: Show an error when a user launches a quantized model without a supported device by @Minamiyama in #1645
- BUG: Fix default rerank type by @codingl2k1 in #1649
- BUG: chat_completion does not respond when errors appear more than 100 times by @liuzhenghua in #1663
Tests
- TST: Fix CI due to `tenacity` by @ChengjieLi28 in #1660
Others
- CHORE: [pre-commit] Add exclude thirdparty rules by @frostyplanet in #1678
Full Changelog: v0.12.1...v0.12.2
v0.12.1
What's new in 0.12.1 (2024-06-14)
These are the changes in inference v0.12.1.
New features
- FEAT: qwen2-instruct support tool call by @ayhhyhh in #1631
- FEAT: Added a method to download models from csghub. by @hainaweiben in #1627
- FEAT: glm4-chat support tool call by @codingl2k1 in #1617
- FEAT: [UI] Supports viewing and deleting cache data. by @yiboyasss in #1637
Enhancements
- ENH: modelscope for audio models by @Minamiyama in #1607
- ENH: Supports `generate` interface for continuous batching by @ChengjieLi28 in #1621
- ENH: Quantization for glm-4v by @Minamiyama in #1610
Bug fixes
- BUG: Fix wheel package missing thirdparty ChatTTS by @codingl2k1 in #1606
- BUG: fix XINFERENCE_MODEL_SRC behavior by @LukeWang-Plus in #1616
- BUG: Filtering Step for Streaming Responses to Qwen's Tool Calls when using vLLM by @zhanghx0905 in #1598
Others
- Remove selected cache models by @hainaweiben in #1613
New Contributors
- @LukeWang-Plus made their first contribution in #1616
- @ayhhyhh made their first contribution in #1631
Full Changelog: v0.12.0...v0.12.1
v0.12.0
What's new in 0.12.0 (2024-06-07)
These are the changes in inference v0.12.0.
New features
- FEAT: new model: mini-cpm-llama3-v-2.5 by @Minamiyama in #1577
- FEAT: support glm4-chat & glm4-chat-1m by @qinxuye in #1584
- FEAT: add mistral-instruct-v0.3 by @qinxuye in #1576
- FEAT: add codestral-v0.1 by @qinxuye in #1575
- FEAT: Support ChatTTS by @codingl2k1 in #1578
- FEAT: Continuous batching for chat model on transformers backend by @ChengjieLi28 in #1548
- FEAT: support qwen2 by @qinxuye in #1597
- Feat: support glm-4v 9b by @Minamiyama in #1591
Enhancements
- ENH: make CogVLM2 support stream output by @Minamiyama in #1572
- BLD: Docker clean all images after building image on self-hosted machine by @ChengjieLi28 in #1595
- BLD: Fix pip looking for multiple versions of some packages while installing by @ChengjieLi28 in #1603
Bug fixes
- BUG: Fix typo for cogvlm2 by @Minamiyama in #1573
Documentation
- DOC: added new models in README by @qinxuye in #1585
- DOC: Fix audio doc by @codingl2k1 in #1593
- DOC: Usage about cal-model-memory by @wxiwnd in #1589
- DOC: Fix audio doc by @codingl2k1 in #1599
- DOC: Continuous batching by @ChengjieLi28 in #1602
- DOC: add new models to readme by @qinxuye in #1604
New Contributors
Full Changelog: v0.11.3...v0.12.0
v0.11.3
What's new in 0.11.3 (2024-05-31)
These are the changes in inference v0.11.3.
New features
- FEAT: support Yi-1.5-chat-16k by @qinxuye in #1544
- FEAT: Support XINFERENCE_DISABLE_METRICS env by @codingl2k1 in #1547
- FEAT: Support new model CogVLM by @amumu96 in #1551
- FEAT: telechat model by @LIKEGAKKI in #1567
Enhancements
- ENH: added engines options to model launch details by @qinxuye in #1546
- ENH: rm mini-internvl by @amumu96 in #1563
- ENH: add additional_option at vl gradio by @amumu96 in #1561
- ENH: add real paths column by @hainaweiben in #1555
Bug fixes
- BUG: fix launch model error when use torch 2.3.0 by @amumu96 in #1543
- BUG: fix vl-model img path error by @amumu96 in #1559
- BUG: Fix validation errors when defining a custom baichuan-chat LLM model by @buptzyf in #1557
Documentation
Others
- Correct ModelActor import path in worker & supervisor by @frostyplanet in #1550
New Contributors
- @buptzyf made their first contribution in #1557
- @LIKEGAKKI made their first contribution in #1567
Full Changelog: v0.11.2...v0.11.3
v0.11.2.post1
What's new in 0.11.2.post1 (2024-05-24)
These are the changes in inference v0.11.2.post1, a hotfix version of v0.11.2.
Bug fixes
Full Changelog: v0.11.2...v0.11.2.post1
v0.11.2
What's new in 0.11.2 (2024-05-24)
These are the changes in inference v0.11.2.
New features
- FEAT: Add command cal-model-mem by @frostyplanet in #1460
- FEAT: add deepseek llm and coder base by @qinxuye in #1533
- FEAT: add codeqwen1.5 by @qinxuye in #1535
- FEAT: Auto detect rerank type for unknown rerank type by @codingl2k1 in #1538
- FEAT: Provide the functionality to query information on various cached models hosted on the query node. by @hainaweiben in #1522
Enhancements
- ENH: Compatible with `huggingface-hub` `v0.23.0` by @ChengjieLi28 in #1514
- ENH: Convert command-r to chat by @qinxuye in #1537
- ENH: Support Intern-VL-Chat model by @amumu96 in #1536
- BLD: adapt to langchain 0.2.x, which has breaking changes by @mikeshi80 in #1521
- BLD: Fix pre commit by @frostyplanet in #1527
- BLD: compatible with torch 2.3.0 by @qinxuye in #1534
Bug fixes
- BUG: Fix start worker failed due to None device name by @codingl2k1 in #1539
- BUG: Fix gpu_idx allocate error when set replica > 1 by @amumu96 in #1528
Others
- CHORE: Basic benchmark/benchmark_rerank.py by @codingl2k1 in #1479
Full Changelog: v0.11.1...v0.11.2