Releases: xorbitsai/inference
v0.13.1
What's new in 0.13.1 (2024-07-12)
These are the changes in inference v0.13.1.
New features
- FEAT: Support choosing the download hub by @amumu96 in #1841
- FEAT: [UI] Specify download hub. by @yiboyasss in #1840
- FEAT: Add support for Flexible Model by @shellc in #1671
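The download-hub selection added above can be pictured as a small resolver that prefers an explicit choice and falls back to an environment variable. A minimal sketch, assuming a hypothetical `resolve_download_hub` helper and a `huggingface` default (the `XINFERENCE_MODEL_SRC` variable does appear elsewhere in these notes, but this function is illustrative, not Xinference's actual code):

```python
import os

# Hypothetical helper: pick a model download hub, preferring an explicit
# choice and falling back to the XINFERENCE_MODEL_SRC environment variable.
def resolve_download_hub(explicit_hub=None):
    if explicit_hub:
        return explicit_hub
    return os.environ.get("XINFERENCE_MODEL_SRC", "huggingface")
```

An explicit hub always wins; the environment variable only applies when no hub is passed.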
Enhancements
- ENH: Update ChatTTS by @codingl2k1 in #1776
- ENH: Added the parameter 'worker_ip' to the 'register' model. by @hainaweiben in #1773
- REF: Remove `chatglm-cpp` and fix latest `llama-cpp-python` issue by @ChengjieLi28 in #1844
Bug fixes
- FIX: [UI] Historical parameter echo bugs by @yiboyasss in #1810
- FIX: [UI] Fix download_hub bugs by @yiboyasss in #1846
Documentation
Others
- CHORE: Close issue when it is stale by @ChengjieLi28 in #1827
- CHORE: Update issue template by @ChengjieLi28 in #1833
New Contributors
Full Changelog: v0.13.0...v0.13.1
v0.13.0
What's new in 0.13.0 (2024-07-05)
These are the changes in inference v0.13.0.
New features
Enhancements
- ENH: added gguf files for qwen2 by @qinxuye in #1745
- ENH: Add more log modules by @ChengjieLi28 in #1771
- ENH: Continuous batching supports vision model ability by @ChengjieLi28 in #1724
- ENH: Add guard for model launching by @frostyplanet in #1680
- BLD: Supports Aliyun docker image by @ChengjieLi28 in #1753
- BLD: GPU docker uses `vllm` image as base by @ChengjieLi28 in #1759
- BLD: Pin `llama-cpp-python` to `v0.2.77` in Docker for stability by @ChengjieLi28 in #1767
Bug fixes
- BUG: Fix glm4 tool call by @codingl2k1 in #1747
- BUG: [UI] Fix authentication mode related bugs by @yiboyasss in #1772
- BUG: Fix python client returns documents for rerank task by default by @ChengjieLi28 in #1780
- BUG: Fix LLM based reranker may raise a TypeError by @codingl2k1 in #1794
- BUG: fix deepseek-vl-chat by @qinxuye in #1795
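The rerank fix above (#1780) concerns whether document text is echoed back to the client by default. A hypothetical post-processing sketch of that behavior, assuming an illustrative `format_rerank_results` helper and result shape (not the actual client code):

```python
# Hypothetical post-processing for rerank results: keep only the index and
# relevance score unless the caller explicitly asks for the document text.
def format_rerank_results(results, return_documents=False):
    formatted = []
    for r in results:
        item = {"index": r["index"], "relevance_score": r["relevance_score"]}
        if return_documents:
            item["document"] = r.get("document")
        formatted.append(item)
    return formatted
```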
Tests
- TST: Fix `llama-cpp-python` issue in CI by @ChengjieLi28 in #1763
Documentation
- DOC: Update continuous batching and docker usage by @ChengjieLi28 in #1785
Full Changelog: v0.12.3...v0.13.0
v0.12.3
What's new in 0.12.3 (2024-06-28)
These are the changes in inference v0.12.3.
New features
- FEAT: [UI] Add favorite function. by @yiboyasss in #1714
- FEAT: add SD3 support by @qinxuye in #1723
- FEAT: [UI] Add the function of automatically obtaining the last configuration information. by @yiboyasss in #1730
- FEAT: support jina-rerank-v2 by @qinxuye in #1733
- FEAT: `tensorizer` integration by @Zihann73 in #1579
- FEAT: Delete cluster by @hainaweiben in #1719
Enhancements
- ENH: Set the CSG Hub endpoint as an environment variable. by @hainaweiben in #1666
- BLD: Pin `chatglm-cpp` version `v0.3.x` by @ChengjieLi28 in #1692
Bug fixes
- BUG: [UI] Fix deleting prompt_style when Model Family is other. by @yiboyasss in #1707
- BUG: GGUF models cannot use GPU in docker by @ChengjieLi28 in #1710
- BUG: Fix tool call observation by @codingl2k1 in #1648
- BUG: [UI] Fix favorite bug by @yiboyasss in #1728
- BUG: curl with stream returns unicode chars rather than Chinese characters by @ChengjieLi28 in #1732
- BUG: Cluster info can be accessed without authorization in the auth mode by @ChengjieLi28 in #1731
Others
New Contributors
Full Changelog: v0.12.2...v0.12.3
v0.12.2.post1
What's new in 0.12.2.post1 (2024-06-22)
These are the changes in inference v0.12.2.post1.
Enhancements
- BLD: Pin `chatglm-cpp` version `v0.3.x` by @ChengjieLi28 in #1692
Full Changelog: v0.12.2...v0.12.2.post1
v0.12.2
What's new in 0.12.2 (2024-06-21)
These are the changes in inference v0.12.2.
New features
- FEAT: Add Tools Support for Qwen Series MOE Models by @zhanghx0905 in #1642
- FEAT: [UI] Modify the deletion function of a custom model by @yiboyasss in #1656
- FEAT: [UI] Custom model presents JSON data and modifies it by @yiboyasss in #1670
- FEAT: Add Rerank model token input/output usage by @wxiwnd in #1657
Enhancements
- ENH: Continuous batching supports all the models with `transformers` backend by @ChengjieLi28 in #1659
Bug fixes
- BUG: Show an error when a user launches a quantized model without a supported device by @Minamiyama in #1645
- BUG: Fix default rerank type by @codingl2k1 in #1649
- BUG: chat_completion does not respond when errors appear more than 100 times by @liuzhenghua in #1663
Tests
- TST: Fix CI due to `tenacity` by @ChengjieLi28 in #1660
Others
- CHORE: [pre-commit] Add exclude thirdparty rules by @frostyplanet in #1678
Full Changelog: v0.12.1...v0.12.2
v0.12.1
What's new in 0.12.1 (2024-06-14)
These are the changes in inference v0.12.1.
New features
- FEAT: qwen2-instruct support tool call by @ayhhyhh in #1631
- FEAT: Added a method to download models from csghub. by @hainaweiben in #1627
- FEAT: glm4-chat support tool call by @codingl2k1 in #1617
- FEAT: [UI] Supports viewing and deleting cache data. by @yiboyasss in #1637
Enhancements
- ENH: modelscope for audio models by @Minamiyama in #1607
- ENH: Supports `generate` interface for continuous batching by @ChengjieLi28 in #1621
- ENH: Quantization for glm-4v by @Minamiyama in #1610
Bug fixes
- BUG: Fix wheel package missing thirdparty ChatTTS by @codingl2k1 in #1606
- BUG: fix XINFERENCE_MODEL_SRC behavior by @LukeWang-Plus in #1616
- BUG: Filtering Step for Streaming Responses to Qwen's Tool Calls when using vLLM by @zhanghx0905 in #1598
Others
- Remove selected cache models by @hainaweiben in #1613
New Contributors
- @LukeWang-Plus made their first contribution in #1616
- @ayhhyhh made their first contribution in #1631
Full Changelog: v0.12.0...v0.12.1
v0.12.0
What's new in 0.12.0 (2024-06-07)
These are the changes in inference v0.12.0.
New features
- FEAT: new model: mini-cpm-llama3-v-2.5 by @Minamiyama in #1577
- FEAT: support glm4-chat & glm4-chat-1m by @qinxuye in #1584
- FEAT: add mistral-instruct-v0.3 by @qinxuye in #1576
- FEAT: add codestral-v0.1 by @qinxuye in #1575
- FEAT: Support ChatTTS by @codingl2k1 in #1578
- FEAT: Continuous batching for chat model on transformers backend by @ChengjieLi28 in #1548
- FEAT: support qwen2 by @qinxuye in #1597
- Feat: support glm-4v 9b by @Minamiyama in #1591
Enhancements
- ENH: make CogVLM2 support stream output by @Minamiyama in #1572
- BLD: Docker clean all images after building image on self-hosted machine by @ChengjieLi28 in #1595
- BLD: Fix pip looking for multiple versions of some packages while installing by @ChengjieLi28 in #1603
Bug fixes
- BUG: Fix typo for cogvlm2 by @Minamiyama in #1573
Documentation
- DOC: added new models in README by @qinxuye in #1585
- DOC: Fix audio doc by @codingl2k1 in #1593
- DOC: Usage about cal-model-memory by @wxiwnd in #1589
- DOC: Fix audio doc by @codingl2k1 in #1599
- DOC: Continuous batching by @ChengjieLi28 in #1602
- DOC: add new models to readme by @qinxuye in #1604
New Contributors
Full Changelog: v0.11.3...v0.12.0
v0.11.3
What's new in 0.11.3 (2024-05-31)
These are the changes in inference v0.11.3.
New features
- FEAT: support Yi-1.5-chat-16k by @qinxuye in #1544
- FEAT: Support XINFERENCE_DISABLE_METRICS env by @codingl2k1 in #1547
- FEAT: Support new model CogVLM by @amumu96 in #1551
- FEAT: telechat model by @LIKEGAKKI in #1567
Enhancements
- ENH: added engines options to model launch details by @qinxuye in #1546
- ENH: rm mini-internvl by @amumu96 in #1563
- ENH: add additional_option at vl gradio by @amumu96 in #1561
- ENH: add real paths column by @hainaweiben in #1555
Bug fixes
- BUG: fix launch model error when use torch 2.3.0 by @amumu96 in #1543
- BUG: fix vl-model img path error by @amumu96 in #1559
- BUG: Fix validation errors when defining a custom baichuan-chat LLM model by @buptzyf in #1557
Documentation
Others
- Correct ModelActor import path in worker & supervisor by @frostyplanet in #1550
New Contributors
- @buptzyf made their first contribution in #1557
- @LIKEGAKKI made their first contribution in #1567
Full Changelog: v0.11.2...v0.11.3
v0.11.2.post1
What's new in 0.11.2.post1 (2024-05-24)
These are the changes in inference v0.11.2.post1, a hotfix version of v0.11.2.
Bug fixes
Full Changelog: v0.11.2...v0.11.2.post1
v0.11.2
What's new in 0.11.2 (2024-05-24)
These are the changes in inference v0.11.2.
New features
- FEAT: Add command cal-model-mem by @frostyplanet in #1460
- FEAT: add deepseek llm and coder base by @qinxuye in #1533
- FEAT: add codeqwen1.5 by @qinxuye in #1535
- FEAT: Auto detect rerank type for unknown rerank type by @codingl2k1 in #1538
- FEAT: Provide the functionality to query information on various cached models hosted on the query node. by @hainaweiben in #1522
Enhancements
- ENH: Compatible with `huggingface-hub` `v0.23.0` by @ChengjieLi28 in #1514
- ENH: Convert command-r to chat by @qinxuye in #1537
- ENH: Support Intern-VL-Chat model by @amumu96 in #1536
- BLD: adapt to langchain 0.2.x, which has breaking changes by @mikeshi80 in #1521
- BLD: Fix pre commit by @frostyplanet in #1527
- BLD: compatible with torch 2.3.0 by @qinxuye in #1534
Bug fixes
- BUG: Fix start worker failed due to None device name by @codingl2k1 in #1539
- BUG: Fix gpu_idx allocate error when set replica > 1 by @amumu96 in #1528
Others
- CHORE: Basic benchmark/benchmark_rerank.py by @codingl2k1 in #1479
Full Changelog: v0.11.1...v0.11.2