24 Jan 08:59

XprobeBot

a57b99b

v1.2.1 Latest

Latest

What's new in 1.2.1 (2025-01-24)

These are the changes in inference v1.2.1.

New features

FEAT: Support MeloTTS by @codingl2k1 in #2760
FEAT: support deepseek-r1-distill-qwen by @qinxuye in #2781

Enhancements

ENH: add model config for Whisper by @fonsc in #2755
ENH: support cline style messages for all backend engines by @liunux4odoo in #2763
ENH: CosyVoice2 support SFT speakers by @codingl2k1 in #2770
ENH: Some improvements for Xavier by @ChengjieLi28 in #2777

Bug fixes

BUG: Compat with openai extra body by @codingl2k1 in #2759

Tests

TST: compatible with mlx-vlm 0.1.11 by @qinxuye in #2769

Documentation

DOC: update new models in README and doc by @qinxuye in #2761
DOC: using discord instead of slack & updating model to qwen2.5 in getting started doc by @qinxuye in #2775

Others

FIX: [UI] normalize language input to ensure consistent array format. by @yiboyasss in #2771

New Contributors

@fonsc made their first contribution in #2755

Full Changelog: v1.2.0...v1.2.1

Contributors

qinxuye, fonsc, and 4 other contributors

Assets 2

10 Jan 09:34

XprobeBot

v1.2.0

df45f11

v1.2.0

What's new in 1.2.0 (2025-01-10)

These are the changes in inference v1.2.0.

New features

FEAT: support HunyuanVideo by @qinxuye in #2721
FEAT: support hunyuan-dit text2image by @qinxuye in #2727
FEAT: support cline for vllm engine by @hwzhuhao in #2734
FEAT: [UI] theme switch by @Minamiyama in #1335
FEAT: support qwen2vl run on ascend npu by @Xu-pixel in #2741
FEAT: [UI] Add language toggle for i18n support. by @yiboyasss in #2744
FEAT: Support cogagent-9b by @amumu96 in #2740
FEAT: Xavier: Share KV cache between VLLM replicas by @ChengjieLi28 in #2732
FEAT: [UI] Add gguf_quantization, gguf_model_path, and cpu_offload for image models. by @yiboyasss in #2753
FEAT: Support Marco-o1 by @Jun-Howie in #2749

Enhancements

ENH: [UI] Update Button Style and Interaction Logic for Editing Cache in Model Card. by @yiboyasss in #2746
ENH: Improve error message by @codingl2k1 in #2738

Bug fixes

BUG: adapt mlx-vlm v0.1.7 by @qinxuye in #2724
BUG: pin mlx<0.22.0 to prevent qwen2_vl failing in mlx-vlm by @qinxuye in #2752

Others

FIX: [UI] Resolve bug preventing '/' input in model_path. by @yiboyasss in #2747
FIX: [UI] Fix dark mode background bug. by @yiboyasss in #2748
CHORE: Update new models in readme by @codingl2k1 in #2713

New Contributors

@Xu-pixel made their first contribution in #2741

Full Changelog: v1.1.1...v1.2.0

Contributors

qinxuye, Minamiyama, and 7 other contributors

Assets 2

27 Dec 10:21

XprobeBot

v1.1.1

d342869

v1.1.1

What's new in 1.1.1 (2024-12-27)

These are the changes in inference v1.1.1.

New features

FEAT: support F5-TTS-MLX by @qinxuye in #2671
FEAT: Support qwen2.5-coder-instruct model for tool calls by @Timmy-web in #2681
FEAT: Support minicpm-4B on vllm by @Jun-Howie in #2697
FEAT: support scheduling-policy for vllm by @hwzhuhao in #2700
FEAT: Support QvQ-72B-Preview by @Jun-Howie in #2712
FEAT: support SD3.5 series model by @qinxuye in #2706

Enhancements

ENH: Guided Decoding OpenAIClient compatibility by @wxiwnd in #2673
ENH: resample f5-tts-mlx ref audio when sample rate not synching. by @qinxuye in #2678
ENH: support no images for MLX vlm by @qinxuye in #2670
ENH: Update fish speech 1.5 by @codingl2k1 in #2672
ENH: Update cosyvoice 2 by @codingl2k1 in #2684
REF: Reduce code redundancy by setting default values by @pengjunfeng11 in #2711

Bug fixes

BUG: Fix f5tts audio ref by @codingl2k1 in #2680
BUG: glm4-chat cannot apply for continuous batching with transformers backend by @ChengjieLi28 in #2695

New Contributors

@Timmy-web made their first contribution in #2681

Full Changelog: v1.1.0...v1.1.1

Contributors

qinxuye, pengjunfeng11, and 6 other contributors

Assets 2

13 Dec 10:29

XprobeBot

v1.1.0

b132fca

v1.1.0

What's new in 1.1.0 (2024-12-13)

These are the changes in inference v1.1.0.

New features

FEAT: Support F5 TTS by @codingl2k1 in #2626
FEAT: [UI] Add a hint for model running. by @yiboyasss in #2657
FEAT: support VL models for MLX by @qinxuye in #2638
FEAT: Add support for CLIP model by @Second222None in #2637
FEAT: support llama-3.3-instruct by @qinxuye in #2661

Enhancements

ENH: Optimize error message when user parameters are passed incorrectly by @namecd in #2623
ENH: bypass the sampling parameter skip_special_tokens to vLLM backend by @zjuyzj in #2655
ENH: unify prompt_text as cosyvoice for fish speech by @qinxuye in #2658
ENH: Update glm4 chat model to new weights by @codingl2k1 in #2660
ENH: upgrade sglang in Docker by @amumu96 in #2668

Bug fixes

BUG: Cleanup Isolation tasks by @codingl2k1 in #2603
BUG: fix qwq gguf download hub for modelscope by @redreamality in #2647
BUG: fix ImportError when optional dependency FlagEmbedding is not installed by @zjuyzj in #2649
BUG: use stream_generate in MLX by @qinxuye in #2635
BUG: stop parameter leads to failure with transformers backend by @ChengjieLi28 in #2663
BUG: fix FishSpeech Negative code found by @themanforfree in #2667

Documentation

DOC: update new models by @qinxuye in #2632
DOC: add doc about offline usage for SenseVoiceSmall by @qinxuye in #2654

Others

FIX: fix launching bge-m3 with hybrid mode by @pengjunfeng11 in #2641

New Contributors

@namecd made their first contribution in #2623
@redreamality made their first contribution in #2647
@Second222None made their first contribution in #2637
@themanforfree made their first contribution in #2667

Full Changelog: v1.0.1...v1.1.0

Contributors

qinxuye, redreamality, and 9 other contributors

Assets 2

29 Nov 10:22

XprobeBot

v1.0.1

8dd5715

v1.0.1

What's new in 1.0.1 (2024-11-29)

These are the changes in inference v1.0.1.

New features

FEAT: Fish speech stream by @codingl2k1 in #2562
FEAT: support sparse vector for bge-m3 by @pengjunfeng11 in #2540
FEAT: whisper support for Mac MLX by @qinxuye in #2576
FEAT: support guided decoding for vllm async engine by @wxiwnd in #2391
FEAT: support QwQ-32B-Preview by @qinxuye in #2602
FEAT: support glm-edge-chat model by @amumu96 in #2582

Enhancements

ENH: Support fish speech reference audio by @codingl2k1 in #2542

Bug fixes

BUG: GTE-qwen2 Embedding Dimension error by @cyhasuka in #2565
BUG: request_limits does not work with streaming interfaces by @ChengjieLi28 in #2571
BUG: Fix Codestral v0.1 URI for Pytorch Format by @danialcheung in #2590
BUG: Correct the input bytes data by langchain_openai #2589 by @xiyuan-lee in #2600

Documentation

DOC: update builtin models by @qinxuye in #2587

New Contributors

@pengjunfeng11 made their first contribution in #2540
@danialcheung made their first contribution in #2590
@xiyuan-lee made their first contribution in #2600

Full Changelog: v1.0.0...v1.0.1

Contributors

qinxuye, danialcheung, and 7 other contributors

Assets 2

15 Nov 10:15

XprobeBot

v1.0.0

4c96475

v1.0.0

What's new in 1.0.0 (2024-11-15)

These are the changes in inference v1.0.0.

New features

FEAT: Basic cancel support for image model by @codingl2k1 in #2528
FEAT: Add qwen2.5-coder 0.5B 1.5B 3B 14B 32B by @frostyplanet in #2543
FEAT: support kvcache in multi-round chat for MLX by @qinxuye in #2534

Enhancements

ENH: add normalize to rerank model by @hustyichi in #2509
ENH: Update fish audio by @codingl2k1 in #2555

Bug fixes

BUG: fix variant error for image model by @qinxuye in #2547

Documentation

DOC: Add paper citation by @luweizheng in #2533

Full Changelog: v0.16.3...v1.0.0

Contributors

frostyplanet, qinxuye, and 3 other contributors

Assets 2

08 Nov 05:47

XprobeBot

v0.16.3

85ab86b

v0.16.3

What's new in 0.16.3 (2024-11-08)

These are the changes in inference v0.16.3.

New features

feat: Add support for Llama 3.2-Vision models by @vikrantrathore in #2376

Enhancements

ENH: Display model name in process by @frostyplanet in #1891
REF: Remove replica total count in internal replica_model_uid by @ChengjieLi28 in #2516

Bug fixes

BUG: Compat with ChatTTS 0.2.1 by @codingl2k1 in #2520
BUG: transformers logs missing by @ChengjieLi28 in #2530

Full Changelog: v0.16.2...v0.16.3

Contributors

frostyplanet, vikrantrathore, and 2 other contributors

Assets 2

01 Nov 10:09

XprobeBot

v0.16.2

67e97ab

v0.16.2

What's new in 0.16.2 (2024-11-01)

These are the changes in inference v0.16.2.

New features

FEAT: add download from openmind_hub by @cookieyyds in #2504

Enhancements

BLD: Remove Python 3.8 & Support Python 3.12 by @ChengjieLi28 in #2503

Bug fixes

BUG: fix bge-reranker-v2-minicpm-layerwise rerank issue by @hustyichi in #2495

Documentation

DOC: modify NPU doc by @qinxuye in #2485
DOC: Add doc for ocr by @codingl2k1 in #2492

New Contributors

@hustyichi made their first contribution in #2495
@cookieyyds made their first contribution in #2504

Full Changelog: v0.16.1...v0.16.2

Contributors

qinxuye, hustyichi, and 3 other contributors

Assets 2

25 Oct 07:33

XprobeBot

v0.16.1

d4cd7b1

v0.16.1

What's new in 0.16.1 (2024-10-25)

These are the changes in inference v0.16.1.

New features

FEAT: Add support for Qwen/Qwen2.5-Coder-7B-Instruct gptq format by @frostyplanet in #2408
FEAT: Support GOT-OCR2_0 by @codingl2k1 in #2458
FEAT: [UI] Image model with the lora_config. by @yiboyasss in #2482
FEAT: added MLX support for Flux.1 by @qinxuye in #2459

Enhancements

ENH: Support ChatTTS 0.2 by @codingl2k1 in #2449
ENH: Pending queue for concurrent requests by @codingl2k1 in #2473

Bug fixes

BUG: Remove duplicated call of model_install by @frostyplanet in #2457
BUG: fix embedding model gte-Qwen2 dimensions by @JinCheng666 in #2479

Documentation

DOC: update enterprise doc links by @qinxuye in #2461

New Contributors

@JinCheng666 made their first contribution in #2479

Full Changelog: v0.16.0...v0.16.1

Contributors

frostyplanet, qinxuye, and 3 other contributors

Assets 2

18 Oct 11:40

XprobeBot

v0.16.0

5f7dea4

v0.16.0

What's new in 0.16.0 (2024-10-18)

These are the changes in inference v0.16.0.

New features

FEAT: Adding support for awq/gptq vLLM inference to VisionModel such as Qwen2-VL by @cyhasuka in #2445
FEAT: Dynamic batching for the state-of-the-art FLUX.1 text_to_image interface by @ChengjieLi28 in #2380
FEAT: added MLX for qwen2.5-instruct by @qinxuye in #2444

Enhancements

ENH: Speed up cli interaction by @frostyplanet in #2443
REF: Enable continuous batching for LLM with transformers engine by default by @ChengjieLi28 in #2437

Documentation

DOC: update readme & docs by @qinxuye in #2435

New Contributors

@cyhasuka made their first contribution in #2445

Full Changelog: v0.15.4...v0.16.0

Contributors

frostyplanet, qinxuye, and 2 other contributors

Assets 2

Releases: xorbitsai/inference

v1.2.1

What's new in 1.2.1 (2025-01-24)

New features

Enhancements

Bug fixes

Tests

Documentation

Others

New Contributors

Contributors

v1.2.0

What's new in 1.2.0 (2025-01-10)

New features

Enhancements

Bug fixes

Others

New Contributors

Contributors

v1.1.1

What's new in 1.1.1 (2024-12-27)

New features

Enhancements

Bug fixes

New Contributors

Contributors

v1.1.0

What's new in 1.1.0 (2024-12-13)

New features

Enhancements

Bug fixes

Documentation

Others

New Contributors

Contributors

v1.0.1

What's new in 1.0.1 (2024-11-29)

New features

Enhancements

Bug fixes

Documentation

New Contributors

Contributors

v1.0.0

What's new in 1.0.0 (2024-11-15)

New features

Enhancements

Bug fixes

Documentation

Contributors

v0.16.3

What's new in 0.16.3 (2024-11-08)

New features

Enhancements

Bug fixes

Contributors

v0.16.2

What's new in 0.16.2 (2024-11-01)

New features

Enhancements

Bug fixes

Documentation

New Contributors

Contributors

v0.16.1

What's new in 0.16.1 (2024-10-25)

New features

Enhancements

Bug fixes

Documentation

New Contributors

Contributors

v0.16.0

What's new in 0.16.0 (2024-10-18)

New features

Enhancements

Documentation

New Contributors

Contributors