Releases: InternLM/lmdeploy
v0.7.3
What's Changed
🚀 Features
- Add Qwen3 and Qwen3MoE by @lzhangzz in #3305 (see the pipeline sketch after this list)
- [Feature] support qwen3 and qwen3-moe for pytorch engine by @CUHKSZzxy in #3315
- [ascend] support deepseekv2 by @yao-fengchen in #3206
- support ascend w8a8 graph_mode by @yao-fengchen in #3267
- support Llama4 by @grimoire in #3408
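As a quick orientation for the newly added model families, below is a minimal sketch of serving one of them through the high-level `pipeline` API; the model ID `Qwen/Qwen3-8B` is an illustrative assumption, not something this release pins down.

```python
# Minimal sketch, assuming the standard high-level pipeline API.
# The model ID is an illustrative assumption.
from lmdeploy import pipeline

pipe = pipeline('Qwen/Qwen3-8B')  # fetched from the hub if not cached locally
responses = pipe(['Give me a one-line intro to LMDeploy.'])
print(responses[0].text)
```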
💥 Improvements
- Add spaces_between_special_tokens to /v1/interactive and make it compatible with empty text by @AllentDan in #3283 (see the request sketch after this list)
- add env var to control timeout by @CUHKSZzxy in #3291
- optimize mla, remove load v by @grimoire in #3334
- refactor dlinfer rope by @yao-fengchen in #3326
- enable qwenvl2.5 graph mode on ascend by @jinminxi104 in #3367
- Optimize ascend moe by @yao-fengchen in #3364
- find port by @grimoire in #3429
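For the `spaces_between_special_tokens` change in #3283, here is a hedged sketch of passing the new field to the interactive endpoint; the route and port follow the usual `lmdeploy serve api_server` defaults and may differ in your deployment.

```python
# Hedged sketch: the exact route and extra fields depend on the server
# version; port 23333 is the api_server default.
import requests

resp = requests.post(
    'http://localhost:23333/v1/chat/interactive',
    json={
        'prompt': 'Hello',
        'spaces_between_special_tokens': False,  # field added in #3283
    },
)
print(resp.json())
```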
🐞 Bug fixes
- fix activation grid oversize by @grimoire in #3282
- Set ensure_ascii=False for tool calling by @AllentDan in #3295
- add v check by @grimoire in #3307
- Fix Qwen3MoE config parsing by @lzhangzz in #3336
- Fix finish reasons by @AllentDan in #3338
- remove think_end_token_id in streaming content by @AllentDan in #3327
- Fix the finish_reason by @AllentDan in #3350
- support List[dict] prompt input without do_preprocess by @irexyc in #3385 (see the sketch after this list)
- fix tensor dispatch in dynamo by @wanfengcxz in #3417
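The `List[dict]` input from #3385 lets OpenAI-style messages bypass chat-template preprocessing. A self-contained sketch, with the model ID again an illustrative assumption:

```python
# Sketch of the List[dict] input path from #3385: pass OpenAI-style messages
# directly while skipping chat-template preprocessing.
from lmdeploy import pipeline

pipe = pipeline('Qwen/Qwen3-8B')  # illustrative model ID
messages = [{'role': 'user', 'content': 'What is LMDeploy?'}]
print(pipe(messages, do_preprocess=False))
```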
📚 Documentations
- update ascend doc by @yao-fengchen in #3420
🌐 Other
- bump version to v0.7.2.post1 by @lvhan028 in #3298
- Optimize internvit by @caikun-pjlab in #3316
- bump version to v0.7.3 by @lvhan028 in #3416
New Contributors
- @wanfengcxz made their first contribution in #3417
- @caikun-pjlab made their first contribution in #3316
Full Changelog: v0.7.2...v0.7.3
v0.7.2.post1
What's Changed
💥 Improvements
- Add spaces_between_special_tokens to /v1/interactive and make it compatible with empty text by @AllentDan in #3283
- add env var to control timeout by @CUHKSZzxy in #3291
🐞 Bug fixes
- fix activation grid oversize by @grimoire in #3282
- Set ensure_ascii=False for tool calling by @AllentDan in #3295
Full Changelog: v0.7.2...v0.7.2.post1
v0.7.2
What's Changed
🚀 Features
- [Feature] support qwen2.5-vl for pytorch engine by @CUHKSZzxy in #3194
- Support reward models by @lvhan028 in #3192
- Add collective communication kernels by @lzhangzz in #3163
- PytorchEngine multi-node support v2 by @grimoire in #3147
- Add flash mla by @AllentDan in #3218
- Add gemma3 implementation by @AllentDan in #3272
💥 Improvements
- remove update badwords by @grimoire in #3183
- default executor ray by @grimoire in #3210
- change ascend&camb default_batch_size to 256 by @jinminxi104 in #3251
- Tool reasoning parsers and streaming function call by @AllentDan in #3198 (see the streaming sketch after this list)
- remove torchelastic flag by @grimoire in #3242
- disable flashmla warning on sm<90 by @grimoire in #3271
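For the streaming function-call support in #3198, a hedged sketch against the OpenAI-compatible endpoint; the tool schema and model name are illustrative assumptions.

```python
# Hedged sketch: streams tool-call argument deltas from an lmdeploy
# api_server. Tool schema and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url='http://localhost:23333/v1', api_key='none')
stream = client.chat.completions.create(
    model='internlm2_5-7b-chat',  # whichever model the server hosts
    messages=[{'role': 'user', 'content': "What's the weather in Paris?"}],
    tools=[{
        'type': 'function',
        'function': {
            'name': 'get_weather',
            'parameters': {
                'type': 'object',
                'properties': {'city': {'type': 'string'}},
            },
        },
    }],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        print(delta.tool_calls[0].function.arguments or '', end='')
```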
🐞 Bug fixes
- Fix missing cli chat option by @lzhangzz in #3209
- [ascend] fix multi-card distributed inference failures by @tangzhiyi11 in #3215
- fix for small cache-max-entry-count by @grimoire in #3221
- [dlinfer] fix glm-4v graph mode on ascend by @jinminxi104 in #3235
- fix qwen2.5 pytorch engine dtype error on NPU by @tcye in #3247
- [Fix] failed to update the tokenizer's eos_token_id into stop_word list by @lvhan028 in #3257
- fix dsv3 gate scaling by @grimoire in #3263
- Fix the bug for reading dict error by @GxjGit in #3196
- Fix get ppl by @lvhan028 in #3268
📚 Documentations
- Specify lmdeploy version in benchmark guide by @lyj0309 in #3216
- [ascend] add Ascend docker image by @jinminxi104 in #3239
🌐 Other
- [ci] testcase refactoring by @zhulinJulia24 in #3151
- [ci] add testcase for native communicator by @zhulinJulia24 in #3217
- [ci] add volc evaluation testcase by @zhulinJulia24 in #3240
- [ci] remove v100 testconfig by @zhulinJulia24 in #3253
- add rdma dependencies into docker file by @CUHKSZzxy in #3262
- docs: update ascend docs for docker running by @CyCle1024 in #3266
- bump version to v0.7.2 by @lvhan028 in #3252
Full Changelog: v0.7.1...v0.7.2
v0.7.1
What's Changed
🚀 Features
- support release pipeline by @irexyc in #3069
- [feature] add dlinfer w8a8 support. by @Reinerzhou in #2988
- [maca] support deepseekv2 for maca backend. by @Reinerzhou in #2918
- [Feature] support deepseek-vl2 for pytorch engine by @CUHKSZzxy in #3149
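For the new VLM support such as deepseek-vl2, a hedged sketch of the usual image-plus-prompt call; the model ID and image path are illustrative assumptions.

```python
# Hedged sketch of a VLM call; model ID and image path are assumptions.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('deepseek-ai/deepseek-vl2')
image = load_image('demo.jpg')  # a local file or URL
print(pipe(('Describe this image.', image)))
```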
💥 Improvements
- use weights iterator while loading by @RunningLeon in #2886
- Add deepseek-r1 chat template by @AllentDan in #3072
- Update tokenizer by @lvhan028 in #3061
- Set max concurrent requests by @AllentDan in #2961
- remove logitswarper by @grimoire in #3109
- Update benchmark script and user guide by @lvhan028 in #3110
- support eos_token list in turbomind by @irexyc in #3044
- Use aiohttp inside proxy server && add --disable-cache-status argument by @AllentDan in #3020
- Update runtime package dependencies by @zgjja in #3142
- Make turbomind support embedding inputs on GPU by @chengyuma in #3177
🐞 Bug fixes
- [dlinfer] fix ascend qwen2_vl graph_mode by @yao-fengchen in #3045
- fix error in interactive api by @lvhan028 in #3074
- fix sliding window mgr by @grimoire in #3068
- More arguments in api_client, update docstrings by @AllentDan in #3077
- Add system role to deepseek chat template by @AllentDan in #3031
- Fix xcomposer2d5 by @irexyc in #3087
- fix user guide about cogvlm deployment by @lvhan028 in #3088
- fix positional argument by @lvhan028 in #3086
- Fix UT of deepseek chat template by @lvhan028 in #3125
- Fix internvl2.5 error after eviction by @grimoire in #3122
- Fix cogvlm and phi3vision by @RunningLeon in #3137
- [fix] fix vl gradio, use pipeline api and remove interactive chat by @irexyc in #3136
- fix the issue that stop_token may be less than defined in model.py by @irexyc in #3148
- fix typing by @lz1998 in #3153
- fix min length penalty by @irexyc in #3150
- fix default temperature value by @irexyc in #3166
- Use pad_token_id as image_token_id for vl models by @RunningLeon in #3158
- Fix tool call prompt for InternLM and Qwen by @AllentDan in #3156
- Update qwen2.py by @GxjGit in #3174
- fix temperature=0 by @grimoire in #3176
- fix blocked fp8 moe by @grimoire in #3181
- fix deepseekv2 has no attribute use_mla error by @CUHKSZzxy in #3188
- fix unstoppable chat by @lvhan028 in #3189
🌐 Other
- [ci] add internlm3 into testcase by @zhulinJulia24 in #3038
- add internlm3 to supported models by @lvhan028 in #3041
- update pre-commit config by @lvhan028 in #2683
- [maca] add cudagraph support on maca backend. by @Reinerzhou in #2834
- bump version to v0.7.0.post1 by @lvhan028 in #3076
- bump version to v0.7.0.post2 by @lvhan028 in #3094
- [Fix] fix the URL judgment problem in Windows by @Lychee-acaca in #3103
- bump version to v0.7.0.post3 by @lvhan028 in #3115
- [ci] fix some fail in daily testcase by @zhulinJulia24 in #3134
- Bump version to v0.7.1 by @lvhan028 in #3178
New Contributors
- @Lychee-acaca made their first contribution in #3103
- @lz1998 made their first contribution in #3153
- @GxjGit made their first contribution in #3174
- @chengyuma made their first contribution in #3177
- @CUHKSZzxy made their first contribution in #3149
Full Changelog: v0.7.0...v0.7.1
v0.7.0.post3
What's Changed
💥 Improvements
- Set max concurrent requests by @AllentDan in #2961
- remove logitswarper by @grimoire in #3109
🐞 Bug fixes
- fix user guide about cogvlm deployment by @lvhan028 in #3088
- fix positional argument by @lvhan028 in #3086
🌐 Other
- [Fix] fix the URL judgment problem in Windows by @Lychee-acaca in #3103
- bump version to v0.7.0.post3 by @lvhan028 in #3115
New Contributors
- @Lychee-acaca made their first contribution in #3103
Full Changelog: v0.7.0.post2...v0.7.0.post3
LMDeploy Release v0.7.0.post2
What's Changed
💥 Improvements
- Add deepseek-r1 chat template by @AllentDan in #3072
- Update tokenizer by @lvhan028 in #3061
🐞 Bug fixes
- Add system role to deepseek chat template by @AllentDan in #3031
- Fix xcomposer2d5 by @irexyc in #3087
Full Changelog: v0.7.0.post1...v0.7.0.post2
LMDeploy Release v0.7.0.post1
What's Changed
💥 Improvements
- use weights iterator while loading by @RunningLeon in #2886
🐞 Bug fixes
- [dlinfer] fix ascend qwen2_vl graph_mode by @yao-fengchen in #3045
- fix error in interactive api by @lvhan028 in #3074
- fix sliding window mgr by @grimoire in #3068
- More arguments in api_client, update docstrings by @AllentDan in #3077
🌐 Other
- [ci] add internlm3 into testcase by @zhulinJulia24 in #3038
- add internlm3 to supported models by @lvhan028 in #3041
- update pre-commit config by @lvhan028 in #2683
- [maca] add cudagraph support on maca backend. by @Reinerzhou in #2834
- bump version to v0.7.0.post1 by @lvhan028 in #3076
Full Changelog: v0.7.0...v0.7.0.post1
LMDeploy Release v0.7.0
What's Changed
🚀 Features
- Support moe w8a8 in pytorch engine by @grimoire in #2894
- Support DeepseekV3 fp8 by @grimoire in #2967
- support new backend cambricon by @JackWeiw in #3002
- support moe fp8 by @RunningLeon in #3007
- add internlm3-dense(turbomind) & chat template by @irexyc in #3024
- support internlm3 on pt by @RunningLeon in #3026
- Support internlm3 quantization by @AllentDan in #3027
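Once an internlm3 checkpoint is quantized (e.g. with the `lmdeploy lite auto_awq` CLI), loading it follows the usual pattern; the model ID below is an illustrative assumption.

```python
# Hedged sketch: serving an AWQ-quantized checkpoint with the turbomind
# backend. The model ID is an illustrative assumption.
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline(
    'internlm/internlm3-8b-instruct-awq',
    backend_config=TurbomindEngineConfig(model_format='awq'),
)
print(pipe(['Summarize AWQ in one sentence.'])[0].text)
```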
💥 Improvements
- Optimize awq kernel in pytorch engine by @grimoire in #2965
- Support fp8 w8a8 for pt backend by @RunningLeon in #2959
- Optimize lora kernel by @grimoire in #2975
- Remove threadsafe by @grimoire in #2907
- Refactor async engine & turbomind IO by @lzhangzz in #2968
- [dlinfer] rope refine by @JackWeiw in #2984
- Expose spaces_between_special_tokens by @AllentDan in #2991
- [dlinfer] change llm op interface of paged_prefill_attention by @JackWeiw in #2977
- Update request logger by @lvhan028 in #2981
- remove decoding by @grimoire in #3016
🐞 Bug fixes
- Fix build crash in nvcr.io/nvidia/pytorch:24.06-py3 image by @zgjja in #2964
- add tool role in BaseChatTemplate as tool response in messages by @AllentDan in #2979
- Fix ascend dockerfile by @jinminxi104 in #2989
- fix internvl2 qk norm by @grimoire in #2987
- fix xcomposer2 when transformers is upgraded greater than 4.46 by @irexyc in #3001
- Fix get_ppl & get_logits by @lvhan028 in #3008
- Fix typo in w4a16 guide by @Yan-Xiangjun in #3018
- fix blocked fp8 moe kernel by @grimoire in #3009
- Fix async engine by @lzhangzz in #3029
- [hotfix] Fix get_ppl by @lvhan028 in #3023
- Fix MoE gating for DeepSeek V2 by @lzhangzz in #3030
- Fix empty response for pipeline by @lzhangzz in #3034
- Fix potential hang during TP model initialization by @lzhangzz in #3033
🌐 Other
- [ci] add w8a8 and internvl2.5 models into testcase by @zhulinJulia24 in #2949
- bump version to v0.7.0 by @lvhan028 in #3010
New Contributors
- @zgjja made their first contribution in #2964
- @Yan-Xiangjun made their first contribution in #3018
Full Changelog: 0.6.5...v0.7.0
LMDeploy Release v0.6.5
What's Changed
🚀 Features
- [dlinfer] feat: add DlinferFlashAttention to support qwen vl. by @Reinerzhou in #2952
💥 Improvements
- refactor PyTorchEngine check env by @grimoire in #2870
- refine multi-backend setup.py by @jinminxi104 in #2880
- Refactor VLM modules by @lvhan028 in #2810
- [dlinfer] only compile the language model in vl models by @tangzhiyi11 in #2893
- Optimize tp broadcast by @grimoire in #2889
- unfreeze torch version in dockerfile by @RunningLeon in #2906
- support tp > n_kv_heads for pt engine by @RunningLeon in #2872 (see the sketch after this list)
- replicate kv for some models when tp is divisible by kv_head_num by @irexyc in #2874
- Fall back to pytorch engine when the model is quantized by smooth quant by @lvhan028 in #2953
- Torchrun launching multiple api_server by @AllentDan in #2402
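For the tensor-parallel items above (#2872, #2874), a hedged sketch of requesting a TP degree that may exceed the model's kv-head count on the PyTorch engine; the model ID and tp value are illustrative assumptions.

```python
# Hedged sketch: tp=4 on the PyTorch engine; with #2872/#2874 this may now
# exceed the model's kv-head count. Model ID and tp value are assumptions.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline(
    'internlm/internlm2_5-7b-chat',
    backend_config=PytorchEngineConfig(tp=4),
)
print(pipe(['Hello'])[0].text)
```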
🐞 Bug fixes
- [Feature] Support for loading lora adapter weights in safetensors format by @Galaxy-Husky in #2860
- fix cpu cache by @grimoire in #2881
- Fix args type in docstring by @Galaxy-Husky in #2888
- Fix llama3.1 chat template by @fzyzcjy in #2862
- Fix typo by @ghntd in #2916
- fix: Incorrect stats size during inference of throughput benchmark when concurrency > num_prompts by @pancak3 in #2928
- fix lora name and rearrange wqkv for internlm2 by @RunningLeon in #2912
- [dlinfer] fix moe op for dlinfer. by @Reinerzhou in #2917
- [side effect] fix vlm quant failed by @lvhan028 in #2914
- fix torch_dtype by @RunningLeon in #2933
- support unaligned qkv heads by @grimoire in #2930
- fix mllama inference without image by @RunningLeon in #2947
- Support torch_dtype modification and update FAQs for AWQ quantization by @AllentDan in #2898
- Fix exception handler for proxy server by @AllentDan in #2901
- Fix torch_dtype in lite by @AllentDan in #2956
- [side-effect] bring back quantization of qwen2-vl, glm4v and etc. by @lvhan028 in #2954
- add a thread pool executor to control the vl engine traffic by @lvhan028 in #2970
- [side-effect] fix gradio demo error by @lvhan028 in #2976
🌐 Other
- [dlinfer] fix engine checker by @tangzhiyi11 in #2891
- Bump version to v0.6.5 by @lvhan028 in #2955
New Contributors
- @Galaxy-Husky made their first contribution in #2860
- @fzyzcjy made their first contribution in #2862
- @ghntd made their first contribution in #2916
- @pancak3 made their first contribution in #2928
Full Changelog: v0.6.4...0.6.5
LMDeploy Release v0.6.4
What's Changed
🚀 Features
- feature: support qwen2.5 function_call by @akai-shuuichi in #2737
- [Feature] support minicpm-v_2_6 for pytorch engine. by @Reinerzhou in #2767
- Support qwen2-vl AWQ quantization by @AllentDan in #2787
- Add DeepSeek-V2 support by @lzhangzz in #2763
- [ascend] feat: support kv int8 by @yao-fengchen in #2736
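For the Ascend kv int8 feature (#2736), a hedged sketch; `quant_policy=8` conventionally requests an int8 kv-cache, and the device type and model ID are illustrative assumptions.

```python
# Hedged sketch of kv int8 on Ascend (#2736): quant_policy=8 requests an
# int8 kv-cache. device_type and model ID are illustrative assumptions.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline(
    'internlm/internlm2_5-7b-chat',
    backend_config=PytorchEngineConfig(device_type='ascend', quant_policy=8),
)
```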
💥 Improvements
- Optimize update_step_ctx on Ascend by @jinminxi104 in #2804
- Add Ascend installation adapter by @zhabuye in #2817
- Refactor turbomind (2/N) by @lzhangzz in #2818
- add openssh-server installation in dockerfile by @lvhan028 in #2830
- Add version restrictions in runtime_ascend.txt to ensure functionality by @zhabuye in #2836
- better kv allocate by @grimoire in #2814
- Update internvl chat template by @AllentDan in #2832
- profile throughput without new threads by @grimoire in #2826
- [dlinfer] change dlinfer kv_cache layout and adjust paged_prefill_attention api by @Reinerzhou in #2847
- [maca] add env to support different mm layout on maca. by @Reinerzhou in #2835
- Supports W8A8 quantization for more models by @AllentDan in #2850
🐞 Bug fixes
- disable prefix-caching for vl model by @grimoire in #2825
- Fix gemma2 accuracy through the correct softcapping logic by @AllentDan in #2842
- fix accessing before initialization by @lvhan028 in #2845
- fix the logic to verify whether AutoAWQ has been successfully installed by @grimoire in #2844
- check whether backend_config is None or not before accessing its attr by @lvhan028 in #2848
- [ascend] convert kv cache to nd format in ascend graph mode by @tangzhiyi11 in #2853
📚 Documentations
- Update supported models & Ascend doc by @jinminxi104 in #2765
- update supported models by @lvhan028 in #2849
🌐 Other
- [CI] Split vl testcases into turbomind and pytorch backend by @zhulinJulia24 in #2751
- [dlinfer] Fix qwenvl rope error for dlinfer backend by @JackWeiw in #2795
- [CI] add more testcase for mllm models by @zhulinJulia24 in #2791
- Update dlinfer-ascend version in runtime_ascend.txt by @jinminxi104 in #2865
- bump version to v0.6.4 by @lvhan028 in #2864
New Contributors
- @akai-shuuichi made their first contribution in #2737
- @JackWeiw made their first contribution in #2795
- @zhabuye made their first contribution in #2817
Full Changelog: v0.6.3...v0.6.4