Commit cdff769

bump version to v0.11.1 (#4221)
* bump version to v0.11.1
* minor fix
1 parent d408dc6 commit cdff769

File tree

7 files changed: +10 −11 lines


README.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -216,7 +216,7 @@ The default prebuilt package is compiled on **CUDA 12** since v0.3.0.
 For the GeForce RTX 50 series, please install the LMDeploy prebuilt package compiled with **CUDA 12.8**
 
 ```shell
-export LMDEPLOY_VERSION=0.11.0
+export LMDEPLOY_VERSION=0.11.1
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu128
 ```
````
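For clarity, here is how the two exported variables expand into the wheel URL that `pip install` fetches (a sketch using the values from the snippet above):

```shell
# Reproduce the URL expansion from the README snippet.
export LMDEPLOY_VERSION=0.11.1
export PYTHON_VERSION=310
WHEEL_URL="https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl"
# The cp${PYTHON_VERSION} tags must match your interpreter (e.g. cp310 for Python 3.10).
echo "$WHEEL_URL"
```

Only the two `export` lines change between releases; the filename pattern follows the standard wheel naming convention (version + CUDA build tag + CPython ABI tags + platform tag).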

README_zh-CN.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -217,7 +217,7 @@ pip install lmdeploy
 If you use a GeForce RTX 50 series GPU, please install the LMDeploy prebuilt package compiled with **CUDA 12.8**.
 
 ```shell
-export LMDEPLOY_VERSION=0.11.0
+export LMDEPLOY_VERSION=0.11.1
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu128
 ```
````

docs/en/get_started/installation.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -23,7 +23,7 @@ pip install lmdeploy
 The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy by:
 
 ```shell
-export LMDEPLOY_VERSION=0.11.0
+export LMDEPLOY_VERSION=0.11.1
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
````

docs/zh_cn/get_started/installation.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -23,7 +23,7 @@ pip install lmdeploy
 The default prebuilt package is compiled on **CUDA 12**. If you need CUDA 11+ (>=11.3), you can install lmdeploy with:
 
 ```shell
-export LMDEPLOY_VERSION=0.11.0
+export LMDEPLOY_VERSION=0.11.1
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
````

lmdeploy/cli/chat.py

Lines changed: 4 additions & 0 deletions

```diff
@@ -32,6 +32,10 @@ def build_pipe(model_path, backend, **kwargs):
     from .utils import get_lora_adapters
     adapters = get_lora_adapters(kwargs['adapters'])
     engine_config.adapters = adapters
+    # disable metrics to avoid installing prometheus_client, which is not needed
+    # in interactive chat
+    engine_config.enable_metrics = False
+
     # set chat template config
     chat_template = kwargs.get('chat_template', None)
     chat_template_config = None
```
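The added lines turn metrics off so that `prometheus_client` never has to be imported on the interactive chat path. A minimal sketch of that optional-dependency pattern, with a deferred import behind the flag (the class and function names here are illustrative, not LMDeploy's actual API):

```python
# Sketch: gate an optional dependency behind a config flag so the
# common path works without the extra package installed.
class EngineConfig:
    def __init__(self, enable_metrics: bool = True):
        self.enable_metrics = enable_metrics

def build_metrics(config: EngineConfig):
    if not config.enable_metrics:
        # chat path: prometheus_client is never imported at all
        return None
    # deferred import: only evaluated when metrics are actually wanted
    from prometheus_client import Counter
    return Counter('requests_total', 'Number of requests served')
```

Because the import lives inside the function and after the early return, setting `enable_metrics = False` (as the commit does) means the package can be absent without raising `ImportError`.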

lmdeploy/pytorch/backends/cuda/blockedf8_modules.py

Lines changed: 1 addition & 6 deletions

```diff
@@ -41,12 +41,7 @@ def forward(self,
                           trans_scale=True,
                           scale_fmt=self.scale_fmt)
 
-        out = blocked_gemm_fp8(input_quant,
-                               input_scale,
-                               weight.t(),
-                               scale.t(),
-                               out_dtype=x.dtype,
-                               scale_fmt=self.scale_fmt)
+        out = blocked_gemm_fp8(input_quant, input_scale, weight.t(), scale.t(), out_dtype=x.dtype)
         if bias is not None:
             out += bias
```
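For context, a blocked FP8 GEMM multiplies quantized matrices where each tile shares a single scale factor, so dequantization broadcasts one scale over every element of its block before the matmul. A toy NumPy sketch of that idea (names, shapes, and the block size are illustrative; this is not the kernel `blocked_gemm_fp8` the source calls):

```python
import numpy as np

def dequant_blocked(q: np.ndarray, scale: np.ndarray, block: int) -> np.ndarray:
    # Expand each per-block scale to cover its (block x block) tile,
    # then dequantize elementwise.
    s = np.repeat(np.repeat(scale, block, axis=0), block, axis=1)
    return q * s[:q.shape[0], :q.shape[1]]

def blocked_gemm_ref(aq, a_scale, bq, b_scale, block=2):
    # Reference semantics: dequantize both operands, then a plain matmul.
    # Real kernels fuse the scaling into the tiled GEMM instead.
    return dequant_blocked(aq, a_scale, block) @ dequant_blocked(bq, b_scale, block)
```

The refactored call site passes the transposed weight and its transposed per-block scales together, which matches this dequantize-then-multiply semantics.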

lmdeploy/version.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,7 +1,7 @@
 # Copyright (c) OpenMMLab. All rights reserved.
 from typing import Tuple
 
-__version__ = '0.11.0'
+__version__ = '0.11.1'
 short_version = __version__
 
 
```