
简体中文 | English

AngelSlim

Dedicated to building a more usable, comprehensive, and efficient compression toolkit for large models

📖 Documentation   |   🤗 Hugging Face   |   🤖 ModelScope   |   💬 WeChat (微信) |   🫨 Discord


📣 Latest News

  • [25/07/04] We added quantization support for Hunyuan, Qwen2.5, Qwen3, and DeepSeek-R1-Distill-Qwen models, covering the INT8, FP8, and INT4 algorithms. We also open-sourced Eagle3 weights for the Qwen3 model series.

Coming soon:

  • W4A8 quantization support for DeepSeek-R1
  • Quantization support for the multimodal Qwen-VL models
  • Release of new speculative-decoding algorithms

🌟 Key Features

  • Highly integrated: mainstream compression algorithms are bundled into one toolkit that developers can invoke with a single command, making it easy to use.
  • Continuous algorithm innovation: beyond the most widely used industrial algorithms, we keep developing better compression algorithms in-house and will open-source them over time.
  • Engineered for performance: the toolkit is optimized end to end, from the compression pipeline to deployment of compressed models; for example, Qwen3-235B and DeepSeek-R1 can be quantized on a single GPU.

💼 Supported Models

Quantization

Quantization currently supports the main text-generation models in the Hunyuan-Dense, Hunyuan-MoE, Qwen3-Dense, Qwen3-MoE, Qwen2.5, DeepSeek-R1-Distill-Qwen, and QwQ series:

| Model | FP8-Dynamic | FP8-Static | INT8-Dynamic | INT4-GPTQ | INT4-AWQ |
|---|---|---|---|---|---|
| Hunyuan-Dense | ✅ | ✅ | ✅ | ✅ | |
| Hunyuan-MoE | ✅ | ✅ | ✅ | ✅ | |
| Qwen3-Dense | ✅ | ✅ | ✅ | ✅ | ✅ |
| Qwen3-MoE | ✅ | ✅ | ✅ | ✅ | ✅ |
| Qwen2.5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| DeepSeek-R1-Distill-Qwen | ✅ | ✅ | ✅ | ✅ | ✅ |
| QwQ | ✅ | ✅ | ✅ | ✅ | ✅ |
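The "dynamic" quantization modes listed above compute the quantization scale from each tensor's own values at run time, rather than from a calibration set. A minimal numeric sketch in plain Python of per-tensor dynamic FP8 (E4M3-style) scaling; this is an illustration of the general technique, not AngelSlim's implementation, and the helper names are made up:

```python
# Sketch of per-tensor dynamic FP8 (E4M3-style) quantization.
# Hypothetical helpers for illustration only.

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def quantize_dynamic_fp8(values):
    """Dynamic quantization: the scale is derived from the data itself."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    # Scale into FP8 range and clamp; real kernels also round each value
    # to the nearest representable FP8 number, which we skip here.
    q = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.1, -2.5, 3.7, 896.0]
q, scale = quantize_dynamic_fp8(weights)   # scale = 896 / 448 = 2.0
restored = dequantize(q, scale)
```

Static FP8 differs only in that the scale is fixed in advance using calibration data, which is why the one-click example below needs a calibration dataset.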

Speculative Decoding

Eagle3 weights for the Qwen3 model series are now open-sourced.

| Model | Eagle3 |
|---|---|
| Qwen3-1.7B | ✅ |
| Qwen3-4B | ✅ |
| Qwen3-8B | ✅ |
| Qwen3-14B | ✅ |
| Qwen3-32B | ✅ |
| Qwen3-30B-A3B | ✅ |

🛎️ How to Use

Install AngelSlim

We recommend installing the latest stable release of AngelSlim with pip:

pip install angelslim

Alternatively, you can clone the repository and install from source:

git clone https://github.com/Tencent/AngelSlim.git
cd AngelSlim && python setup.py install

See the installation documentation for more details.

Quick Start

After installing AngelSlim, you can get started quickly with the following script, which performs static FP8 quantization of the Qwen3-1.7B model:

  • One-click launch

    python3 tools/run.py -c configs/qwen3/fp8_static/qwen3-1_7b_fp8_static.yaml

    This example loads the Hugging Face model, calibrates activations on the dataset specified in the config, and saves the quantized model weights.

  • Launch from source

    Dynamic FP8 quantization of Qwen3-1.7B:

    from angelslim.engine import Engine
    
    slim_engine = Engine()
    # Prepare model
    slim_engine.prepare_model(model_name="Qwen", model_path="Qwen/Qwen3-1.7B",)
    # Initialize compressor
    slim_engine.prepare_compressor("PTQ", default_method="fp8_dynamic")
    # Compress model
    slim_engine.run()
    # Save compressed model
    slim_engine.save("./output")

See the quick start documentation for details.

Deployment and Evaluation

1. Serving

After setting the quantized model path MODEL_PATH, you can deploy an OpenAI-compatible API service with either of the following inference frameworks:

vLLM

Launch script for a vLLM server; vllm>=0.8.5.post1 is recommended, and deploying MoE INT8 quantized models requires vllm>=0.9.0:

bash deploy/run_vllm.sh $MODEL_PATH

SGLang

Launch script for an SGLang server; sglang>=0.4.6.post1 is recommended:

bash deploy/run_sglang.sh $MODEL_PATH

2. Calling the Service

Send requests through the OpenAI-format API:

bash deploy/openai.sh $MODEL_PATH
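The request body that such a script sends follows the standard OpenAI chat-completions format. A hedged sketch of building the payload in Python; the helper, the model name, and the field values are illustrative assumptions, not taken from deploy/openai.sh:

```python
# Hypothetical sketch of an OpenAI-format chat request body; the model
# name and temperature are illustrative, not from deploy/openai.sh.
import json

def build_chat_request(model_name, prompt, temperature=0.7):
    """Build the JSON body for a POST to /v1/chat/completions."""
    return {
        "model": model_name,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

payload = build_chat_request("Qwen3-1.7B-FP8-Static", "Hello!")
body = json.dumps(payload)  # this string is what gets POSTed to the server
```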

3. Accuracy Evaluation

Evaluate the accuracy of quantized models with lm-evaluation-harness; lm-eval>=0.4.8 is recommended:

bash deploy/lm_eval.sh $MODEL_PATH

See the deployment documentation for a detailed guide.

📈 Benchmark

(1) Quantization

Only part of the results are shown below; see the Benchmark documentation for the complete benchmark.

Hunyuan Models

Evaluation results for Hunyuan-A13B-Instruct under BF16, FP8, and INT4-GPTQ on AIME 2024, GSM8K, BBH, and DROP:

| Bench | Hunyuan-A13B-Instruct | Hunyuan-A13B-Instruct-FP8 | Hunyuan-A13B-Instruct-Int4-GPTQ |
|---|---|---|---|
| AIME 2024 | 87.30 | 86.70 | 86.70 |
| GSM8K | 94.39 | 94.01 | 94.24 |
| BBH | 89.10 | 88.34 | 87.91 |
| DROP | 91.10 | 91.10 | 91.05 |

Qwen3 Models

Evaluation results for the Qwen3 models under BF16, FP8-Static, FP8-Dynamic, INT8-Dynamic, INT4-GPTQ, and INT4-AWQ on CEVAL, MMLU, GSM8K, and HUMANEVAL:

| Model | Quantization | CEVAL | MMLU | GSM8K | HUMANEVAL |
|---|---|---|---|---|---|
| Qwen3-0.6B | BF16 | 45.84 | 47.21 | 42.99 | 19.51 |
| | FP8-Static | 45.99 | 46.87 | 38.06 | 18.90 |
| | FP8-Dynamic | 45.99 | 46.93 | 38.29 | 20.73 |
| | INT8-Dynamic | 45.17 | 46.95 | 41.17 | 21.34 |
| Qwen3-8B | BF16 | 79.27 | 74.78 | 87.79 | 63.41 |
| | FP8-Static | 78.23 | 74.79 | 86.96 | 62.20 |
| | FP8-Dynamic | 78.45 | 74.75 | 87.64 | 62.80 |
| | INT8-Dynamic | 78.01 | 74.84 | 86.96 | 67.07 |
| | INT4-GPTQ | 77.19 | 73.26 | 86.43 | 62.20 |
| | INT4-AWQ | 76.15 | 73.59 | 86.96 | 63.41 |
| Qwen3-14B | BF16 | 83.06 | 78.90 | 88.40 | 55.49 |
| | FP8-Static | 82.62 | 78.57 | 89.46 | 57.32 |
| | FP8-Dynamic | 82.24 | 78.92 | 88.32 | 52.44 |
| | INT8-Dynamic | 81.87 | 78.13 | 86.28 | 56.10 |
| | INT4-GPTQ | 81.05 | 78.02 | 87.34 | 57.93 |
| | INT4-AWQ | 82.02 | 77.68 | 84.23 | 61.59 |
| Qwen3-32B | BF16 | 86.55 | 82.00 | 74.53 | 37.80 |
| | FP8-Static | 86.92 | 81.78 | 70.20 | 39.63 |
| | FP8-Dynamic | 86.55 | 81.89 | 70.43 | 38.41 |
| | INT4-GPTQ | 86.18 | 81.01 | - | 43.29 |
| | INT4-AWQ | 86.18 | 81.54 | - | 36.59 |
| Qwen3-30B-A3B | BF16 | 83.66 | 79.36 | 89.99 | 31.71 |
| | FP8-Static | 83.95 | 79.47 | 89.01 | 31.10 |
| | FP8-Dynamic | 84.10 | 79.40 | 89.16 | 32.93 |
| | INT8-Dynamic | 83.36 | 79.48 | 89.16 | 34.15 |
| Qwen3-235B-A22B | BF16 | 89.60 | 86.28 | 85.29 | 27.44 |
| | FP8-Static | 89.67 | 86.19 | 86.96 | 27.44 |
| | FP8-Dynamic | 89.67 | 86.18 | 85.22 | 28.05 |
| | INT8-Dynamic | 88.93 | 86.20 | 86.20 | 23.78 |
| QwQ-32B | BF16 | 85.74 | 82.03 | 73.31 | 42.68 |
| | FP8-Static | 85.44 | 81.91 | 75.36 | 42.68 |
| | FP8-Dynamic | 85.07 | 81.93 | 75.66 | 42.07 |
| | INT4-GPTQ | 84.03 | 81.26 | 68.23 | 45.73 |
| | INT4-AWQ | 83.58 | 81.01 | 68.69 | 43.29 |

Other Models

Evaluation results for other models under BF16, FP8-Static, FP8-Dynamic, INT4-GPTQ, and INT4-AWQ on CEVAL, MMLU, and GSM8K:

| Model | Quantization | CEVAL | MMLU | GSM8K |
|---|---|---|---|---|
| Qwen2.5-1.5B-Instruct | BF16 | 67.01 | 60.05 | 54.28 |
| | FP8-Static | 66.27 | 60.23 | - |
| | FP8-Dynamic | 66.79 | 60.08 | 51.71 |
| Qwen2.5-7B-Instruct | BF16 | 81.20 | 74.55 | 79.98 |
| | FP8-Static | 81.13 | 74.03 | 79.30 |
| | FP8-Dynamic | 80.31 | 74.07 | 79.00 |
| | INT4-GPTQ | 79.05 | 73.05 | 74.75 |
| | INT4-AWQ | 79.35 | 73.22 | 79.38 |
| Qwen2.5-32B-Instruct | BF16 | 87.30 | 83.21 | 81.73 |
| | FP8-Static | 87.59 | 83.08 | 81.58 |
| | FP8-Dynamic | 87.30 | 83.04 | 81.58 |
| | INT4-GPTQ | 86.70 | 82.45 | 82.03 |
| | INT4-AWQ | 87.00 | 82.64 | - |
| DeepSeek-R1-Distill-Qwen-7B | BF16 | 53.49 | 53.80 | 75.74 |
| | FP8-Static | 53.57 | 54.17 | 76.19 |
| | FP8-Dynamic | 52.97 | 54.13 | 74.15 |
| | INT4-GPTQ | 51.86 | 52.44 | 75.89 |
| | INT4-AWQ | 53.49 | 53.70 | - |
| DeepSeek-R1-Distill-Qwen-14B | BF16 | 77.71 | 74.28 | 85.67 |
| | FP8-Static | 77.56 | 74.66 | 86.73 |
| | FP8-Dynamic | 76.82 | 74.63 | 87.11 |
| | INT4-GPTQ | 74.29 | 72.37 | 84.61 |
| | INT4-AWQ | 74.81 | 73.00 | 86.05 |
| DeepSeek-R1-Distill-Qwen-32B | BF16 | 84.18 | 80.89 | 87.41 |
| | FP8-Static | 83.43 | 80.90 | 87.57 |
| | FP8-Dynamic | 83.73 | 81.10 | 86.43 |
| | INT4-GPTQ | 84.10 | 79.80 | 86.73 |
| | INT4-AWQ | 82.84 | 80.15 | 87.19 |

(2) Speculative Decoding

Speedup results for the Eagle3 models of the Qwen3 series on MT-bench, HumanEval, GSM8K, and Alpaca:

| Temperature | Model | MT-bench Speedup | MT-bench τ | HumanEval Speedup | HumanEval τ | GSM8K Speedup | GSM8K τ | Alpaca Speedup | Alpaca τ | Mean Speedup | Mean τ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| T=0 | Qwen3-1.7B | 2.05x | 2.81 | 2.07x | 2.93 | 2.11x | 2.98 | 1.93x | 2.69 | 2.04x | 2.85 |
| | Qwen3-4B | 2.21x | 3.01 | 2.36x | 3.24 | 2.42x | 3.13 | 2.32x | 2.75 | 2.33x | 3.03 |
| | Qwen3-8B | 2.63x | 3.65 | 2.76x | 3.85 | 2.82x | 3.90 | 2.62x | 3.48 | 2.70x | 3.72 |
| | Qwen3-14B | 2.23x | 3.30 | 2.53x | 3.74 | 2.56x | 3.79 | 2.16x | 3.13 | 2.37x | 3.49 |
| | Qwen3-32B | 2.39x | 2.78 | 2.37x | 2.81 | 2.47x | 2.92 | 2.42x | 2.53 | 2.41x | 2.76 |
| | Qwen3-30B-A3B | 2.84x | 3.63 | 2.27x | 3.09 | 2.64x | 3.42 | 2.83x | 3.56 | 2.64x | 3.42 |
| T=1 | Qwen3-1.7B | 1.74x | 2.53 | 1.86x | 2.70 | 1.82x | 2.69 | 1.72x | 2.46 | 1.93x | 2.60 |
| | Qwen3-4B | 1.93x | 2.60 | 2.00x | 2.84 | 2.11x | 2.82 | 2.34x | 2.50 | 1.75x | 2.69 |
| | Qwen3-8B | 1.98x | 2.75 | 2.25x | 3.11 | 2.31x | 3.15 | 2.10x | 2.76 | 2.90x | 2.94 |
| | Qwen3-14B | 1.71x | 2.61 | 1.95x | 2.87 | 2.04x | 3.08 | 1.68x | 2.55 | 2.90x | 2.78 |
| | Qwen3-32B | 1.62x | 1.91 | 1.71x | 2.05 | 1.78x | 2.10 | 1.80x | 1.95 | 1.62x | 2.00 |
| | Qwen3-30B-A3B | 1.91x | 2.46 | 2.00x | 2.64 | 1.90x | 2.53 | 1.80x | 2.32 | 1.90x | 2.48 |
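In Eagle-style speculative decoding, τ is commonly the mean number of draft tokens accepted per verification step; higher τ means each target-model forward pass commits more tokens, which is where the speedup comes from. A toy Python sketch of the accept-longest-prefix idea (greedy verification only, not Eagle3's actual algorithm):

```python
# Toy illustration of greedy speculative-decoding verification
# (not Eagle3 itself): the target model keeps the longest prefix of
# draft tokens that matches its own predictions, plus one token it
# generates itself.

def accepted_length(draft_tokens, target_tokens):
    """Count matching prefix tokens, then add the target's own token."""
    accepted = 0
    for d, t in zip(draft_tokens, target_tokens):
        if d != t:
            break
        accepted += 1
    return accepted + 1  # the target always contributes one more token

# Draft proposes 4 tokens; the target agrees with the first two,
# so 3 tokens are committed in a single target forward pass.
n = accepted_length(["the", "cat", "sat", "on"],
                    ["the", "cat", "ran", "to"])  # n == 3
```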

📝 License

The code in this repository is open-sourced under the License for AngelSlim.

🔗 Citation

@software{AngelSlim2025,
    title={{AngelSlim}},
    author={Tencent AngelSlim Project Contributors},
    year={2025},
    month={7},
    url={https://github.com/Tencent/AngelSlim},
}

💬 Community

  • AngelSlim is iterating quickly and more features are on the way. If you have questions or suggestions, please file an issue through GitHub Issues, or join our WeChat technical discussion group.
