
ModelOpt 0.25.0 Release

@kevalmorabia97 released this 03 Mar 17:41

Backward Breaking Changes

  • Deprecate Torch 2.1 support.
  • Deprecate humaneval benchmark in llm_eval examples. Please use the newly added simple_eval instead.
  • Deprecate fp8_naive quantization format in llm_ptq examples. Please use fp8 instead.

New Features

  • Support fast Hadamard transform in the TensorQuantizer class (modelopt.torch.quantization.nn.modules.TensorQuantizer).
    It can be used for rotation-based quantization methods, e.g. QuaRot. Users need to install the fast_hadamard_transform package to use this feature.
  • Add affine quantization support for the KV cache, resolving the low accuracy issue in models such as Qwen2.5 and Phi-3/3.5.
  • Add FSDP2 support. FSDP2 can now be used for QAT.
  • Add LiveCodeBench and Simple Evals to the llm_eval examples.
  • Disable saving the ModelOpt state in the unified HF export APIs by default, i.e., add a save_modelopt_state flag to the export_hf_checkpoint API, which defaults to False.
  • Add FP8 and NVFP4 real quantization support with LLM QLoRA example.
  • The modelopt.deploy.llm.LLM class now supports the tensorrt_llm._torch.LLM backend for quantized HuggingFace checkpoints.
  • Add NVFP4 PTQ example for DeepSeek-R1.
  • Add end-to-end AutoDeploy example for AutoQuant LLM models.
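For context on the TensorQuantizer feature above, here is a minimal pure-Python sketch of the fast Walsh-Hadamard transform that rotation-based methods such as QuaRot build on. This is illustrative only; it is not the API of the fast_hadamard_transform package, which provides a fused CUDA kernel:

```python
def fwht(x):
    """Fast Walsh-Hadamard transform of a list whose length is a power of 2.

    Rotating weights/activations with a Hadamard matrix spreads outlier
    values across channels, which makes low-bit quantization less lossy.
    Runs in O(n log n) via the classic butterfly recursion.
    """
    n = len(x)
    if n == 1:
        return x[:]
    half = n // 2
    # Butterfly step: pairwise sums feed the first half, differences the second.
    sums = [x[i] + x[i + half] for i in range(half)]
    diffs = [x[i] - x[i + half] for i in range(half)]
    return fwht(sums) + fwht(diffs)
```

Since the Hadamard matrix H satisfies H @ H = n * I, applying the transform twice returns the input scaled by its length, so the rotation is cheap to invert.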
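The affine KV-cache item refers to asymmetric quantization with a scale and a zero point, which fits the skewed value ranges of KV caches better than symmetric scaling around zero. A generic sketch of the idea, assuming nothing about ModelOpt's actual implementation (function names here are illustrative):

```python
def affine_quantize(x, num_bits=8):
    # Affine (asymmetric) quantization: map [min(x), max(x)] onto the full
    # unsigned integer range [0, 2**num_bits - 1] via a scale and a zero
    # point, so a skewed distribution wastes no quantization codes.
    qmax = 2 ** num_bits - 1
    lo, hi = min(x), max(x)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = [min(qmax, max(0, round(v / scale) + zero_point)) for v in x]
    return q, scale, zero_point

def affine_dequantize(q, scale, zero_point):
    # Inverse map back to real values; round-trip error is on the order
    # of the scale (half a quantization step plus endpoint clamping).
    return [(v - zero_point) * scale for v in q]
```

Symmetric quantization fixes the zero point at 0 and wastes range when values are not centered; the extra zero point is what the affine scheme adds to recover accuracy.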