int4 quantization output onnx does not load #156

Open
thejaswi01 opened this issue Mar 13, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@thejaswi01

Describe the bug

Running int4 quantization on an ONNX file with a Conformer architecture produces an error at the end of quantization, and the output ONNX model does not load.

Steps/Code to reproduce bug

Run quantization with the script below:

from modelopt.onnx.quantization import quantize
import os

def run_quantization():
    input_path = 'model.onnx'
    output_path = 'model_int4.onnx'

    # Create the output directory only if the path has one;
    # os.makedirs('') raises when output_path is a bare filename.
    output_dir = os.path.dirname(output_path)
    if output_dir:
        os.makedirs(output_dir, exist_ok=True)

    quantize(
        input_path,
        quantize_mode='int4',
        use_external_data_format=True,
        output_path=output_path,
        verbose=True,
    )

if __name__ == "__main__":
    run_quantization()

Error

INFO:root:Quantized onnx model is saved as xxx
WARNING:root:ONNX model checker failed, check your deployment status.
WARNING:root:Unrecognized attribute: block_size for operator DequantizeLinear

==> Context: Bad node spec for node. Name: onnx::MatMul_4725_DequantizeLinear OpType: DequantizeLinear
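
The block_size attribute on DequantizeLinear was introduced in ONNX opset 21, so this checker warning typically means the model's declared default-domain opset (or the installed onnx package) is older than the version that defines the attribute. A minimal sketch to confirm what the saved model reports, assuming the onnx Python package is installed and the output path from the script above:

import onnx

# Load only the graph structure (skip external weight files) to read metadata.
model = onnx.load('model_int4.onnx', load_external_data=False)

# Print the opset each domain declares; block_size on DequantizeLinear is
# only defined for the default domain at opset 21 and newer.
for imp in model.opset_import:
    print(f"domain={imp.domain or 'ai.onnx'} version={imp.version}")

# Re-run the checker on the file path (this also works for models that use
# external data) to reproduce the warning outside the quantizer.
try:
    onnx.checker.check_model('model_int4.onnx')
    print('Model passes the ONNX checker.')
except onnx.checker.ValidationError as e:
    print(f'Checker failed: {e}')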

Expected behavior

The output model should be valid ONNX and should load and run as expected.

System information

  • Container used (if applicable): ?
  • OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): Ubuntu 22.04.5 LTS
  • CPU architecture (x86_64, aarch64): x86_64
  • GPU name (e.g. H100, A100, L40S): NVIDIA A10G
  • GPU memory size: 22.5 GB
  • Number of GPUs: 1
  • Library versions (if applicable):
    • Python: 3.10.12
    • ModelOpt version or commit hash: 0.25.0
    • CUDA: 12.8
    • PyTorch: 2.5.1+cu124
    • Transformers: 4.47.1
    • TensorRT-LLM: 0.17.0.post1
    • ONNXRuntime: 1.20.1
    • TensorRT: 10.8.0.43
@thejaswi01 thejaswi01 added the bug Something isn't working label Mar 13, 2025
@thejaswi01 thejaswi01 changed the title int4 quantization outputs buggy onnx int4 quantization output onnx does not load Mar 13, 2025
@i-riyad
Collaborator

i-riyad commented Apr 21, 2025

The warning is not critical. The quantized model should compile and run successfully on both the TensorRT and DML backends. I have tested with TensorRT 10.8 and 10.9, and was able to generate the engine successfully from the quantized ONNX model.
If you’re encountering any issues during deployment, please share the specific error messages so we can help troubleshoot further.
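
As a concrete smoke test, the quantized model can be handed to ONNX Runtime with the TensorRT execution provider. A minimal sketch, assuming onnxruntime-gpu with TensorRT support is installed, the output path from the repro script above, and float32 inputs (adjust dtypes and shapes for a real Conformer model):

import numpy as np
import onnxruntime as ort

# Session creation is where the TensorRT EP compiles the graph; CUDA and CPU
# act as fallbacks for any unsupported nodes.
session = ort.InferenceSession(
    'model_int4.onnx',
    providers=[
        'TensorrtExecutionProvider',
        'CUDAExecutionProvider',
        'CPUExecutionProvider',
    ],
)

# Feed random data matching the declared input shapes as a quick sanity check.
inputs = {}
for inp in session.get_inputs():
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # 1 for dynamic dims
    inputs[inp.name] = np.random.rand(*shape).astype(np.float32)

outputs = session.run(None, inputs)
print([o.shape for o in outputs])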

@i-riyad i-riyad self-assigned this Apr 23, 2025