Issues: NVIDIA/TensorRT-Model-Optimizer
#211 (bug): Bugs in the speculative decoding example. Opened Jun 11, 2025 by Framartin.
#207 (bug): Explicit quantization in PyTorch before ONNX leads to a slower TRT engine than ONNX PTQ. Opened Jun 8, 2025 by liukang1811.
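For context, here is a minimal sketch of the two paths this issue compares, assuming ModelOpt's documented workflow; the toy model, calibration data, and the INT8 preset are illustrative placeholders, not details from the issue itself:

```python
# Sketch only: contrasts explicit PyTorch quantization before ONNX export
# with direct ONNX PTQ. Model and data are toy placeholders.
import torch
import modelopt.torch.quantization as mtq

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
calib_data = [torch.randn(4, 16) for _ in range(8)]

def forward_loop(m):
    # Feed representative data through the model to collect calibration stats.
    for batch in calib_data:
        m(batch)

# Path A: explicit (Q/DQ) quantization in PyTorch, then ONNX export.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
torch.onnx.export(model, calib_data[0], "model_qdq.onnx")

# Path B: ONNX PTQ on an FP32 export, e.g. via the ModelOpt CLI
# (flags assumed from the onnx_ptq example):
#   python -m modelopt.onnx.quantization --onnx_path=model_fp32.onnx --quantize_mode=int8
```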
#204 (feature request): Apart from FP8, are there any plans to directly support vLLM inference with AWQ in the future? Opened May 28, 2025 by dingjingzhen.
#201 (bug): get_modelike_from_algo_cfg doesn't accept QuantizeAlgorithmConfig, but its typing says it should. Opened May 21, 2025 by ORippler.
#193 (bug): FP8 real_quantization doesn't work with block_sizes. Opened May 9, 2025 by ishan-modi.
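As a point of reference, this is roughly how block-wise FP8 is expressed in a ModelOpt quantizer config; the attribute values are assumptions based on the documented config schema rather than details from the issue, and the model and forward_loop are the toy placeholders from the earlier sketch:

```python
# Sketch of an FP8 block-wise weight-quantizer config; in ModelOpt's schema
# FP8 (E4M3) is written as num_bits=(4, 3) and block_sizes maps a tensor
# dimension to a block length. Values below are illustrative.
import modelopt.torch.quantization as mtq

FP8_BLOCKWISE_CFG = {
    "quant_cfg": {
        "*weight_quantizer": {"num_bits": (4, 3), "block_sizes": {-1: 128}},
        "*input_quantizer": {"num_bits": (4, 3)},
    },
    "algorithm": "max",
}
model = mtq.quantize(model, FP8_BLOCKWISE_CFG, forward_loop)
mtq.compress(model)  # the "real quantization" step reported to fail here
```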
#192 (feature request): Is FP8 quantization with block-wise/per-token/per-channel supported? Opened May 9, 2025 by YSF-A.
#190: What is the difference between the config in mtq.quantize() and the config in TensorQuantizer? Opened May 6, 2025 by YSF-A.
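A short sketch of how the two relate, assuming ModelOpt's documented config layout (and reusing the toy model and forward_loop from the first sketch): the dict given to mtq.quantize() maps name patterns to attribute dicts, and each matching TensorQuantizer submodule is configured from them.

```python
# The mtq.quantize() config is pattern-based; TensorQuantizer holds the
# resulting per-tensor attributes after quantization.
import modelopt.torch.quantization as mtq

config = {
    "quant_cfg": {
        "*weight_quantizer": {"num_bits": 8, "axis": 0},    # per-channel weights
        "*input_quantizer": {"num_bits": 8, "axis": None},  # per-tensor inputs
    },
    "algorithm": "max",  # calibration algorithm
}
model = mtq.quantize(model, config, forward_loop)

# The attributes above now live on individual TensorQuantizer modules:
for name, module in model.named_modules():
    if name.endswith("quantizer"):
        print(name, getattr(module, "num_bits", None))
```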
#189 (feature request): Support for W4A16 and W4A8 quantization in TensorRT Model Optimizer. Opened Apr 30, 2025 by david-PHR.
#187 (bug): Cannot serve a ModelOpt-quantized NVFP4 model on TensorRT-LLM. Opened Apr 27, 2025 by enisaras.
#186 (bug): ModelOpt restore of quantized models via AutoModelForCausalLM.from_pretrained doesn't work for Mixtral-8x7B. Opened Apr 27, 2025 by wanzhenchn.
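For reference, this is the restore path the issue exercises, per ModelOpt's documented HuggingFace checkpointing hook; the checkpoint path below is hypothetical:

```python
# enable_huggingface_checkpointing() must run before from_pretrained() so the
# ModelOpt quantizer state stored in the checkpoint is restored with the weights.
import modelopt.torch.opt as mto
from transformers import AutoModelForCausalLM

mto.enable_huggingface_checkpointing()
model = AutoModelForCausalLM.from_pretrained("path/to/mixtral-8x7b-quantized")
```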
#184 (feature request): Support more quantization methods for "onnx_ptq"? Opened Apr 24, 2025 by s101010tw.
#183 (bug): Issue processing NF4 double quantization. Opened Apr 22, 2025 by ishan-modi.
#182: Qwen2_MoE AWQ (W4A16/W4A8) quantization fails with a NaN AssertionError. Opened Apr 22, 2025 by wanzhenchn.
#179 (feature request): Torch quantization: allow restoring a quantized model and re-running calibration on new data (PTQ). Opened Apr 16, 2025 by david-PHR.
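A hypothetical sketch of what the requested workflow might look like: mto.restore() and mto.save() are documented ModelOpt APIs, while re-running calibration on a restored model with fresh data is exactly what this feature request asks for, so the mtq.calibrate() call below is an assumption about the desired behavior, not current behavior. build_model and new_data_loop are user-defined placeholders.

```python
# Requested flow: restore a saved PTQ checkpoint, then recalibrate on new data.
import modelopt.torch.opt as mto
import modelopt.torch.quantization as mtq

model = build_model()                             # hypothetical constructor
model = mto.restore(model, "quantized_ckpt.pth")  # restore saved PTQ state
mtq.calibrate(model, algorithm="max", forward_loop=new_data_loop)  # assumed
mto.save(model, "recalibrated_ckpt.pth")
```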
#174 (needs trt triage): Explicit INT8 quantization fails to fuse Concat-Conv block compared to implicit mode. Opened Apr 9, 2025 by patrickgrommelt.
#171: "Real quantization not supported for this format" error when using mtq.compress(model). Opened Apr 5, 2025 by RivenSama.
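A minimal sketch of the flow that can trigger this error, assuming ModelOpt's documented compress API and reusing the toy model and forward_loop from the first sketch; the preset shown is illustrative:

```python
# mtq.compress() replaces fake-quant weights with packed low-bit tensors
# ("real" quantization) and only supports certain formats, so a config
# without real-quantization support raises the reported error.
import modelopt.torch.quantization as mtq

model = mtq.quantize(model, mtq.INT4_AWQ_CFG, forward_loop)  # weight-only PTQ
mtq.compress(model)  # raises for formats without real-quantization support
```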