Issues: NVIDIA/TensorRT-Model-Optimizer
#211 (bug): Bugs in the speculative decoding example. Opened Jun 11, 2025 by Framartin.
#207 (bug): Explicit quantization in PyTorch before ONNX leads to a slower TRT engine than ONNX PTQ. Opened Jun 8, 2025 by liukang1811.
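For context, here is a minimal sketch of the two paths this issue compares, assuming ModelOpt's documented workflow; the toy model, calibration data, and the INT8 preset are illustrative placeholders, not details from the issue itself:

```python
# Sketch only: contrasts explicit PyTorch quantization before ONNX export
# with direct ONNX PTQ. Model and data are toy placeholders.
import torch
import modelopt.torch.quantization as mtq

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
calib_data = [torch.randn(4, 16) for _ in range(8)]

def forward_loop(m):
    # Feed representative data through the model to collect calibration stats.
    for batch in calib_data:
        m(batch)

# Path A: explicit (Q/DQ) quantization in PyTorch, then ONNX export.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
torch.onnx.export(model, calib_data[0], "model_qdq.onnx")

# Path B: ONNX PTQ on an FP32 export, e.g. via the ModelOpt CLI
# (flags assumed from the onnx_ptq example):
#   python -m modelopt.onnx.quantization --onnx_path=model_fp32.onnx --quantize_mode=int8
```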
#204 (feature request): Apart from FP8, are there any plans to directly support vLLM inference with AWQ in the future? Opened May 28, 2025 by dingjingzhen.
#201 (bug): get_modelike_from_algo_cfg doesn't accept QuantizeAlgorithmConfig, but its typing says it should. Opened May 21, 2025 by ORippler.
#193 (bug): FP8 real_quantization doesn't work with block_sizes. Opened May 9, 2025 by ishan-modi.
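As a point of reference, this is roughly how block-wise FP8 is expressed in a ModelOpt quantizer config; the attribute values are assumptions based on the documented config schema rather than details from the issue, and the model and forward_loop are the toy placeholders from the earlier sketch:

```python
# Sketch of an FP8 block-wise weight-quantizer config; in ModelOpt's schema
# FP8 (E4M3) is written as num_bits=(4, 3) and block_sizes maps a tensor
# dimension to a block length. Values below are illustrative.
import modelopt.torch.quantization as mtq

FP8_BLOCKWISE_CFG = {
    "quant_cfg": {
        "*weight_quantizer": {"num_bits": (4, 3), "block_sizes": {-1: 128}},
        "*input_quantizer": {"num_bits": (4, 3)},
    },
    "algorithm": "max",
}
model = mtq.quantize(model, FP8_BLOCKWISE_CFG, forward_loop)
mtq.compress(model)  # the "real quantization" step reported to fail here
```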
#192 (feature request): Is FP8 quantization with block-wise/per-token/per-channel supported? Opened May 9, 2025 by YSF-A.
#190: What is the difference between the config in mtq.quantize() and the config in TensorQuantizer? Opened May 6, 2025 by YSF-A.
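A short sketch of how the two relate, assuming ModelOpt's documented config layout (and reusing the toy model and forward_loop from the first sketch): the dict given to mtq.quantize() maps name patterns to attribute dicts, and each matching TensorQuantizer submodule is configured from them.

```python
# The mtq.quantize() config is pattern-based; TensorQuantizer holds the
# resulting per-tensor attributes after quantization.
import modelopt.torch.quantization as mtq

config = {
    "quant_cfg": {
        "*weight_quantizer": {"num_bits": 8, "axis": 0},    # per-channel weights
        "*input_quantizer": {"num_bits": 8, "axis": None},  # per-tensor inputs
    },
    "algorithm": "max",  # calibration algorithm
}
model = mtq.quantize(model, config, forward_loop)

# The attributes above now live on individual TensorQuantizer modules:
for name, module in model.named_modules():
    if name.endswith("quantizer"):
        print(name, getattr(module, "num_bits", None))
```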
#189 (feature request): Support for W4A16 and W4A8 quantization in TensorRT Model Optimizer. Opened Apr 30, 2025 by david-PHR.
#187 (bug): Cannot serve a ModelOpt-quantized NVFP4 model on TensorRT-LLM. Opened Apr 27, 2025 by enisaras.
#186 (bug): ModelOpt restore of quantized models via AutoModelForCausalLM.from_pretrained doesn't work for Mixtral-8x7B. Opened Apr 27, 2025 by wanzhenchn.
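For reference, this is the restore path the issue exercises, per ModelOpt's documented HuggingFace checkpointing hook; the checkpoint path below is hypothetical:

```python
# enable_huggingface_checkpointing() must run before from_pretrained() so the
# ModelOpt quantizer state stored in the checkpoint is restored with the weights.
import modelopt.torch.opt as mto
from transformers import AutoModelForCausalLM

mto.enable_huggingface_checkpointing()
model = AutoModelForCausalLM.from_pretrained("path/to/mixtral-8x7b-quantized")
```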
#184 (feature request): Support more quantization methods for "onnx_ptq"? Opened Apr 24, 2025 by s101010tw.
#183 (bug): Issue processing NF4 double quantization. Opened Apr 22, 2025 by ishan-modi.
#182: Qwen2_MoE AWQ (W4A16/W4A8) quantization fails with a NaN AssertionError. Opened Apr 22, 2025 by wanzhenchn.
#179 (feature request): Torch quantization: allow restoring a quantized model and re-running calibration on new data (PTQ). Opened Apr 16, 2025 by david-PHR.
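A hypothetical sketch of what the requested workflow might look like: mto.restore() and mto.save() are documented ModelOpt APIs, while re-running calibration on a restored model with fresh data is exactly what this feature request asks for, so the mtq.calibrate() call below is an assumption about the desired behavior, not current behavior. build_model and new_data_loop are user-defined placeholders.

```python
# Requested flow: restore a saved PTQ checkpoint, then recalibrate on new data.
import modelopt.torch.opt as mto
import modelopt.torch.quantization as mtq

model = build_model()                             # hypothetical constructor
model = mto.restore(model, "quantized_ckpt.pth")  # restore saved PTQ state
mtq.calibrate(model, algorithm="max", forward_loop=new_data_loop)  # assumed
mto.save(model, "recalibrated_ckpt.pth")
```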
#174 (needs trt triage): Explicit INT8 quantization fails to fuse Concat-Conv block compared to implicit mode. Opened Apr 9, 2025 by patrickgrommelt.
#171: "Real quantization not supported for this format" error when using mtq.compress(model). Opened Apr 5, 2025 by RivenSama.
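A minimal sketch of the flow that can trigger this error, assuming ModelOpt's documented compress API and reusing the toy model and forward_loop from the first sketch; the preset shown is illustrative:

```python
# mtq.compress() replaces fake-quant weights with packed low-bit tensors
# ("real" quantization) and only supports certain formats, so a config
# without real-quantization support raises the reported error.
import modelopt.torch.quantization as mtq

model = mtq.quantize(model, mtq.INT4_AWQ_CFG, forward_loop)  # weight-only PTQ
mtq.compress(model)  # raises for formats without real-quantization support
```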