Hello NVIDIA team,
I noticed that the TensorRT Model Optimizer currently supports W4A16 and W4A8 quantization configurations, as detailed in your quantization configuration documentation.
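For context, here is a minimal sketch of how we apply these configs today with the Model Optimizer PyTorch API. The config names `INT4_AWQ_CFG` and `W4A8_AWQ_BETA_CFG` are the ones listed in the documentation above; `load_model` and `get_calib_data` are placeholders for our own model and calibration data.

```python
# Minimal sketch (assumed usage of modelopt.torch.quantization):
# quantize a PyTorch model with the W4A16 / W4A8 configs; the open question
# is how to deploy the result with plain TensorRT rather than TensorRT-LLM.
import modelopt.torch.quantization as mtq

model = load_model()                 # placeholder: any torch.nn.Module
calib_dataloader = get_calib_data()  # placeholder: small representative dataset

def forward_loop(model):
    # Run calibration batches through the model so activation ranges can be collected.
    for batch in calib_dataloader:
        model(batch)

# W4A16: 4-bit AWQ weights, 16-bit activations.
model = mtq.quantize(model, mtq.INT4_AWQ_CFG, forward_loop)

# W4A8 (beta config name as listed in the docs): 4-bit weights, 8-bit activations.
# model = mtq.quantize(model, mtq.W4A8_AWQ_BETA_CFG, forward_loop)
```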
However, according to the Best Practices for Choosing Quantization Methods, these configurations are currently deployable only via TensorRT-LLM.
I would like to inquire if there are plans to extend support for W4A16 and W4A8 quantization to the standard TensorRT backend, beyond just TensorRT-LLM.
Such support would be highly beneficial for deploying models in environments where only TensorRT is used and not TensorRT-LLM; a sketch of the workflow we have in mind follows below.
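To make the ask concrete, this is the kind of deployment path we would like to be possible for W4A16/W4A8 models. It is only a sketch: `model_w4a16.onnx` is a placeholder file name, and the code uses the standard TensorRT Python API in its TensorRT 10.x form (where explicit batch is the default), not any currently supported W4A16/W4A8 flow.

```python
# Hypothetical target workflow: build a plain TensorRT engine from an exported
# ONNX file, with no TensorRT-LLM involved.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(0)          # explicit batch is the default in TensorRT 10.x
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("model_w4a16.onnx", "rb") as f:    # placeholder: ONNX export of the quantized model
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse ONNX model")

config = builder.create_builder_config()
serialized_engine = builder.build_serialized_network(network, config)

with open("model_w4a16.engine", "wb") as f:
    f.write(serialized_engine)
```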
Thank you for your continued efforts in optimizing model deployment workflows!