🐛 [Bug] Cannot load quantize_fp8 even though modelopt[all] is installed #3232

braindevices opened this issue Oct 13, 2024 · 1 comment

@braindevices

Bug Description

quantize_fp8 cannot be loaded even though modelopt[all] is installed. Importing torch_tensorrt logs:

WARNING:torch_tensorrt.dynamo.conversion.aten_ops_converters:Unable to import quantization op. Please install modelopt library (https://github.com/NVIDIA/TensorRT-Model-Optimizer?tab=readme-ov-file#installation) to add support for compiling quantized models
WARNING:py.warnings:/usr/lib64/python3.11/tempfile.py:904: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp91sqmx7h'>
  _warnings.warn(warn_message, ResourceWarning)
+ exec python -c 'import modelopt.torch.quantization.extensions as ext; print(ext.cuda_ext); print(ext.cuda_ext_fp8)'
Loading extension modelopt_cuda_ext...
<module 'modelopt_cuda_ext' from '/home/user/.cache/torch_extensions/py311_cu121/modelopt_cuda_ext/modelopt_cuda_ext.so'>
Loading extension modelopt_cuda_ext_fp8...
<module 'modelopt_cuda_ext_fp8' from '/home/user/.cache/torch_extensions/py311_cu121/modelopt_cuda_ext_fp8/modelopt_cuda_ext_fp8.so'>
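
For context, the warning is emitted by an import-time guard in torch_tensorrt.dynamo.conversion.aten_ops_converters. A paraphrased sketch of that guard (not the exact source; the modelopt import shown here is an assumption) looks like:

import logging

_LOGGER = logging.getLogger(__name__)

try:
    # Quantization converters are only registered when modelopt's custom
    # quantize op can be resolved at import time.
    import modelopt.torch.quantization  # assumed import; the real guard may differ
    import torch

    torch.ops.trt.quantize_fp8  # op name the 2.4.0 release looks up (see the comment below)
except Exception:
    _LOGGER.warning(
        "Unable to import quantization op. Please install modelopt library "
        "(https://github.com/NVIDIA/TensorRT-Model-Optimizer"
        "?tab=readme-ov-file#installation) to add support for compiling "
        "quantized models"
    )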

nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-modelopt          0.17.0
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.5.82
nvidia-nvtx-cu12         12.1.105

torch                    2.4.1
torch_tensorrt           2.4.0
torchaudio               2.4.1
torchinfo                1.8.0
torchmetrics             1.4.3
torchprofile             0.0.4
torchvision              0.19.1
tensorrt                 10.1.0
tensorrt-cu12            10.5.0
tensorrt-cu12-bindings   10.1.0
tensorrt-cu12-libs       10.1.0

To Reproduce

Steps to reproduce the behavior:

  1. Create a venv and activate it.
  2. Install torch, torchvision, torchaudio, tensorrt, nvidia-modelopt[all], and torch_tensorrt.
  3. python -c "import modelopt.torch.quantization.extensions as ext; print(ext.cuda_ext); print(ext.cuda_ext_fp8)"
  4. python -c "import torch_tensorrt" (steps 3 and 4 are combined into the script below)
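
Steps 3 and 4 combined into a single script (module names taken from the commands above); on the versions listed, the modelopt extensions load fine, but the torch_tensorrt import still logs the warning:

# Combined reproducer for steps 3 and 4.
import logging

logging.basicConfig(level=logging.WARNING)  # ensure the converter's logger warning is shown

# Step 3: both modelopt CUDA extensions build and load successfully.
import modelopt.torch.quantization.extensions as ext
print(ext.cuda_ext)
print(ext.cuda_ext_fp8)

# Step 4: importing torch_tensorrt still logs
# "Unable to import quantization op. Please install modelopt library ..."
import torch_tensorrt  # noqa: F401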

Expected behavior

torch_tensorrt should import without emitting the quantization warning.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 2.4.0
  • PyTorch Version (e.g. 1.0): 2.4.1
  • CPU Architecture: x86_64
  • OS (e.g., Linux): Ubuntu
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version: 3.11
  • CUDA version: 12.1
  • GPU models and configuration: RTX4k
  • Any other relevant information:

Additional context

braindevices added the bug label on Oct 13, 2024
@HolyWu
Contributor

HolyWu commented Oct 13, 2024

torch_tensorrt 2.4.0 used torch.ops.trt.quantize_fp8 at the time of release. The latest main branch has already changed to use torch.ops.tensorrt.quantize_op for nvidia-modelopt 0.17.0. You can install the latest nightly build with:

pip install --pre -U torch torchaudio torchvision torch_tensorrt --index-url https://download.pytorch.org/whl/nightly/cu121 --extra-index-url https://pypi.nvidia.com
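
After installing nvidia-modelopt 0.17.0 (with either the release or the nightly build), a quick hypothetical check of which op name is actually registered, assuming modelopt registers its custom ops when its quantization package is imported:

import torch
import modelopt.torch.quantization  # noqa: F401  # assumed to register the custom quantize op

# Check both the old (2.4.0 release) and new (current main) op names.
for qualified_name in ("trt.quantize_fp8", "tensorrt.quantize_op"):
    namespace, op_name = qualified_name.split(".")
    try:
        getattr(getattr(torch.ops, namespace), op_name)
        print(f"torch.ops.{qualified_name}: registered")
    except AttributeError:
        print(f"torch.ops.{qualified_name}: not found")

With modelopt 0.17.0 only the second name is expected to resolve, which is why the 2.4.0 release warns while the nightly build does not.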
