**Describe the bug**
A pip-installed version of Transformer Engine (installed as in the instructions) does not find the cuDNN libraries installed via pip (in `site-packages/nvidia/*/lib/*.so`).
**Steps/Code to reproduce bug**

In a PyTorch env (so the usual CUDA libs are installed via pip):

```shell
# For PyTorch integration
pip install --no-build-isolation transformer_engine[pytorch]
python3 -c "import transformer_engine"
```
**Expected behavior**

`import transformer_engine` succeeds, with Transformer Engine locating the pip-installed cuDNN libraries on its own, without `LD_LIBRARY_PATH` having to be set manually.
**Environment overview**
- Environment location: Lightning.AI Studio
- Method of Transformer Engine install: pip, see above.
**Environment details**
- OS version: Ubuntu 24.04

```shell
$ pip list | grep -E 'torch|nvidia'
nvfuser_cu128_torch27    0.2.27.dev20250601
nvidia-cublas-cu12       12.8.3.14
nvidia-cuda-cupti-cu12   12.8.57
nvidia-cuda-nvrtc-cu12   12.8.61
nvidia-cuda-runtime-cu12 12.8.57
nvidia-cudnn-cu12        9.7.1.26
nvidia-cudnn-frontend    1.12.0
nvidia-cufft-cu12        11.3.3.41
nvidia-cufile-cu12       1.13.0.11
nvidia-curand-cu12       10.3.9.55
nvidia-cusolver-cu12     11.7.2.55
nvidia-cusparse-cu12     12.5.7.53
nvidia-cusparselt-cu12   0.6.3
nvidia-nccl-cu12         2.26.2
nvidia-nvjitlink-cu12    12.8.61
nvidia-nvtx-cu12         12.8.55
pytorch-lightning        2.5.1.post0
torch                    2.7.0+cu128
torchmetrics             1.3.1
torchvision              0.22.0+cu128
transformer_engine_torch 2.3.0
```
**Device details**
- H100 from GCP
**Additional context**

The `nvidia-*` packages listed above do contain the required libraries, so TE should probably look for them there. A workaround is setting `LD_LIBRARY_PATH`, but it's tedious.
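For reference, a sketch of the `LD_LIBRARY_PATH` workaround mentioned above. The glob pattern assumes the pip wheel layout shown in the environment details (`site-packages/nvidia/<pkg>/lib/*.so`); adjust if your layout differs:

```shell
# Sketch of the workaround: prepend every site-packages/nvidia/*/lib
# directory to LD_LIBRARY_PATH before launching Python.
SITE_PACKAGES="$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')"
NVIDIA_LIBS="$(find "$SITE_PACKAGES/nvidia" -maxdepth 2 -type d -name lib 2>/dev/null | paste -sd: -)"
export LD_LIBRARY_PATH="${NVIDIA_LIBS}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# Then re-run: python3 -c "import transformer_engine"
```

Having every user do this by hand (or in every launcher script) is the tedious part; picking the directories up automatically at import time would be nicer.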