transformer_engine failed to import with exception libcudnn_adv.so.9: cannot open shared object file: No such file or directory #1853

Open
@t-vi

Describe the bug

A pip-installed version of Transformer Engine (installed as in the instructions) does not find the cuDNN libs installed via pip (in site-packages/nvidia/*/lib/*.so), so the import fails with the error in the title.

Steps/Code to reproduce bug

In a PyTorch env (so the usual CUDA libs are installed):

# For PyTorch integration
pip install --no-build-isolation transformer_engine[pytorch]

python3 -c "import transformer_engine"

Expected behavior

`import transformer_engine` succeeds, with the cuDNN libs picked up from the pip-installed nvidia packages.

Environment overview (please complete the following information)

  • Environment location: Lightning.AI Studio
  • Method of Transformer Engine install: pip, see above.

Environment details

If NVIDIA docker image is used you don't need to specify these.
Otherwise, please provide:

  • OS version: Ubuntu 24.04
    ~ pip list | grep -E 'torch|nvidia'
    nvfuser_cu128_torch27 0.2.27.dev20250601
    nvidia-cublas-cu12 12.8.3.14
    nvidia-cuda-cupti-cu12 12.8.57
    nvidia-cuda-nvrtc-cu12 12.8.61
    nvidia-cuda-runtime-cu12 12.8.57
    nvidia-cudnn-cu12 9.7.1.26
    nvidia-cudnn-frontend 1.12.0
    nvidia-cufft-cu12 11.3.3.41
    nvidia-cufile-cu12 1.13.0.11
    nvidia-curand-cu12 10.3.9.55
    nvidia-cusolver-cu12 11.7.2.55
    nvidia-cusparse-cu12 12.5.7.53
    nvidia-cusparselt-cu12 0.6.3
    nvidia-nccl-cu12 2.26.2
    nvidia-nvjitlink-cu12 12.8.61
    nvidia-nvtx-cu12 12.8.55
    pytorch-lightning 2.5.1.post0
    torch 2.7.0+cu128
    torchmetrics 1.3.1
    torchvision 0.22.0+cu128
    transformer_engine_torch 2.3.0

Device details

  • H100 from GCP

Additional context

The nvidia packages listed above do contain the required libs, so TE should probably look for them there. Setting LD_LIBRARY_PATH works as a workaround, but it's tedious.
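
For reference, here is a minimal sketch of how the LD_LIBRARY_PATH workaround could be scripted. It assumes the libs live in nvidia/*/lib under the interpreter's site-packages, as described above. The helper names `nvidia_lib_dirs` and `ld_library_path` are made up for illustration and are not part of TE.

```python
import glob
import os
import sysconfig


def nvidia_lib_dirs(roots=None):
    """Return the sorted nvidia/*/lib directories found under the given roots.

    By default the roots are this interpreter's site-packages directories
    (an assumption about where pip placed the nvidia-* wheels).
    """
    if roots is None:
        paths = sysconfig.get_paths()
        roots = {paths["purelib"], paths["platlib"]}
    found = set()
    for root in roots:
        found.update(glob.glob(os.path.join(root, "nvidia", "*", "lib")))
    return sorted(d for d in found if os.path.isdir(d))


def ld_library_path(dirs, existing=""):
    """Prepend dirs to an existing LD_LIBRARY_PATH-style string."""
    parts = list(dirs) + [p for p in existing.split(os.pathsep) if p]
    return os.pathsep.join(parts)


if __name__ == "__main__":
    # Print a value suitable for `export LD_LIBRARY_PATH=...`.
    print(ld_library_path(nvidia_lib_dirs(), os.environ.get("LD_LIBRARY_PATH", "")))
```

Saved under a hypothetical name such as nvidia_libs.py, this could be used as `export LD_LIBRARY_PATH="$(python3 nvidia_libs.py)"` before starting Python. The variable has to be set before the process starts, because the dynamic loader reads it at startup.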
