[Issue]: RuntimeError: No HIP GPUs are available when running UT with self-built PyTorch on mi 300 #3799

Closed
yanbing-j opened this issue May 31, 2025 · 2 comments


@yanbing-j

Problem Description

Hello maintainers,

I am trying to build PyTorch from source and run the ROCm UTs on an MI300. I use the following commands to build PyTorch:

conda install cmake ninja
pip install -r requirements.txt
pip install mkl-static mkl-include

python tools/amd_build/build_amd.py
export CMAKE_PREFIX_PATH="${CONDA_PREFIX:-'$(dirname $(which conda))/../'}:${CMAKE_PREFIX_PATH}"
python setup.py develop

I then run the UT with `PYTORCH_TEST_WITH_ROCM=1 python test/test_nn.py TestNN.test_Transformer_multilayer_coder_cuda_tf32`.

It fails with

Traceback (most recent call last):
  File "/home/guest/project/pytorch/test/test_nn.py", line 43, in <module>
    from torch.testing._internal.common_device_type import dtypesIfMPS, instantiate_device_type_tests, dtypes, \
  File "/home/guest/project/pytorch/torch/testing/_internal/common_device_type.py", line 1972, in <module>
    if torch.version.hip and "gfx94" in torch.cuda.get_device_properties(0).gcnArchName:
  File "/home/guest/project/pytorch/torch/cuda/__init__.py", line 587, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/home/guest/project/pytorch/torch/cuda/__init__.py", line 383, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No HIP GPUs are available

Running `sudo rocminfo` does show the MI300, and my user has been added to the `video` group.

Agent 6
*******
  Name:                    gfx942
  Uuid:                    GPU-ea88748ecd524884
  Marketing Name:          AMD Instinct MI300X
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    5
  Device Type:             GPU
  Cache Info:
    L1:                      32(0x20) KB
    L2:                      4096(0x1000) KB
    L3:                      262144(0x40000) KB
  Chip ID:                 29857(0x74a1)

The ROCm version is 6.3.1, and `rocm-smi` can also list the GPUs.

I also ran `/opt/rocm/bin/rocm_agent_enumerator`; it shows:

gfx942
gfx942
gfx942
gfx942
gfx942
gfx942
gfx942
gfx942
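Since `rocminfo` is being run under `sudo` while the plain-user HIP runtime fails, a likely culprit is device-node permissions rather than the PyTorch build itself: root bypasses the group ownership of `/dev/kfd` and `/dev/dri/renderD*`. A minimal diagnostic sketch, assuming those nodes are owned by the `video` and `render` groups (the common ROCm setup; the `needed_groups` helper is my own, not a ROCm tool):

```shell
# needed_groups prints any of the given groups missing from a
# space-separated group list (e.g. the output of `id -nG`).
needed_groups() {
    local have="$1"; shift
    local g
    for g in "$@"; do
        case " $have " in
            *" $g "*) ;;                  # already a member
            *) printf '%s\n' "$g" ;;      # missing
        esac
    done
}

# Compare the current user's groups against the ones ROCm typically needs.
missing=$(needed_groups "$(id -nG)" video render)
if [ -n "$missing" ]; then
    echo "Missing groups: $missing"
else
    echo "Group membership looks OK"
fi

# Show who may open the ROCm device nodes (if present on this machine).
ls -l /dev/kfd /dev/dri/renderD* 2>/dev/null || true
```

If `render` shows up as missing, that would explain `sudo rocminfo` succeeding while an unprivileged `torch._C._cuda_init()` raises "No HIP GPUs are available".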

Operating System

Linux cr3ppmser239 5.15.0-139-generic #149-Ubuntu SMP Fri Apr 11 22:06:13 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

CPU

Intel(R) Xeon(R) Platinum 8470

GPU

AMD Instinct MI300X

ROCm Version

6.3.1

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@ppanchad-amd

Hi @yanbing-j. Internal ticket has been created to investigate this issue. Thanks!

@yanbing-j
Author

@ppanchad-amd Thanks for the information! After adding my user to the `render` group, the HIP runtime works.
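For anyone hitting the same error, the fix amounts to the standard ROCm post-install group setup. A sketch, assuming `/dev/kfd` and `/dev/dri/renderD*` are owned by the `render` group on your system:

```shell
# Add the current user to the render (and video) groups.
sudo usermod -aG render,video "$USER"

# Group changes apply only to new login sessions: log out and back in,
# or start a subshell with the new group for the current session.
newgrp render

# Verify without sudo: the user should now be in render, and the
# HIP device nodes should be readable.
id -nG | grep -w render
ls -l /dev/kfd /dev/dri/renderD*
```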
