[Issue]: RuntimeError: No HIP GPUs are available when running UT with self-built PyTorch on mi 300 #3799

Closed
yanbing-j opened this issue May 31, 2025 · 2 comments


@yanbing-j

Problem Description

Hello maintainers,

I am trying to build PyTorch from source and run the ROCm UTs on an MI300. I use the following commands to build PyTorch:

conda install cmake ninja
pip install -r requirements.txt
pip install mkl-static mkl-include

python tools/amd_build/build_amd.py
export CMAKE_PREFIX_PATH="${CONDA_PREFIX:-'$(dirname $(which conda))/../'}:${CMAKE_PREFIX_PATH}"
python setup.py develop

I then run the UT with `PYTORCH_TEST_WITH_ROCM=1 python test/test_nn.py TestNN.test_Transformer_multilayer_coder_cuda_tf32`.

It fails with

Traceback (most recent call last):
  File "/home/guest/project/pytorch/test/test_nn.py", line 43, in <module>
    from torch.testing._internal.common_device_type import dtypesIfMPS, instantiate_device_type_tests, dtypes, \
  File "/home/guest/project/pytorch/torch/testing/_internal/common_device_type.py", line 1972, in <module>
    if torch.version.hip and "gfx94" in torch.cuda.get_device_properties(0).gcnArchName:
  File "/home/guest/project/pytorch/torch/cuda/__init__.py", line 587, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/home/guest/project/pytorch/torch/cuda/__init__.py", line 383, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No HIP GPUs are available

Running `sudo rocminfo` does show the MI300, and my user has been added to the `video` group.

Agent 6
*******
  Name:                    gfx942
  Uuid:                    GPU-ea88748ecd524884
  Marketing Name:          AMD Instinct MI300X
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    5
  Device Type:             GPU
  Cache Info:
    L1:                      32(0x20) KB
    L2:                      4096(0x1000) KB
    L3:                      262144(0x40000) KB
  Chip ID:                 29857(0x74a1)

The ROCm version is 6.3.1, and `rocm-smi` can also list the GPUs.

I also ran `/opt/rocm/bin/rocm_agent_enumerator`; it shows:

gfx942
gfx942
gfx942
gfx942
gfx942
gfx942
gfx942
gfx942
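Since `rocminfo` is being run under `sudo` while the plain-user HIP runtime fails, a likely culprit is device-node permissions rather than the PyTorch build itself: root bypasses the group ownership of `/dev/kfd` and `/dev/dri/renderD*`. A minimal diagnostic sketch, assuming those nodes are owned by the `video` and `render` groups (the common ROCm setup; the `needed_groups` helper is my own, not a ROCm tool):

```shell
# needed_groups prints any of the given groups missing from a
# space-separated group list (e.g. the output of `id -nG`).
needed_groups() {
    local have="$1"; shift
    local g
    for g in "$@"; do
        case " $have " in
            *" $g "*) ;;                  # already a member
            *) printf '%s\n' "$g" ;;      # missing
        esac
    done
}

# Compare the current user's groups against the ones ROCm typically needs.
missing=$(needed_groups "$(id -nG)" video render)
if [ -n "$missing" ]; then
    echo "Missing groups: $missing"
else
    echo "Group membership looks OK"
fi

# Show who may open the ROCm device nodes (if present on this machine).
ls -l /dev/kfd /dev/dri/renderD* 2>/dev/null || true
```

If `render` shows up as missing, that would explain `sudo rocminfo` succeeding while an unprivileged `torch._C._cuda_init()` raises "No HIP GPUs are available".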

Operating System

Linux cr3ppmser239 5.15.0-139-generic #149-Ubuntu SMP Fri Apr 11 22:06:13 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

CPU

Intel(R) Xeon(R) Platinum 8470

GPU

AMD Instinct MI300X

ROCm Version

6.3.1

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@ppanchad-amd

Hi @yanbing-j. Internal ticket has been created to investigate this issue. Thanks!

@yanbing-j
Author

@ppanchad-amd Thanks for the information! After adding my user to the `render` group, the HIP runtime works.
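For anyone hitting the same error, the fix amounts to the standard ROCm post-install group setup. A sketch, assuming `/dev/kfd` and `/dev/dri/renderD*` are owned by the `render` group on your system:

```shell
# Add the current user to the render (and video) groups.
sudo usermod -aG render,video "$USER"

# Group changes apply only to new login sessions: log out and back in,
# or start a subshell with the new group for the current session.
newgrp render

# Verify without sudo: the user should now be in render, and the
# HIP device nodes should be readable.
id -nG | grep -w render
ls -l /dev/kfd /dev/dri/renderD*
```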
