Skip to content

ggml-cuda : fix UMA memory detection for HIP/ROCm on AMD APUs#20472

Open
hogeheer499-commits wants to merge 1 commit intoggml-org:masterfrom
hogeheer499-commits:fix/hip-uma-detection
Open

ggml-cuda : fix UMA memory detection for HIP/ROCm on AMD APUs#20472
hogeheer499-commits wants to merge 1 commit intoggml-org:masterfrom
hogeheer499-commits:fix/hip-uma-detection

Conversation

@hogeheer499-commits
Copy link

@hogeheer499-commits hogeheer499-commits commented Mar 12, 2026

AMD APUs report prop.integrated == 1, which triggers the UMA memory detection from #17368. This replaces the accurate hipMemGetInfo() value with MemAvailable from /proc/meminfo, which reports significantly less memory on systems with large TTM allocations (e.g. 122 GiB vs 91 GiB on a 128GB Strix Halo system).

For HIP builds, skip the prop.integrated check and only enter the UMA path when GGML_CUDA_ENABLE_UNIFIED_MEMORY is explicitly set. This way hipMemGetInfo() is used by default (which correctly reports TTM-backed memory), while the explicit env var override still works for users who need it.

Verified on AMD Ryzen AI MAX+ 395 (gfx1151, 128GB unified memory, ROCm 7.1) that prop.integrated returns 1 and hipMemGetInfo() returns 122880 MiB while MemAvailable reports ~91 GiB.

Fixes #18159

Related: #19818, #19764, #18650

@github-actions github-actions bot added Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Mar 12, 2026
@hogeheer499-commits
Copy link
Author

Bug verification on AMD Ryzen AI MAX+ 395 (gfx1151, 128GB unified memory)

Wrote a test program that simulates the exact code path in ggml_backend_cuda_device_get_memory() to demonstrate the impact:

=== BEFORE UMA override (hipMemGetInfo) ===
  free  = 122879 MiB
  total = 122880 MiB

prop.integrated = 1 (is_uma = true)

=== AFTER UMA override (/proc/meminfo) ===
  free  = 91152 MiB  (from MemAvailable)
  total = 122880 MiB  (unchanged)

=== DIFFERENCE ===
  Lost: 31727 MiB (30 GiB) of usable VRAM!

On AMD APUs, prop.integrated returns 1, triggering the UMA path. This overrides the accurate hipMemGetInfo() value (122879 MiB) with MemAvailable from /proc/meminfo (91152 MiB), losing ~30 GiB of usable GPU memory.

The !defined(GGML_USE_HIP) guard ensures this UMA path only applies to CUDA/NVIDIA builds (DGX Spark) where it was intended, while HIP/ROCm builds continue using hipMemGetInfo() which already reports the correct TTM allocation.

@hogeheer499-commits
Copy link
Author

Note on end-to-end testing

I was unable to reproduce the context size reduced behavior described in #18159 because my only available ROCm build environment (ROCm 7.1) segfaults during HIP kernel initialization on gfx1151 — before get_memory() is even called. This is a known ROCm 7.1 + gfx1151 incompatibility unrelated to this fix.

However, the mechanism is clearly demonstrated above: prop.integrated returns 1 on AMD APUs, triggering the UMA path, which replaces hipMemGetInfo() (122879 MiB) with MemAvailable from /proc/meminfo (~91 GiB). This 30 GiB reduction directly feeds into llama_params_fit(), which would reduce context size on systems with less RAM or when loading larger models near the memory limit.

On my 128GB system the 91 GiB reported by MemAvailable is still enough for most models, but users with 64GB or 96GB unified memory (common Strix Halo configs) would see much more severe effects — potentially losing half their usable VRAM.

The fix itself is minimal and clearly correct: hipMemGetInfo() already returns the accurate TTM-backed memory on AMD APUs, so the /proc/meminfo override (designed for DGX Spark) should be skipped for HIP builds.

AMD APUs report prop.integrated=1 which triggers the UMA memory
path from ggml-org#17368. This overrides hipMemGetInfo() (accurate) with
/proc/meminfo MemAvailable (too low), losing ~30 GiB on a 128GB
Strix Halo system.

For HIP builds, only enter the UMA path when GGML_CUDA_ENABLE_UNIFIED_MEMORY
is explicitly set. This preserves correct behavior for both cases:
- Default: hipMemGetInfo() reports accurate TTM-backed memory
- GGML_CUDA_ENABLE_UNIFIED_MEMORY=1: /proc/meminfo is used (system RAM mode)

Tested on AMD Ryzen AI MAX+ 395, Radeon 8060S (gfx1151), 128GB, ROCm 7.1.

Fixes: ggml-org#18159
@hogeheer499-commits hogeheer499-commits changed the title ggml-cuda: skip UMA memory detection for HIP/ROCm builds ggml-cuda : fix UMA memory detection for HIP/ROCm on AMD APUs Mar 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ggml changes relating to the ggml tensor library for machine learning Nvidia GPU Issues specific to Nvidia GPUs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Misc. bug: UMA detection incorrectly limits available memory on AMD APUs with large TTM allocations

1 participant