ggml-cuda : fix UMA memory detection for HIP/ROCm on AMD APUs by hogeheer499-commits · Pull Request #20472 · ggml-org/llama.cpp

hogeheer499-commits · 2026-03-12T22:10:05Z

AMD APUs report prop.integrated == 1, which triggers the UMA memory detection from #17368. This replaces the accurate hipMemGetInfo() value with MemAvailable from /proc/meminfo, which reports significantly less memory on systems with large TTM allocations (e.g. 122 GiB vs 91 GiB on a 128GB Strix Halo system).

For HIP builds, skip the prop.integrated check and only enter the UMA path when GGML_CUDA_ENABLE_UNIFIED_MEMORY is explicitly set. This way hipMemGetInfo() is used by default (which correctly reports TTM-backed memory), while the explicit env var override still works for users who need it.

Verified on AMD Ryzen AI MAX+ 395 (gfx1151, 128GB unified memory, ROCm 7.1) that prop.integrated returns 1 and hipMemGetInfo() returns 122880 MiB while MemAvailable reports ~91 GiB.

Fixes #18159

Related: #19818, #19764, #18650

hogeheer499-commits · 2026-03-12T22:20:49Z

Bug verification on AMD Ryzen AI MAX+ 395 (gfx1151, 128GB unified memory)

Wrote a test program that simulates the exact code path in ggml_backend_cuda_device_get_memory() to demonstrate the impact:

=== BEFORE UMA override (hipMemGetInfo) ===
  free  = 122879 MiB
  total = 122880 MiB

prop.integrated = 1 (is_uma = true)

=== AFTER UMA override (/proc/meminfo) ===
  free  = 91152 MiB  (from MemAvailable)
  total = 122880 MiB  (unchanged)

=== DIFFERENCE ===
  Lost: 31727 MiB (30 GiB) of usable VRAM!

On AMD APUs, prop.integrated returns 1, triggering the UMA path. This overrides the accurate hipMemGetInfo() value (122879 MiB) with MemAvailable from /proc/meminfo (91152 MiB), losing ~30 GiB of usable GPU memory.

The !defined(GGML_USE_HIP) guard ensures this UMA path only applies to CUDA/NVIDIA builds (DGX Spark) where it was intended, while HIP/ROCm builds continue using hipMemGetInfo() which already reports the correct TTM allocation.

hogeheer499-commits · 2026-03-12T22:22:16Z

Note on end-to-end testing

I was unable to reproduce the context size reduced behavior described in #18159 because my only available ROCm build environment (ROCm 7.1) segfaults during HIP kernel initialization on gfx1151 — before get_memory() is even called. This is a known ROCm 7.1 + gfx1151 incompatibility unrelated to this fix.

However, the mechanism is clearly demonstrated above: prop.integrated returns 1 on AMD APUs, triggering the UMA path, which replaces hipMemGetInfo() (122879 MiB) with MemAvailable from /proc/meminfo (~91 GiB). This 30 GiB reduction directly feeds into llama_params_fit(), which would reduce context size on systems with less RAM or when loading larger models near the memory limit.

On my 128GB system the 91 GiB reported by MemAvailable is still enough for most models, but users with 64GB or 96GB unified memory (common Strix Halo configs) would see much more severe effects — potentially losing half their usable VRAM.

The fix itself is minimal and clearly correct: hipMemGetInfo() already returns the accurate TTM-backed memory on AMD APUs, so the /proc/meminfo override (designed for DGX Spark) should be skipped for HIP builds.

AMD APUs report prop.integrated=1 which triggers the UMA memory path from ggml-org#17368. This overrides hipMemGetInfo() (accurate) with /proc/meminfo MemAvailable (too low), losing ~30 GiB on a 128GB Strix Halo system. For HIP builds, only enter the UMA path when GGML_CUDA_ENABLE_UNIFIED_MEMORY is explicitly set. This preserves correct behavior for both cases: - Default: hipMemGetInfo() reports accurate TTM-backed memory - GGML_CUDA_ENABLE_UNIFIED_MEMORY=1: /proc/meminfo is used (system RAM mode) Tested on AMD Ryzen AI MAX+ 395, Radeon 8060S (gfx1151), 128GB, ROCm 7.1. Fixes: ggml-org#18159

github-actions bot added Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Mar 12, 2026

hogeheer499-commits force-pushed the fix/hip-uma-detection branch from 8674daa to 73357da Compare March 12, 2026 22:27

hogeheer499-commits changed the title ~~ggml-cuda: skip UMA memory detection for HIP/ROCm builds~~ ggml-cuda : fix UMA memory detection for HIP/ROCm on AMD APUs Mar 12, 2026

loci-dev mentioned this pull request Mar 13, 2026

UPSTREAM PR #20472: ggml-cuda : fix UMA memory detection for HIP/ROCm on AMD APUs auroralabs-loci/llama.cpp#1251

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ggml-cuda : fix UMA memory detection for HIP/ROCm on AMD APUs#20472

ggml-cuda : fix UMA memory detection for HIP/ROCm on AMD APUs#20472
hogeheer499-commits wants to merge 1 commit intoggml-org:masterfrom
hogeheer499-commits:fix/hip-uma-detection

hogeheer499-commits commented Mar 12, 2026 •

edited

Loading

Uh oh!

hogeheer499-commits commented Mar 12, 2026

Uh oh!

hogeheer499-commits commented Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hogeheer499-commits commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hogeheer499-commits commented Mar 12, 2026

Bug verification on AMD Ryzen AI MAX+ 395 (gfx1151, 128GB unified memory)

Uh oh!

hogeheer499-commits commented Mar 12, 2026

Note on end-to-end testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hogeheer499-commits commented Mar 12, 2026 •

edited

Loading