ggml-cuda : fix UMA memory detection for HIP/ROCm on AMD APUs #20472
hogeheer499-commits wants to merge 1 commit into ggml-org:master
Bug verification on AMD Ryzen AI MAX+ 395 (gfx1151, 128GB unified memory)

Wrote a test program that simulates the exact code path in On AMD APUs, The
Note on end-to-end testing

I was unable to reproduce the However, the mechanism is clearly demonstrated above: On my 128GB system the 91 GiB reported by The fix itself is minimal and clearly correct:
AMD APUs report prop.integrated=1 which triggers the UMA memory path from ggml-org#17368. This overrides hipMemGetInfo() (accurate) with /proc/meminfo MemAvailable (too low), losing ~30 GiB on a 128GB Strix Halo system.

For HIP builds, only enter the UMA path when GGML_CUDA_ENABLE_UNIFIED_MEMORY is explicitly set. This preserves correct behavior for both cases:

- Default: hipMemGetInfo() reports accurate TTM-backed memory
- GGML_CUDA_ENABLE_UNIFIED_MEMORY=1: /proc/meminfo is used (system RAM mode)

Tested on AMD Ryzen AI MAX+ 395, Radeon 8060S (gfx1151), 128GB, ROCm 7.1.

Fixes: ggml-org#18159
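The decision described in the commit message can be sketched as a small predicate. This is only an illustration, not the code from the patch: `use_uma_meminfo_path`, `uma_env_is_set`, and the `is_hip_build` flag are hypothetical names, and the non-HIP branch assumes the pre-existing behavior is "integrated device or env var set".

```cpp
#include <cstdlib>

// Hypothetical helper (not the actual ggml-cuda code): decides whether the
// /proc/meminfo-based UMA path should replace the runtime's memory query.
//   is_hip_build : stands in for a compile-time HIP/ROCm check
//   integrated   : stands in for prop.integrated != 0
//   uma_env_set  : GGML_CUDA_ENABLE_UNIFIED_MEMORY is present in the environment
bool use_uma_meminfo_path(bool is_hip_build, bool integrated, bool uma_env_set) {
    if (uma_env_set) {
        return true;   // the explicit override applies on every backend
    }
    if (is_hip_build) {
        return false;  // HIP: keep the hipMemGetInfo() value by default
    }
    return integrated; // assumed non-HIP behavior: integrated devices use the UMA path
}

// Reads the environment variable named in the commit message.
// Assumption: any value, including an empty one, counts as "set".
bool uma_env_is_set() {
    return std::getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") != nullptr;
}
```

Under these assumptions, an AMD APU on a HIP build (`integrated == true`, env var unset) no longer enters the UMA path, while the same device with the env var set still does.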
AMD APUs report `prop.integrated == 1`, which triggers the UMA memory detection from #17368. This replaces the accurate `hipMemGetInfo()` value with `MemAvailable` from `/proc/meminfo`, which reports significantly less memory on systems with large TTM allocations (e.g. ~91 GiB instead of 120 GiB on a 128GB Strix Halo system).

For HIP builds, skip the `prop.integrated` check and only enter the UMA path when `GGML_CUDA_ENABLE_UNIFIED_MEMORY` is explicitly set. This way `hipMemGetInfo()` is used by default (which correctly reports TTM-backed memory), while the explicit env var override still works for users who need it.

Verified on AMD Ryzen AI MAX+ 395 (gfx1151, 128GB unified memory, ROCm 7.1) that `prop.integrated` returns 1 and `hipMemGetInfo()` returns 122880 MiB (120 GiB) while `MemAvailable` reports ~91 GiB.

Fixes #18159
Related: #19818, #19764, #18650
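To compare the two figures quoted in the description on another machine, the `MemAvailable` side can be read with a few lines of standard C++. The sketch below is a standalone illustration, not the parser used in ggml-cuda; `parse_mem_available_kib` and `mib_to_gib` are hypothetical helper names, and the input is assumed to follow the usual `Key:   value kB` layout of `/proc/meminfo`.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Returns the MemAvailable value (in KiB) from /proc/meminfo-style text,
// or -1 if the field is not present. Hypothetical helper for illustration.
int64_t parse_mem_available_kib(std::istream & in) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        int64_t value = 0;
        // Each line looks like "MemAvailable:   95420416 kB".
        if (fields >> key >> value && key == "MemAvailable:") {
            return value;
        }
    }
    return -1;
}

// Whole-GiB conversion (integer division), e.g. for a MiB figure
// derived from hipMemGetInfo().
constexpr int64_t mib_to_gib(int64_t mib) {
    return mib / 1024;
}
```

On Linux the function can be fed a `std::ifstream` opened on `/proc/meminfo`. With the numbers from the description, 122880 MiB converts to 120 GiB, so a `MemAvailable` of about 91 GiB leaves roughly 29 GiB unaccounted for.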