
ggml: fix #1186 - don't include arm_neon.h when using CUDA 12 with ARM Neon #1187


Merged: 1 commit, Apr 10, 2025

Conversation

@cmdr2 (Collaborator) commented Apr 10, 2025

Previously, arm_neon.h was not included for any version of CUDA; now it gets included for CUDA 12, which is a bug. (A sketch of the include guard in question follows the scenario list below.)

Scenarios:

- ARM Neon
  - MSVC:
    - prev: include header, use uint16, use fallback fp conversion functions
    - current (ok): include header, use fp16, use actual fp conversion functions
    - expected: include header, use fp16, use actual fp conversion functions
  - GCC:
    - prev: include header, use fp16, use actual fp conversion functions
    - current (ok): include header, use fp16, use actual fp conversion functions
    - expected: include header, use fp16, use actual fp conversion functions
  - CUDA <= 11:
    - prev: don't include header, use uint16, use fallback fp conversion functions
    - current (ok): don't include header, use uint16, use fallback fp conversion functions
    - expected: don't include header, use uint16, use fallback fp conversion functions
  - CUDA 12:
    - prev: don't include header, use fp16, use actual fp conversion functions
    - current (bug!): include header (bug), use fp16, use actual fp conversion functions
    - expected: don't include header, use fp16, use actual fp conversion functions
- Not ARM Neon:
  - prev: don't include header, use other functions
  - current (ok): don't include header, use other functions
  - expected: don't include header, use other functions
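For reference, here is a minimal sketch of the include guard this PR changes, reconstructed from the scenario list above (illustrative shape only, not the literal ggml-impl.h diff):

```c
// Fixed behavior (and the pre-regression behavior): never include
// arm_neon.h when compiling with nvcc, regardless of CUDA version.
#if defined(__ARM_NEON) && !defined(__CUDACC__)
#include <arm_neon.h>
#endif

// The regression: excluding only nvcc <= 11, so CUDA 12 builds
// started pulling in arm_neon.h:
//   #if defined(__ARM_NEON) && !(defined(__CUDACC__) && __CUDACC_VER_MAJOR__ <= 11)
```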

@ggerganov (Member) left a comment

Got it. The reason I was confused is that I thought the nvcc compiler needs to pick up __fp16 from arm_neon.h. But it seems it has it defined somewhere internally.

@ggerganov (Member) commented

Running the llama.cpp CI just to make sure all is good and will merge.

@ggerganov merged commit 2af91a9 into ggml-org:master on Apr 10, 2025
3 checks passed
@cmdr2 (Collaborator, Author) commented Apr 10, 2025

> I thought that the nvcc compiler needs to pick __fp16 from arm_neon.h

Yeah, that threw me off too initially. But I mistakenly assumed that CUDA has __fp16 (I was thinking of __half). It doesn't look like CUDA has it.

Now I'm not sure how this code path worked in the past (or works now).

Digging a bit more.
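For anyone following along: CUDA's documented half-precision type is __half from <cuda_fp16.h>, while __fp16 is the ARM C language extension type. A minimal sketch (assuming a CUDA 11+ toolkit; illustrative only):

```c
#include <cuda_fp16.h>

// CUDA spells its half type __half, not __fp16, and converts through
// intrinsics from <cuda_fp16.h>:
__global__ void half_demo(const float *in, float *out) {
    __half h = __float2half(in[0]); // float -> __half
    out[0]   = __half2float(h);     // __half -> float
}
```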

@cmdr2 (Collaborator, Author) commented Apr 10, 2025

Sorry, I jumped to conclusions here and filed this PR over-eagerly. My bad. :)

The previous code assumed that __fp16 would work with CUDA 12's nvcc. Maybe that's true, but I'm not sure how, unless it's an intrinsic data type in nvcc for ARM? I couldn't find any reference for this.

The original issue and the PR that introduced the CUDA <= 11 check say that __fp16 compiled for CUDA 12: https://github.com/ggml-org/llama.cpp/pull/10616/files (related comment: ggml-org/llama.cpp#10555 (comment)).

We'll know more once @colintoal tries to compile the latest version.

If it doesn't compile, maybe that check should be `!defined(__CUDACC__)` instead of `!(defined(__CUDACC__) && __CUDACC_VER_MAJOR__ <= 11)`.
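Assuming the check in question is the one that selects the __fp16 type (the typedef name below follows ggml-impl.h; treat the exact conditions as illustrative), the two variants would look roughly like this:

```c
#include <stdint.h>

// Check kept by this PR: use __fp16 everywhere on ARM except nvcc <= 11,
// trusting that nvcc 12 accepts __fp16.
#if defined(__ARM_NEON) && !(defined(__CUDACC__) && __CUDACC_VER_MAJOR__ <= 11)
typedef __fp16   ggml_fp16_internal_t; // hardware half type, real conversions
#else
typedef uint16_t ggml_fp16_internal_t; // raw bits, fallback conversions
#endif

// Fallback variant, should nvcc 12 turn out to reject __fp16:
//   #if defined(__ARM_NEON) && !defined(__CUDACC__)
```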

@cmdr2 (Collaborator, Author) commented Apr 11, 2025

I guess it works, since colintoal confirmed on the issue that this fix compiles. I'm not sure how it works, tbh; maybe __fp16 is an intrinsic data type for nvcc on ARM? Anyway, good to know.

@ggerganov (Member) commented

Yes, tbh I am also not 100% confident I understand how __fp16 is handled by the nvcc compiler and its different versions.

Btw, another report for this was submitted in the llama.cpp repo and the proposed fix is the same: ggml-org/llama.cpp#12872. So it seems this should be good.
