
ggml: fix #1186 - don't include arm_neon.h when using CUDA 12 with ARM Neon #1187


Merged: 1 commit, Apr 10, 2025

Conversation

@cmdr2 (Collaborator) commented Apr 10, 2025

Previously, arm_neon.h was not included for any version of CUDA; now it gets included for CUDA 12, which is a bug. (A sketch of the include guard in question follows the scenario list below.)

Scenarios:

- ARM Neon
  - MSVC:
    - prev: include header, use uint16, use fallback fp conversion functions
    - current (ok): include header, use fp16, use actual fp conversion functions
    - expected: include header, use fp16, use actual fp conversion functions
  - GCC:
    - prev: include header, use fp16, use actual fp conversion functions
    - current (ok): include header, use fp16, use actual fp conversion functions
    - expected: include header, use fp16, use actual fp conversion functions
  - CUDA <= 11:
    - prev: don't include header, use uint16, use fallback fp conversion functions
    - current (ok): don't include header, use uint16, use fallback fp conversion functions
    - expected: don't include header, use uint16, use fallback fp conversion functions
  - CUDA 12:
    - prev: don't include header, use fp16, use actual fp conversion functions
    - current (bug!): include header (bug), use fp16, use actual fp conversion functions
    - expected: don't include header, use fp16, use actual fp conversion functions
- Not ARM Neon:
  - prev: don't include header, use other functions
  - current (ok): don't include header, use other functions
  - expected: don't include header, use other functions
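For reference, here is a minimal sketch of the include guard this PR changes, reconstructed from the scenario list above (illustrative shape only, not the literal ggml-impl.h diff):

```c
// Fixed behavior (and the pre-regression behavior): never include
// arm_neon.h when compiling with nvcc, regardless of CUDA version.
#if defined(__ARM_NEON) && !defined(__CUDACC__)
#include <arm_neon.h>
#endif

// The regression: excluding only nvcc <= 11, so CUDA 12 builds
// started pulling in arm_neon.h:
//   #if defined(__ARM_NEON) && !(defined(__CUDACC__) && __CUDACC_VER_MAJOR__ <= 11)
```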

@ggerganov (Member) left a comment

Got it. The reason I was confused is that I thought the nvcc compiler needs to pick up __fp16 from arm_neon.h. But it seems it has it defined somewhere internally.

@ggerganov (Member) commented

Running the llama.cpp CI just to make sure all is good and will merge.

@ggerganov merged commit 2af91a9 into ggml-org:master on Apr 10, 2025
3 checks passed
@cmdr2 (Collaborator, Author) commented Apr 10, 2025

> I thought that the nvcc compiler needs to pick __fp16 from arm_neon.h

Yeah, that threw me off too initially. But I mistakenly assumed that CUDA has __fp16 (I was thinking of __half). It doesn't look like CUDA has it.

Now I'm not sure how this code path worked in the past (or works now).

Digging a bit more.
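For anyone following along: CUDA's documented half-precision type is __half from <cuda_fp16.h>, while __fp16 is the ARM C language extension type. A minimal sketch (assuming a CUDA 11+ toolkit; illustrative only):

```c
#include <cuda_fp16.h>

// CUDA spells its half type __half, not __fp16, and converts through
// intrinsics from <cuda_fp16.h>:
__global__ void half_demo(const float *in, float *out) {
    __half h = __float2half(in[0]); // float -> __half
    out[0]   = __half2float(h);     // __half -> float
}
```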

@cmdr2 (Collaborator, Author) commented Apr 10, 2025

Sorry, I jumped to conclusions here and filed this PR over-eagerly. My bad. :)

The previous code assumed that __fp16 would work with CUDA 12's nvcc. Maybe that's true, but I'm not sure how, unless it's an intrinsic data type in nvcc for ARM? I couldn't find any reference for this.

The original issue and the PR that introduced the CUDA <= 11 check say that __fp16 compiled for CUDA 12: https://github.com/ggml-org/llama.cpp/pull/10616/files (related comment: ggml-org/llama.cpp#10555 (comment)).

We'll know more once @colintoal tries to compile the latest version.

If it doesn't compile, maybe that check should be `!defined(__CUDACC__)` instead of `!(defined(__CUDACC__) && __CUDACC_VER_MAJOR__ <= 11)`.
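Assuming the check in question is the one that selects the __fp16 type (the typedef name below follows ggml-impl.h; treat the exact conditions as illustrative), the two variants would look roughly like this:

```c
#include <stdint.h>

// Check kept by this PR: use __fp16 everywhere on ARM except nvcc <= 11,
// trusting that nvcc 12 accepts __fp16.
#if defined(__ARM_NEON) && !(defined(__CUDACC__) && __CUDACC_VER_MAJOR__ <= 11)
typedef __fp16   ggml_fp16_internal_t; // hardware half type, real conversions
#else
typedef uint16_t ggml_fp16_internal_t; // raw bits, fallback conversions
#endif

// Fallback variant, should nvcc 12 turn out to reject __fp16:
//   #if defined(__ARM_NEON) && !defined(__CUDACC__)
```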

@cmdr2 (Collaborator, Author) commented Apr 11, 2025

I guess it works, since colintoal confirmed on the issue that this fix compiles. I'm not sure how it works, tbh; maybe __fp16 is an intrinsic data type for nvcc on ARM? Anyway, good to know.

@ggerganov (Member) commented

Yes, tbh I am also not 100% confident I understand how __fp16 is handled by the nvcc compiler and its different versions.

Btw, another report for this was submitted in the llama.cpp repo and the proposed fix is the same: ggml-org/llama.cpp#12872. So it seems this should be good.
