
CUDA: add dynamic shared mem to softmax, refactor general usage #14497

Merged

am17an merged 3 commits into ggml-org:master from cuda_increase_shared_mem_limits on Jul 2, 2025

Conversation

@am17an am17an (Collaborator) commented Jul 2, 2025

This PR does two things:

  1. Add dynamic shared memory to softmax. Added a perf case where the required shared memory is >= 48 KB but less than smpbo (sharedMemPerBlockOptin), which on my RTX 3090 is 102.4 KB.

PR

  SOFT_MAX(type=f32,ne=[12888,1024,5,1],mask=0,m_prec=f32,scale=1.000000,max_bias=0.000000):                    1320 runs -   765.44 us/run -   515520 kB/run -  647.16 GB/s

vs master

  SOFT_MAX(type=f32,ne=[12888,1024,5,1],mask=0,m_prec=f32,scale=1.000000,max_bias=0.000000):                     924 runs -  1126.70 us/run -   515520 kB/run -  439.66 GB/s
  2. It also refactors the set-dynamic-shared-memory routine into a macro for readability (see the sketch after this list).
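
For context, here is a minimal sketch of the pattern involved, assuming hypothetical names (`SET_SHARED_MEM_LIMIT`, `soft_max_sketch`) rather than the PR's actual code: on devices with compute capability >= 7.0, a kernel may use more than the default 48 KB of dynamic shared memory per block, but only after opting in with `cudaFuncSetAttribute`, up to the device limit reported by `cudaDevAttrMaxSharedMemoryPerBlockOptin` (the "smpbo" above).

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative kernel: dynamic shared memory, sized by the launch configuration.
__global__ void soft_max_sketch(const float * x, float * dst, int ncols) {
    extern __shared__ float buf[];          // ncols floats, provided at launch
    for (int i = threadIdx.x; i < ncols; i += blockDim.x) {
        buf[i] = x[i];                      // stage the row in shared memory
    }
    __syncthreads();
    // (max-reduction, exp, sum-reduction, and the write to dst are elided)
    if (threadIdx.x == 0) dst[0] = buf[0];
}

// Hypothetical macro capturing the refactor's idea: raise the kernel's dynamic
// shared memory cap once per call site so launches above 48 KB do not fail.
#define SET_SHARED_MEM_LIMIT(kernel, nbytes)                                   \
    do {                                                                       \
        static bool raised = false;                                            \
        if (!raised && (nbytes) > 48u*1024u) {                                 \
            cudaFuncSetAttribute((kernel),                                     \
                cudaFuncAttributeMaxDynamicSharedMemorySize, (int) (nbytes));  \
            raised = true;                                                     \
        }                                                                      \
    } while (0)

int main() {
    int smpbo = 0; // max opt-in shared memory per block (~102.4 KB on an RTX 3090)
    cudaDeviceGetAttribute(&smpbo, cudaDevAttrMaxSharedMemoryPerBlockOptin, 0);

    const int    ncols         = 12888;                 // matches the perf case above
    const size_t nbytes_shared = ncols * sizeof(float); // 51552 B: > 48 KB, < smpbo

    if (nbytes_shared <= (size_t) smpbo) {
        SET_SHARED_MEM_LIMIT(soft_max_sketch, nbytes_shared);
        // soft_max_sketch<<<grid_dim, block_dim, nbytes_shared>>>(x, dst, ncols);
    }
    printf("smpbo = %d bytes, requested = %zu bytes\n", smpbo, nbytes_shared);
    return 0;
}
```

Without the opt-in, a launch requesting more than 48 KB of dynamic shared memory fails, so sizes between 48 KB and smpbo previously had to be handled differently; the benchmark above shows master roughly 1.5x slower on exactly such a case (12888 * sizeof(float) = 51552 bytes per block).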

@am17an am17an requested a review from JohannesGaessler as a code owner July 2, 2025 08:07
@github-actions github-actions bot added labels: testing (Everything test related), Nvidia GPU (Issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning) on Jul 2, 2025
@am17an am17an force-pushed the cuda_increase_shared_mem_limits branch 2 times, most recently from 4c7bcaa to a67ef5c on July 2, 2025 09:20
@am17an am17an requested a review from JohannesGaessler July 2, 2025 09:46
@am17an am17an force-pushed the cuda_increase_shared_mem_limits branch from a67ef5c to 7b16281 on July 2, 2025 13:09
@JohannesGaessler JohannesGaessler (Collaborator)

(To be clear, the approval is conditional on not breaking the CI.)

@CISC CISC (Collaborator) commented Jul 2, 2025

> (To be clear, the approval is conditional on not breaking the CI.)

Unfair, it just broke itself (I'll restart). :)

@CISC CISC (Collaborator) commented Jul 2, 2025

The failing CIs can be safely ignored; it's just a DNS issue.

@am17an am17an merged commit 55c2646 into ggml-org:master Jul 2, 2025
87 of 132 checks passed
@am17an am17an deleted the cuda_increase_shared_mem_limits branch July 2, 2025 23:45
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jul 3, 2025
* origin/master:
Fix conditional enabling following arch checks for ggml-sycl (ggml-org#14504)
convert : correct gemma 3n conversion (ggml-org#14450)
kv-cache : use ggml_set_rows (ggml-org#14285)
ggml : fix FA mask dim 2 and 3 (ggml-org#14505)
ggml : remove kompute backend (ggml-org#14501)
CUDA: add dynamic shared mem to softmax, refactor general usage (ggml-org#14497)
Minh141120 pushed a commit to menloresearch/llama.cpp that referenced this pull request Jul 5, 2025
qnixsynapse pushed a commit to menloresearch/llama.cpp that referenced this pull request Jul 6, 2025
Labels
ggml (changes relating to the ggml tensor library for machine learning), Nvidia GPU (Issues specific to Nvidia GPUs), testing (Everything test related)

3 participants