
CUDA: add dynamic shared mem to softmax, refactor general usage #14497

Merged

am17an merged 3 commits into ggml-org:master from cuda_increase_shared_mem_limits on Jul 2, 2025

Conversation

@am17an am17an (Collaborator) commented Jul 2, 2025

This PR does two things:

  1. Add dynamic shared memory to softmax. Added a perf case where the required shared memory is >= 48 KB but less than smpbo (sharedMemPerBlockOptin), which on my RTX 3090 is 102.4 KB.

PR

  SOFT_MAX(type=f32,ne=[12888,1024,5,1],mask=0,m_prec=f32,scale=1.000000,max_bias=0.000000):                    1320 runs -   765.44 us/run -   515520 kB/run -  647.16 GB/s

vs master

  SOFT_MAX(type=f32,ne=[12888,1024,5,1],mask=0,m_prec=f32,scale=1.000000,max_bias=0.000000):                     924 runs -  1126.70 us/run -   515520 kB/run -  439.66 GB/s
  2. It also refactors the set-dynamic-shared-memory routine into a macro for readability (see the sketch after this list).
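
For context, here is a minimal sketch of the pattern involved, assuming hypothetical names (`SET_SHARED_MEM_LIMIT`, `soft_max_sketch`) rather than the PR's actual code: on devices with compute capability >= 7.0, a kernel may use more than the default 48 KB of dynamic shared memory per block, but only after opting in with `cudaFuncSetAttribute`, up to the device limit reported by `cudaDevAttrMaxSharedMemoryPerBlockOptin` (the "smpbo" above).

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative kernel: dynamic shared memory, sized by the launch configuration.
__global__ void soft_max_sketch(const float * x, float * dst, int ncols) {
    extern __shared__ float buf[];          // ncols floats, provided at launch
    for (int i = threadIdx.x; i < ncols; i += blockDim.x) {
        buf[i] = x[i];                      // stage the row in shared memory
    }
    __syncthreads();
    // (max-reduction, exp, sum-reduction, and the write to dst are elided)
    if (threadIdx.x == 0) dst[0] = buf[0];
}

// Hypothetical macro capturing the refactor's idea: raise the kernel's dynamic
// shared memory cap once per call site so launches above 48 KB do not fail.
#define SET_SHARED_MEM_LIMIT(kernel, nbytes)                                   \
    do {                                                                       \
        static bool raised = false;                                            \
        if (!raised && (nbytes) > 48u*1024u) {                                 \
            cudaFuncSetAttribute((kernel),                                     \
                cudaFuncAttributeMaxDynamicSharedMemorySize, (int) (nbytes));  \
            raised = true;                                                     \
        }                                                                      \
    } while (0)

int main() {
    int smpbo = 0; // max opt-in shared memory per block (~102.4 KB on an RTX 3090)
    cudaDeviceGetAttribute(&smpbo, cudaDevAttrMaxSharedMemoryPerBlockOptin, 0);

    const int    ncols         = 12888;                 // matches the perf case above
    const size_t nbytes_shared = ncols * sizeof(float); // 51552 B: > 48 KB, < smpbo

    if (nbytes_shared <= (size_t) smpbo) {
        SET_SHARED_MEM_LIMIT(soft_max_sketch, nbytes_shared);
        // soft_max_sketch<<<grid_dim, block_dim, nbytes_shared>>>(x, dst, ncols);
    }
    printf("smpbo = %d bytes, requested = %zu bytes\n", smpbo, nbytes_shared);
    return 0;
}
```

Without the opt-in, a launch requesting more than 48 KB of dynamic shared memory fails, so sizes between 48 KB and smpbo previously had to be handled differently; the benchmark above shows master roughly 1.5x slower on exactly such a case (12888 * sizeof(float) = 51552 bytes per block).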

@am17an am17an requested a review from JohannesGaessler as a code owner July 2, 2025 08:07
@github-actions github-actions bot added labels: testing (Everything test related), Nvidia GPU (Issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning) on Jul 2, 2025
@am17an am17an force-pushed the cuda_increase_shared_mem_limits branch 2 times, most recently from 4c7bcaa to a67ef5c on July 2, 2025 09:20
@am17an am17an requested a review from JohannesGaessler July 2, 2025 09:46
@am17an am17an force-pushed the cuda_increase_shared_mem_limits branch from a67ef5c to 7b16281 on July 2, 2025 13:09
@JohannesGaessler JohannesGaessler (Collaborator)

(To be clear, the approval is conditional on not breaking the CI.)

@CISC CISC (Collaborator) commented Jul 2, 2025

> (To be clear, the approval is conditional on not breaking the CI.)

Unfair, it just broke itself (I'll restart). :)

@CISC CISC (Collaborator) commented Jul 2, 2025

The failing CIs can be safely ignored; it's just a DNS issue.

@am17an am17an merged commit 55c2646 into ggml-org:master Jul 2, 2025
87 of 132 checks passed
@am17an am17an deleted the cuda_increase_shared_mem_limits branch July 2, 2025 23:45
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jul 3, 2025
* origin/master:
Fix conditional enabling following arch checks for ggml-sycl (ggml-org#14504)
convert : correct gemma 3n conversion (ggml-org#14450)
kv-cache : use ggml_set_rows (ggml-org#14285)
ggml : fix FA mask dim 2 and 3 (ggml-org#14505)
ggml : remove kompute backend (ggml-org#14501)
CUDA: add dynamic shared mem to softmax, refactor general usage (ggml-org#14497)
Minh141120 pushed a commit to menloresearch/llama.cpp that referenced this pull request Jul 5, 2025
qnixsynapse pushed a commit to menloresearch/llama.cpp that referenced this pull request Jul 6, 2025
Labels
ggml (changes relating to the ggml tensor library for machine learning), Nvidia GPU (Issues specific to Nvidia GPUs), testing (Everything test related)

3 participants