Improve performance of CUDA kernels generated from manual_sparse_jacobian #4093

daverumph · 2025-11-10T22:27:03Z

Purpose

Use cached values to avoid doing extra work in the kernel.

Content

These changes selectively restore the use of cached variables that Charlie removed en masse a while back. After profiling with NVIDIA's Nsight tools, we now understand that some of that caching is valuable because it avoids "global" (from the GPU's perspective) memory reads.
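As a rough illustration of the trade-off (not the actual ClimaAtmos code), the sketch below compares recomputing a derived quantity from its inputs at every point with reading a value that was computed once and cached. All names here (`recompute_K`, `tendency_recompute!`, `tendency_cached!`, and the arrays) are hypothetical, and plain CPU arrays stand in for GPU global memory. The assumption is that each array access in the loop body corresponds to one global memory read in the generated kernel.

```julia
# Hypothetical illustration; none of these names come from ClimaAtmos.

# A derived quantity that needs three input arrays to recompute.
recompute_K(u, v, w, i) = (u[i]^2 + v[i]^2 + w[i]^2) / 2

# Version A: recompute K at every point (reads u, v, w and ρ: four reads per point).
function tendency_recompute!(out, u, v, w, ρ)
    for i in eachindex(out)
        out[i] = ρ[i] * recompute_K(u, v, w, i)
    end
    return out
end

# Version B: reuse a cached K (reads K and ρ: two reads per point).
function tendency_cached!(out, K, ρ)
    for i in eachindex(out)
        out[i] = ρ[i] * K[i]
    end
    return out
end

n = 8
u, v, w, ρ = rand(n), rand(n), rand(n), rand(n)
K = [recompute_K(u, v, w, i) for i in 1:n]  # filled once, e.g. in a precompute step

out_a = tendency_recompute!(zeros(n), u, v, w, ρ)
out_b = tendency_cached!(zeros(n), K, ρ)
```

Both versions produce the same values; the cached one simply touches fewer arrays per point. Whether that wins in a real kernel depends on how expensive the recomputation is relative to the extra reads, which is why profiling was needed to decide which cached variables to restore.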

I have read and checked the items on the review checklist.

sajjadazimi

Thanks! LGTM.

src/prognostic_equations/implicit/manual_sparse_jacobian.jl

Co-authored-by: Teja Reddy <[email protected]>

src/prognostic_equations/implicit/manual_sparse_jacobian.jl

szy21 · 2025-11-11T04:43:49Z

The regression test breaks - is the behavior change expected?

szy21 · 2025-11-11T04:47:38Z

Also could you squash the commits? Thanks!

sajjadazimi · 2025-11-11T23:31:40Z

The results changed because K_h, which is needed later for the SGS diffusion computations, gets overwritten inside the call to ᶜmixing_length(Y, p) around line 700, where temp_scalar_5 is used.
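The overlap can be sketched in isolation as follows. This is a simplified stand-in, not the ClimaAtmos implementation: `mixing_length!`, `diffusion_buggy`, `diffusion_fixed`, and the scratch name `temp_scalar_6` are hypothetical, and only `temp_scalar_5` and `K_h` are names taken from the discussion above.

```julia
# Hypothetical sketch of two computations sharing one scratch buffer.
scratch = (; temp_scalar_5 = zeros(4), temp_scalar_6 = zeros(4))

# Stand-in for a helper that uses temp_scalar_5 as its working storage.
function mixing_length!(scratch, Y)
    tmp = scratch.temp_scalar_5
    tmp .= 2 .* Y
    return tmp
end

# Buggy: K_h lives in the same scratch buffer the helper writes to.
function diffusion_buggy(scratch, Y)
    K_h = scratch.temp_scalar_5
    K_h .= Y .+ 1
    mixing_length!(scratch, Y)  # silently overwrites K_h
    return copy(K_h)
end

# Fixed: K_h gets its own scratch buffer, so the helper cannot clobber it.
function diffusion_fixed(scratch, Y)
    K_h = scratch.temp_scalar_6
    K_h .= Y .+ 1
    mixing_length!(scratch, Y)
    return copy(K_h)
end

Y = [1.0, 2.0, 3.0, 4.0]
buggy = diffusion_buggy(scratch, Y)  # holds the helper's values, not K_h
fixed = diffusion_fixed(scratch, Y)  # holds the intended K_h
```

In the buggy variant the returned array contains what the helper wrote rather than the intended `K_h`. Giving `K_h` a dedicated scratch buffer removes the aliasing at the cost of one extra allocation.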

daverumph requested review from dennisYatunin, imreddyTeja and sajjadazimi November 10, 2025 22:27

daverumph self-assigned this Nov 10, 2025

Merge with changes coming from main branch

5b7e47e

daverumph force-pushed the dr/gpu_perf/manual_sparse_jacobian_1 branch from 59848ad to 5b7e47e Compare November 10, 2025 22:38

sajjadazimi approved these changes Nov 10, 2025

imreddyTeja reviewed Nov 10, 2025

daverumph and others added 3 commits November 10, 2025 16:54

Update src/prognostic_equations/implicit/manual_sparse_jacobian.jl

45262f0

Co-authored-by: Teja Reddy <[email protected]>

Update src/prognostic_equations/implicit/manual_sparse_jacobian.jl

91116e9

Co-authored-by: Teja Reddy <[email protected]>

Update src/prognostic_equations/implicit/manual_sparse_jacobian.jl

89adc2b

Co-authored-by: Teja Reddy <[email protected]>

daverumph commented Nov 11, 2025

src/prognostic_equations/implicit/manual_sparse_jacobian.jl (outdated)

daverumph commented Nov 11, 2025

src/prognostic_equations/implicit/manual_sparse_jacobian.jl (outdated)

Faster implementation of a couple of lines

7d3bdd7

imreddyTeja approved these changes Nov 11, 2025

Run JuliaFormatter

22d3bab

daverumph enabled auto-merge November 11, 2025 02:16

imreddyTeja disabled auto-merge November 11, 2025 04:46

daverumph added 2 commits November 11, 2025 15:44

Change order of calculation in two places of update_jacobian

d993c03

Hoping to address small floating point differences

Allocate and use a new scratch scalar to resolve overlapping use

d66c45b

5 participants