Skip to content

Conversation

@daverumph
Copy link

Purpose

Use cached values to avoid doing extra work in the kernel.

Content

The changes selectively add back use of cached variables that were removed en masse by Charlie a while back. Upon profiling with NVidia's NSight tools, we now understand that some of that caching is valuable to avoid "global" (from the GPU's perspective) memory reads.


  • I have read and checked the items on the review checklist.

@daverumph daverumph force-pushed the dr/gpu_perf/manual_sparse_jacobian_1 branch from 59848ad to 5b7e47e Compare November 10, 2025 22:38
Copy link
Member

@sajjadazimi sajjadazimi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM.

@daverumph daverumph enabled auto-merge November 11, 2025 02:16
@szy21
Copy link
Member

szy21 commented Nov 11, 2025

The regression test breaks - is the behavior change expected?

@imreddyTeja imreddyTeja disabled auto-merge November 11, 2025 04:46
@szy21
Copy link
Member

szy21 commented Nov 11, 2025

Also could you squash the commits? Thanks!

@sajjadazimi
Copy link
Member

The results have changed because K_h, which is needed later for sgs diffusion computations, gets overwritten inside the call to the function ᶜmixing_length(Y, p) around line 700 where temp_scalar_5 is used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants