@imreddyTeja imreddyTeja commented Oct 30, 2025

I noticed that using the integrated canopy-soil model was significantly slower than using the soil model alone.
This is unexpected, because the soil model's calculations are the more complex ones (not just pointwise). Profiling revealed that
the canopy cache update takes approximately 1/6 of the time of each step, and almost all of that time is spent launching kernels. The cache update contains many broadcasts that do very simple calculations. In those cases, the kernels themselves take ~10 microseconds, while launching each kernel takes anywhere from 20 to 200 microseconds (the cause should be investigated; I'm guessing it has to do with adapting args). As a result, the CPU cannot queue up work for the GPU fast enough, and the GPU idles a lot.
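
To make the launch-bound argument concrete, here is a rough timing model in Python. It is only a sketch: it assumes launches are issued serially by the CPU, that the GPU runs one kernel at a time, and that every kernel has the same cost. The 10, 20, and 200 microsecond figures come from the profiling above; the kernel count of 100 and the function name `simulate_pipeline` are hypothetical.

```python
def simulate_pipeline(n_kernels, launch_us, kernel_us):
    """Return (wall_time_us, gpu_idle_fraction) for a simple async launch model.

    Assumptions (not measured): the CPU issues launches back to back, and the
    GPU starts a kernel as soon as it has been launched and the GPU is free.
    """
    cpu_time = 0.0  # time at which the CPU finishes issuing the latest launch
    gpu_free = 0.0  # time at which the GPU finishes its latest kernel
    for _ in range(n_kernels):
        cpu_time += launch_us
        start = max(cpu_time, gpu_free)  # GPU waits for the launch if needed
        gpu_free = start + kernel_us
    busy = n_kernels * kernel_us
    return gpu_free, 1.0 - busy / gpu_free


# Hypothetical count of 100 small broadcasts, at both ends of the launch range.
for launch_us in (20.0, 200.0):
    wall, idle = simulate_pipeline(n_kernels=100, launch_us=launch_us, kernel_us=10.0)
    print(f"launch={launch_us:.0f} us: wall={wall:.0f} us, GPU idle={idle:.0%}")
```

Under these assumptions the GPU is idle for at least half of the cache update even at the fast end of the launch range, which is consistent with the profile being dominated by launch time.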

This PR tries using MultiBroadcastFusion in the canopy cache update. Fusing should make the kernels themselves less efficient, but that cost is worth paying: because launch overhead dominates, the kernels could be 2x as slow without making the simulation any slower. This is different from ClimaAtmos.
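
The trade-off can be checked with a back-of-envelope estimate. The Python sketch below is not the PR's code and does not use the MultiBroadcastFusion API. It assumes serial launches, one kernel on the GPU at a time, and that a fused kernel costs the sum of its parts times a slowdown factor. The counts (60 broadcasts fused in groups of 6) and all names are hypothetical. The 10 microsecond kernel time and the 2x slowdown come from the text above, and the two launch costs tried (20 and 100 microseconds) lie within the reported 20-200 microsecond range.

```python
def wall_time_us(n_kernels, launch_us, kernel_us):
    """Wall time for n identical kernels with serial launches.

    The first kernel starts after one launch; after that, each kernel is
    paced by whichever is slower, the launch or the kernel itself.
    """
    return launch_us + (n_kernels - 1) * max(launch_us, kernel_us) + kernel_us


def compare(launch_us, n_broadcasts=60, fuse_factor=6, kernel_us=10.0, slowdown=2.0):
    """Return (unfused_us, fused_us) under the assumptions stated above."""
    unfused = wall_time_us(n_broadcasts, launch_us, kernel_us)
    # Each fused kernel does the work of `fuse_factor` broadcasts, `slowdown` times slower.
    fused_kernel_us = fuse_factor * kernel_us * slowdown
    fused = wall_time_us(n_broadcasts // fuse_factor, launch_us, fused_kernel_us)
    return unfused, fused


for launch_us in (20.0, 100.0):
    unfused, fused = compare(launch_us)
    print(f"launch={launch_us:.0f} us: unfused={unfused:.0f} us, fused={fused:.0f} us")
```

In this model a 2x slower fused kernel roughly breaks even at the 20 microsecond end of the launch range and wins clearly once launches cost more, which is the regime the profile suggests.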

To-do

  • do the same to lsm_radiant_energy_fluxes!, which takes ~1/12 of the total step time
  • This only works with a modification to ClimaCore at the moment. I could not figure out how to get the MultiBroadcastFusionCUDAExt to load during the ClimaCoreCUDAExt loading.

  • I have read and checked the items on the review checklist.

Add CPU runs to benchmark pipeline

Also add a flag to the Bucket benchmark,
which enables the NaN callback. This also
adds a bucket with nan cb benchmark with cpu
and gpu to the pipeline

Add cpu benchmarks and NaN cb bench

bugfixes

import cuda

Add nan_cb tests

pickup artifacts

Add NVTX annotations

Update manifests

update manifests proper

update benchmarks

more up; [skip-ci]

Add allocs flame

WIP

add experimental pipeline
This should give a performance boost
when using gpu. I'm not sure how this would
affect cpu runs.