Use MBF in Canopy cache update #1520
Draft
It was noticed that using the integrated canopy soil model was significantly slower than using the soil model alone. This is unexpected, since the calculations done for the soil model are more complex (not just pointwise). Profiling revealed that the canopy cache update takes approximately 1/6 of the time of each step, and almost all of that time is spent launching kernels. The cache update has a lot of broadcasts that do very simple calculations. In those cases, the kernels themselves take ~10 microseconds, while launching each kernel takes anywhere from 20 to 200 microseconds. (The cause of this should be investigated; I'm guessing it has to do with adapting args.) This means the CPU cannot queue up work for the GPU fast enough, and the GPU idles a lot.
This PR tries using MultiBroadcastFusion (MBF) in the canopy cache update. Fusing should make the kernels themselves less efficient, but that cost is worth paying: because the cache update is limited by launch overhead rather than kernel run time, the kernels could be 2x as slow without making the simulation any slower. This is different from ClimaAtmos.
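To make the launch-bound argument concrete, here is a rough back-of-the-envelope model in Python (not the Julia code touched by this PR). It assumes that CPU-side launches and GPU execution overlap perfectly, so the wall time is whichever of the two is larger. The kernel time (~10 microseconds) and the launch range (20 to 200 microseconds) come from the profiling above. The number of broadcasts (`N_BROADCASTS = 10`), the single fused launch, and the 2x fused slowdown are illustrative assumptions, not measurements.

```python
# Idealized timing model. Assumption: CPU launches and GPU kernels overlap
# fully, so wall time is the larger of total launch time and total kernel time.

N_BROADCASTS = 10   # hypothetical number of simple broadcasts in the cache update
KERNEL_US = 10.0    # approximate per-kernel run time reported by profiling
SLOWDOWN = 2.0      # assumed efficiency penalty of the fused kernel


def wall_time_us(n_launches, launch_us, total_kernel_us):
    """Wall time in microseconds under the full-overlap assumption."""
    return max(n_launches * launch_us, total_kernel_us)


def compare(launch_us):
    """Return (unfused, fused) wall times for a given per-launch cost."""
    # Unfused: one launch per broadcast, each kernel runs at full efficiency.
    unfused = wall_time_us(N_BROADCASTS, launch_us, N_BROADCASTS * KERNEL_US)
    # Fused: a single launch, but the kernel work is SLOWDOWN times slower.
    fused = wall_time_us(1, launch_us, SLOWDOWN * N_BROADCASTS * KERNEL_US)
    return unfused, fused


if __name__ == "__main__":
    for launch_us in (20.0, 200.0):
        unfused, fused = compare(launch_us)
        print(f"launch={launch_us:.0f} us: unfused={unfused:.0f} us, fused={fused:.0f} us")
```

Under these assumptions, a fused kernel that is 2x slower breaks even at the low end of the launch range and is much faster at the high end. Real behavior depends on how many broadcasts are actually fused and how well launches overlap with execution.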
To-do