Speedup of HEAD function used in gridded routing #877
TYPE: Enhancement (speedup)
KEYWORDS: performance, gridded routing, profiling
SOURCE: Max Cooper, Aperture Space
DESCRIPTION OF CHANGES: Profiling of a gridded case showed 20% of time spent in the HEAD function. This function solves for the water height h in a trapezoidal channel by finding the height where fA(h) - AREA = 0. The existing implementation used iterative root finding to determine h. However, fA(h) = z*h^2 + Bw*h, so this is a quadratic equation in h with coefficients A = z = 1/ChSSlp, B = Bw, and C = -AREA. As AREA is always positive, C is always negative, while z is positive, so the discriminant Bw^2 + 4*AREA*z is positive and there are two real roots. Since h must be positive, only the + root is needed: sqrt(Bw^2 + 4*AREA*z) is always larger than Bw, so the - root (-Bw - sqrt(Bw^2 + 4*AREA*z))/(2*z) is negative, while the + root (-Bw + sqrt(Bw^2 + 4*AREA*z))/(2*z) is positive.
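Written out, the derivation reads as follows. This is a sketch that assumes only the definitions given in the description (z = 1/ChSSlp, bottom width Bw, flow area AREA); the symbols h_+ and h_- are labels introduced here for the two roots.

```latex
\begin{align*}
  f_A(h) - \mathrm{AREA} &= z h^2 + B_w h - \mathrm{AREA} = 0,
      \qquad z = \frac{1}{\mathrm{ChSSlp}} > 0 \\
  h_\pm &= \frac{-B_w \pm \sqrt{B_w^2 + 4\,\mathrm{AREA}\,z}}{2z} \\
  \sqrt{B_w^2 + 4\,\mathrm{AREA}\,z} &> B_w
      \;\Longrightarrow\; h_- < 0 < h_+ \\
  \mathrm{HEAD} &= h_+ = \frac{-B_w + \sqrt{B_w^2 + 4\,\mathrm{AREA}\,z}}{2z}
\end{align*}
```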
The function HEAD in module_channel_routing.F90 was changed to solve for h analytically, and the AREAf function was removed. Comparing the Croton gridded example case before and after the modification gives nearly equivalent output. Values differ in the CHRTOUT_DOMAIN1, CHRTOUT_GRID1, CHANOBS_DOMAIN1, and LAKEOUT_DOMAIN1 files. The differences occur at single indices within a file, typically the first index, and the values agree to >=4 decimals. The analytical result should be the more accurate of the two, since the root-finding algorithm stops at a convergence tolerance.
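To illustrate the idea outside the Fortran source, here is a minimal Python sketch of the closed-form solve next to a simple bisection solver. It is not the code from module_channel_routing.F90: the function names `head_analytic` and `head_bisect`, the argument names, and the bisection tolerance are all assumptions made for illustration, and bisection merely stands in for whatever root-finding scheme the original HEAD used.

```python
import math


def head_analytic(area, bw, ch_sslp):
    """Positive root of z*h**2 + bw*h - area = 0, with z = 1/ch_sslp."""
    z = 1.0 / ch_sslp
    return (-bw + math.sqrt(bw * bw + 4.0 * area * z)) / (2.0 * z)


def head_bisect(area, bw, ch_sslp, tol=1e-6, max_iter=200):
    """Same root found by bisection (illustrative stand-in, hypothetical tol)."""
    z = 1.0 / ch_sslp

    def residual(h):
        # fA(h) - AREA for a trapezoidal cross-section
        return z * h * h + bw * h - area

    lo, hi = 0.0, 1.0
    # Grow the upper bracket until the residual changes sign.
    while residual(hi) < 0.0:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Example values chosen so the root is easy to check by hand:
    # z = 2, so 2*h**2 + h - 6 = 0 has the positive root h = 1.5.
    print(head_analytic(6.0, 1.0, 0.5))
    print(head_bisect(6.0, 1.0, 0.5))
```

The two results agree only to within the bisection tolerance, which is consistent with the small output differences reported above. If AREA*z is very small relative to Bw^2, the algebraically equivalent form 2*AREA/(Bw + sqrt(Bw^2 + 4*AREA*z)) avoids subtracting two nearly equal numbers; whether that matters for the values seen in practice is not established here.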
Profiling a 24 hr run of Croton showed ~14% of run time spent in this function before the modification and effectively 0% after.
ISSUE:
TESTS CONDUCTED: Compared output pre/post modification on Croton gridded example with nccmp.