Develop Pre-dissolved Basins Layer for Faster Delineations #9
Comments
I performed some initial testing to determine an appropriate upstream subshed threshold before performing a pre-dissolve. The goal is to identify a value (n) such that reaches with an upstream subshed count less than or equal to n are pre-dissolved. Threshold in this case refers to a count of subshed polygons upstream; reaches with a count at or below the threshold are pre-dissolved. One important note for interpreting results is that testing was conducted at the root of the watershed. For larger thresholds and smaller watersheds, the pre-dissolve logic would have resulted in the entire watershed being pre-dissolved. This makes the results for those scenarios look extremely performant, but the reality is different. Had we tested at just one reach upstream from the root, the pre-dissolve logic would not have been used at all, and the results would be closer to the control scenario.
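For reference, a minimal sketch of the thresholding logic described above, assuming a hypothetical `upstream_subshed_count` column on the reach layer (the column and function names are illustrative, not the project's actual implementation):

```python
import geopandas as gpd

# Candidate value of n under evaluation (100 and 200 were the values tested).
PREDISSOLVE_THRESHOLD = 100

def flag_predissolve_reaches(reaches: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Mark reaches whose upstream subshed polygon count is at or below the threshold."""
    reaches = reaches.copy()
    reaches["predissolve"] = reaches["upstream_subshed_count"] <= PREDISSOLVE_THRESHOLD
    return reaches
```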
From these initial results I think a threshold of around 100 seems like a reasonable setting, though I should also run this again with a 200 threshold. Looking at the control (no pre-dissolve) case, it seems like 263 is a reasonable runtime, and 443 starts to get into too-slow territory. I think any higher than 200 and we run the risk of having reaches upstream of a pre-dissolve node with subshed counts large enough to impact delineation performance.
I added in results for a threshold of 200. The performance gain does seem worth bumping the processing threshold up to 200, noting that the baseline performance for 200 polygons (without pre-dissolve) is reasonable.
@ptomasula, Thanks for sharing all these results! They're great to see.
Related issue #9. This adds a function to leverage the MNSI information and group upstream basins into meaningful groups that can be pre-dissolved. Pre-dissolving will allow for fewer total polygons in the final dissolve.
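A minimal sketch of what that grouping-and-pre-dissolve step could look like with geopandas, assuming a `dissolve_group` column stands in for whatever grouping `compute_dissolve_groups()` derives from the MNSI information:

```python
import geopandas as gpd

def predissolve_groups(basins: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Dissolve each group of upstream basins into a single pre-dissolved polygon."""
    dissolved = basins.dissolve(by="dissolve_group")
    # Track how many original subshed polygons each pre-dissolved polygon represents.
    dissolved["ELEMENT_COUNT"] = basins.groupby("dissolve_group").size()
    return dissolved.reset_index()
```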
@ptomasula, I figured out, fixed, and tested the issue with the batch pipeline. The short story is that we called `compute_dissolve_groups()` at the wrong point in the workflow sequence. I also added a few other fixes, such as correcting dtypes and adding ELEMENT_COUNT back to the output fields.
@ptomasula, I figured out, fixed, and tested the issue with the batch pipeline in 3d441b5 (see commit notes). Try running it against our files on our modeling computer!
Summary
During the development of dissolve logic under #7, it was discovered that dissolving modestly large watersheds is resource intensive and non-performant. In a test on decent hardware, combining ~850 basins, the run time was ~12 seconds (this is just the dissolve operation and excludes overhead like loading the file, subsetting, etc.).
One way to mitigate this performance issue would be to develop a layer of pre-dissolved polygons that represent chunks of upstream watershed. When a dissolve operation needs to be performed on a large watershed, each pre-dissolved polygon can be substituted for the n polygons it represents. This drastically reduces the total number of polygons in the dissolve for larger watersheds, thereby improving performance.
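As a rough illustration of the substitution, here is a hedged sketch (column names such as `dissolve_group` are assumptions, not the project's actual schema) of swapping pre-dissolved chunks in before the final dissolve:

```python
import pandas as pd
import geopandas as gpd

def dissolve_with_predissolved(basins: gpd.GeoDataFrame,
                               predissolved: gpd.GeoDataFrame,
                               covered_groups: set) -> gpd.GeoDataFrame:
    """Dissolve a watershed, substituting pre-dissolved polygons where available."""
    # Drop basins already represented by a pre-dissolved polygon.
    remaining = basins[~basins["dissolve_group"].isin(covered_groups)]
    chunks = predissolved[predissolved["dissolve_group"].isin(covered_groups)]
    combined = pd.concat([remaining, chunks], ignore_index=True)
    # Far fewer polygons enter this dissolve than dissolving every subshed directly.
    return combined.dissolve()
```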
Closure Criteria