Multi-threaded raster strategy #161
…t error needs confirmation.
…proscribed by the docs.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #161 +/- ##
==========================================
+ Coverage 90.58% 90.63% +0.04%
==========================================
Files 85 85
Lines 6682 6844 +162
Branches 633 645 +12
==========================================
+ Hits 6053 6203 +150
- Misses 597 607 +10
- Partials 32 34 +2
Thanks for this contribution! I've tested it with some other data locally and am seeing similar performance gains. I have a few questions/notes:
Any details on this? Are you referring to weighted operations, or something else?
Any issues with calling this
I don't expect it to be a major bottleneck, but do you see a reason not to have a
Any chance you're using GEOS < 3.10?
It's essentially handling the locking of the raster dataset to prevent multiple threads from reading at once. But it still may provide benefits here.
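As a rough illustration of that kind of locking, here is a minimal sketch using only the C++ standard library. `SerializedRaster` and `parallel_window_sums` are hypothetical names invented for this example, not exactextract or GDAL APIs, and the in-memory pixel vector stands in for a real dataset handle.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a raster dataset that must not be read by
// several threads at once. Not an exactextract or GDAL class.
class SerializedRaster {
public:
    explicit SerializedRaster(std::vector<double> pixels)
        : pixels_(std::move(pixels)) {}

    // Copies `count` values starting at `offset`. The mutex ensures that only
    // one thread is inside the read at any time.
    std::vector<double> read_window(std::size_t offset, std::size_t count) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(offset + count <= pixels_.size());
        const auto first = pixels_.begin() + static_cast<std::ptrdiff_t>(offset);
        return std::vector<double>(first, first + static_cast<std::ptrdiff_t>(count));
    }

private:
    std::mutex mutex_;
    std::vector<double> pixels_;
};

// Reads `n_windows` consecutive windows from separate threads and sums each.
// Reads are serialized by the lock; the summation itself runs in parallel.
std::vector<double> parallel_window_sums(SerializedRaster& raster,
                                         std::size_t n_windows,
                                         std::size_t window_size) {
    std::vector<double> sums(n_windows, 0.0);
    std::vector<std::thread> threads;
    threads.reserve(n_windows);
    for (std::size_t w = 0; w < n_windows; ++w) {
        threads.emplace_back([&raster, &sums, w, window_size] {
            const std::vector<double> values =
                raster.read_window(w * window_size, window_size);
            // Each thread writes to its own slot, so no lock is needed here.
            sums[w] = std::accumulate(values.begin(), values.end(), 0.0);
        });
    }
    for (std::thread& t : threads) {
        t.join();
    }
    return sums;
}
```

With this arrangement only the read is a serial bottleneck, which is why parallel processing after the read can still pay off.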
Commit directly is fine! I have it separate to main so I can rebase on top of other changes ready for final merge.
Good call, thank you for spotting! I have followed your advice, erroring out (exception) and removing the other logic as copied from raster-sequential.
I’ll raise a separate issue for this in a day or two.
No I don’t think so. The TBB doco is a bit unclear on this, but using threads is clearer to the user. I’ve made that change.
The change isn’t major so I did it and ran a few tests. On an 8-thread test it takes 6% longer and consumes 12.5% more memory at peak. My guess is that the hit comes from:
geos-config reports that it is version 3.12.1. Some other notes I forgot to mention:
@shortcutman , I'm very sorry to have disappeared on this for so long. I plan to merge it shortly. 7ba6298 enables parallel I/O with GDAL >= 3.10. It makes a big difference, at least for the benchmark I'm running:
@dbaston no problems, other priorities take us all in wacky directions :) That improvement is phenomenal.
Adds a new strategy to exactextract named `raster-parallel`.
The strategy utilises oneAPI TBB to set up a parallel pipeline for finding intersecting features, reading raster data, performing zonal stats and merging stats for final output. The number of 'tokens' (TBB terminology, essentially the maximum number of parallel tasks in flight) is controlled with the `--tokens [number]` command line argument.
Implementation
Prior to the parallel pathway, logic is the same as `raster-sequential`, where all features are read in and an STR tree is created for doing intersection.
For the parallel pipeline:
Finally, all features are written out with the same implementation as `raster-sequential`.
Parallel considerations
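To make the role of tokens concrete, the sketch below caps the number of items in flight with a mutex and condition variable. It uses only the C++ standard library as a stand-in for TBB's pipeline, so it is not the code in this PR; `run_token_pipeline` and `PipelineResult` are hypothetical names, and spawning one thread per item is a simplification that a real task scheduler would avoid.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch only.
struct PipelineResult {
    long long merged_sum;       // output of the serial merge stage
    std::size_t max_in_flight;  // peak number of items processed at once
};

// Runs `work` on every item, never allowing more than `tokens` items in
// flight, and merges the partial results under a lock.
PipelineResult run_token_pipeline(const std::vector<int>& items,
                                  std::size_t tokens,
                                  const std::function<long long(int)>& work) {
    assert(tokens > 0);
    std::mutex m;
    std::condition_variable token_free;
    std::size_t in_flight = 0;
    PipelineResult result{0, 0};
    std::vector<std::thread> workers;
    workers.reserve(items.size());

    for (int item : items) {
        {
            // Serial input stage: block until a token is available.
            std::unique_lock<std::mutex> lock(m);
            token_free.wait(lock, [&] { return in_flight < tokens; });
            ++in_flight;
            if (in_flight > result.max_in_flight) {
                result.max_in_flight = in_flight;
            }
        }
        workers.emplace_back([&, item] {
            const long long partial = work(item);  // parallel stage
            {
                // Serial merge stage: fold the partial result, return the token.
                std::lock_guard<std::mutex> lock(m);
                result.merged_sum += partial;
                --in_flight;
            }
            token_free.notify_one();
        });
    }
    for (std::thread& t : workers) {
        t.join();
    }
    return result;
}
```

Because every item holds its intermediate data until it is merged, the token count bounds peak memory as well as concurrency, which is the trade-off a setting like `--tokens` exposes.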
Performance
Computing mean and count for ~1.6M polygons of Western Australian cadastral boundaries against Australian agricultural land use data with ~25 m square pixels (national coverage), run on a Ryzen 7700 (8c/16t) with 32 GB RAM:
- `feature-sequential`: elapsed time 7m 42s, maximum memory usage 2.224 GB
- `raster-sequential`: elapsed time 2m 52s, maximum memory usage 4.639 GB
- `raster-parallel` with 4 simultaneous tokens: elapsed time 52s, maximum memory usage 5.653 GB
- `raster-parallel` with 8 simultaneous tokens: elapsed time 37s, maximum memory usage 5.823 GB
- `raster-parallel` with 12 simultaneous tokens: elapsed time 36s, maximum memory usage 5.961 GB
Other notes
All results were tested against `raster-sequential` outputs to control for any parallel bugs. Nothing major was observed other than occasional floating-point differences in the last digits of precision. I did note that multiple raster inputs to `raster-sequential` don't appear to work as intended, but that is out of scope for this PR.
I didn't make any changes to the Python bindings or libs at this stage. I note that the actions continually fail on them, but I'm not sure why.
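On the floating-point differences mentioned above: one plausible cause (an assumption, not something verified in this PR) is that merging partial statistics changes the order of additions, and floating-point addition is not associative. The sketch below uses hypothetical helper names and shows how sequential and chunked sums can be compared with a relative tolerance rather than exact equality.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sums values left to right, as a sequential strategy would.
double sum_in_order(const std::vector<double>& v) {
    double total = 0.0;
    for (double x : v) {
        total += x;
    }
    return total;
}

// Sums values in `chunks` contiguous pieces and then merges the partial sums,
// mimicking how parallel workers might combine their results.
double sum_in_chunks(const std::vector<double>& v, std::size_t chunks) {
    assert(chunks > 0);
    const std::size_t chunk_size = (v.size() + chunks - 1) / chunks;
    double total = 0.0;
    for (std::size_t start = 0; start < v.size(); start += chunk_size) {
        const std::size_t end = std::min(start + chunk_size, v.size());
        double partial = 0.0;
        for (std::size_t i = start; i < end; ++i) {
            partial += v[i];
        }
        total += partial;
    }
    return total;
}

// Relative comparison: true when a and b agree to within rel_tol of the
// larger magnitude.
bool nearly_equal(double a, double b, double rel_tol) {
    assert(rel_tol >= 0.0);
    return std::fabs(a - b) <= rel_tol * std::max(std::fabs(a), std::fabs(b));
}
```

Comparing outputs with something like `nearly_equal(expected, actual, 1e-9)` avoids flagging last-digit differences while still catching genuine parallel bugs; the tolerance is an illustrative choice.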
Welcome any comments, hope this is something that helps!