Skip to content

Multi-threaded raster strategy #161

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Apr 28, 2025
Merged

Conversation

shortcutman
Copy link
Contributor

Adds a new strategy to exact extract named raster-parallel.

The strategy utilises oneAPI TBB to setup a parallel pipeline for finding intersecting features, reading raster data, performing zonal stats and merging stats for final output. Number of 'tokens' (TBB terminology, essentially maximum parallel tasks in flight) is controlled with the --tokens [number] command line argument.

Implementation

Prior to the parallel pathway, logic is the same as raster-sequential where all features are read in and an STR tree is created for doing intersection.

For the parallel pipeline:

  • A context per a subdivision of each raster utilised in an operation is created. Context starts with the subdivision and pointer to source Raster. This is a serial stage (as required for the first stage of a parallel pipeline in TBB).
  • For each of the above, performs query against the STR tree to find all features that intersect with the subdivision. This is a parallel stage for each context.
  • For each context that has intersecting features, reads the raster data in. This is a serial stage.
  • Performs zonal stat calculations on features and the raster data writing the results into its own StatsRegistry. This is a parallel stage for each context.
  • Merges context provided StatRegistry data into Processor owned StatRegistry. This is a serial stage.

Finally all features are written out with the same implementation as raster-sequential.

Parallel considerations

  • All raster read is done sequential but not necessarily the same thread. In testing this hasn't created any observed issues.
  • Tree intersection and zonal stats calculations require their own GEOS contexts in order to be parallelised. Incorrect stats were observed before this was fixed.
  • I haven't yet looked at the new GDAL raster read thread safety API and what other improvements that may provide.

Performance

Performing mean and count on ~1.6M polygons of Western Australia cadastral boundaries against ~25m square pixels of Australian agricultural land use data (with national coverage) done on Ryzen 7700 (8c/16t) with 32GB RAM:

  • feature-sequential elapsed time: 7m 42s, maximum memory usage: 2.224 GB
  • raster-sequential elapsed time: 2m 52s, maximum memory usage: 4.639 GB
  • raster-parallel with 4 simultaneous tokens, elapsed time: 52s, max memory usage: 5.653 GB
  • raster-parallel with 8 simultaneous tokens, elapsed time: 37s, max memory usage: 5.823 GB
  • raster-parallel with 12 simultaneous tokens, elapsed time: 36s, max memory usage: 5.961 GB

Other notes

All results were tested against raster-sequential outputs to control for any parallel bugs. Nothing major was observed other than occasional floating point errors at the end of its precision. I did not that multiple raster input to raster-sequential doesn't look like it is working as intended, but out of scope for this PR.

I didn't make any changes to the Python bindings or libs at this stage. I note that the actions continually fail on them but not sure why.

Welcome any comments, hope this is something that helps!

Copy link

codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 91.81287% with 14 lines in your changes missing coverage. Please review.

Project coverage is 90.63%. Comparing base (66f2b7a) to head (a534870).
Report is 37 commits behind head on master.

Files with missing lines Patch % Lines
src/raster_stats.h 85.10% 5 Missing and 2 partials ⚠️
src/gdal_raster_wrapper.cpp 72.72% 3 Missing ⚠️
src/exactextract.cpp 50.00% 2 Missing ⚠️
src/raster_source.h 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #161      +/-   ##
==========================================
+ Coverage   90.58%   90.63%   +0.04%     
==========================================
  Files          85       85              
  Lines        6682     6844     +162     
  Branches      633      645      +12     
==========================================
+ Hits         6053     6203     +150     
- Misses        597      607      +10     
- Partials       32       34       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dbaston
Copy link
Member

dbaston commented Jan 21, 2025

Thanks for this contribution! I've tested it with some other data locally and am seeing similar performance gains. I have a few questions/notes:

  • Should I open a PR against your fork to add some tests, etc? Or commit directly to your branch?

  • It looks like this doesn't handle weighted calculations, at least not in a thread-safe way. For now I think it makes sense to simply error out if a user requests weighted stats. Do you agree?

I did not that multiple raster input to raster-sequential doesn't look like it is working as intended, but out of scope for this PR.

Any details on this? Are you referring to weighted operations, or something else?

Number of 'tokens' (TBB terminology, essentially maximum parallel tasks in flight) is controlled with the --tokens [number] command line argument.

Any issues with calling this --threads ?

Performs zonal stat calculations on features and the raster data writing the results into its own StatsRegistry.

I don't expect it to be a major bottleneck, but do you see a reason not to have a StatsRegistry for each thread, instead of for each work unit?

Tree intersection and zonal stats calculations require their own GEOS contexts in order to be parallelised. Incorrect stats were observed before this was fixed.

Any chance you're using GEOS < 3.10?

I haven't yet looked at the new GDAL raster read thread safety API and what other improvements that may provide.

It's essentially handling the locking of the raster dataset to prevent multiple threads from reading at once. But it still may provide benefits here.

@shortcutman
Copy link
Contributor Author

Should I open a PR against your fork to add some tests, etc? Or commit directly to your branch?

Commit directly is fine! I have it separate to main so I can rebase on top of other changes ready for final merge.

It looks like this doesn't handle weighted calculations, at least not in a thread-safe way. For now I think it makes sense to simply error out if a user requests weighted stats. Do you agree?

Good call, thank you for spotting! I have followed your advice, erroring out (exception) and removing the other logic as copied from raster-sequential.

Any details on this? Are you referring to weighted operations, or something else?

I’ll raise a separate issue for this in a day or two.

Any issues with calling this --threads ?

No I don’t think so. The TBB doco is a bit unclear on this, but using threads is clearer to the user. I’ve made that change.

I don't expect it to be a major bottleneck, but do you see a reason not to have a StatsRegistry for each thread, instead of for each work unit?

The change isn’t major so I did it and ran a few tests. On a 8 thread test it makes it take 6% longer and consumes 12.5% more memory at peak. My guess is that the hit comes from:

  • locking on the thread local storage compared to no locks at all,
  • not doing merge in parallel with other steps,
  • multiple StatsRegistries around for longer compared to in parallel merging the data.
    I’m not 100% sure though and didn’t dive into it too much more than the test runs. The changes I made are here.

Any chance you're using GEOS < 3.10?

geos-config reports that it is version 3.12.1.

Some other notes I forgot to mention:

  • I disabled clang-format on the TBB pipeline code. I have written it using lambdas but that can be unpicked if desired.
  • I did not implement any combination logic for variance based operations. I unfortunately don’t know anything about them and what the best approach to take was.

@dbaston
Copy link
Member

dbaston commented Apr 27, 2025

@shortcutman , I'm very sorry to have disappeared on this for so long. I plan to merge it shortly.

7ba6298 enables parallel I/O with GDAL >= 3.10. It makes a big difference, at least for the benchmark I'm running:

  • raster-sequential: 42.7 s
  • raster-parallel (12 threads): 21.8 s
  • raster-parallel with parallel I/O (12 threads): 5.8 s

@shortcutman
Copy link
Contributor Author

@dbaston no problems, other priorities take us all in wacky directions :) That improvement is phenomenal.

@dbaston dbaston merged commit cd65e2d into isciences:master Apr 28, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants