use concurrent.futures.Executor instead of multiprocessing pool to resolve conflict with duet #7938
NoureldinYosri wants to merge 6 commits into quantumlib:main
Conversation
eliottrosenberg left a comment
Fixes b/490175992 (discussed in quantumlib/ReCirq#461 (review))
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##             main    #7938    +/-  ##
=======================================
  Coverage   99.63%   99.63%
=======================================
  Files        1108     1108
  Lines       99571    99587    +16
=======================================
+ Hits        99205    99226    +21
+ Misses        366      361     -5
```

View full report in Codecov by Sentry.
Please check if the new ThreadPoolExecutor default is faster than a serial run. If it makes no difference or is worse, consider switching to a serial evaluation by default.
It may also be worthwhile to check whether creating a local multiprocessing.Pool with a `spawn` rather than `fork` start method would avoid the warnings described in the bug.
Ref: https://docs.python.org/3.11/library/multiprocessing.html#multiprocessing.get_context
```python
from __future__ import annotations

import concurrent.futures as cf
```
Nit - can we use the futures module name here for less indirection - like in
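For reference, the less-indirect spelling would look like this (a sketch, assuming a thread pool as in the PR):

```python
from concurrent import futures

# `futures.ThreadPoolExecutor` avoids the extra `cf` alias introduced by
# `import concurrent.futures as cf`.
with futures.ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(lambda x: x + 1, range(3)))
print(results)  # [1, 2, 3]
```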
```diff
     pool = num_workers_or_pool  # pragma: no cover
 elif num_workers_or_pool != 0:
-    pool = multiprocessing.Pool(num_workers_or_pool if num_workers_or_pool > 0 else None)
+    pool = cf.ThreadPoolExecutor(num_workers_or_pool if num_workers_or_pool > 0 else None)
```
ThreadPoolExecutor is subject to the GIL. Unless the mapped function spends most of its time in numpy calls or waiting on I/O, execution will be no faster than a serial call, or worse due to thread-switching overhead.
I made a quick test comparing a many-term sum computed serially, with multiprocessing.Pool.map, and with ThreadPoolExecutor.map. The ThreadPoolExecutor took about 2.5 times longer than the serial evaluation.
Example timing code:

```python
def partial_sum(start_end: tuple[int, int]) -> float:
    total = 0.0
    for i in range(*start_end, 3):
        total += (-1) ** i * 1.0 / i
    return total


def tedious_sum(terms_count: int, mapfunc) -> float:
    total = sum(mapfunc(partial_sum, ((start, terms_count) for start in (1, 2, 3))))
    return total


# %timeit tedious_sum(10_000_000, map)
# 2.93 s ± 51.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# pool = multiprocessing.Pool(3)
# %timeit tedious_sum(10_000_000, pool.map)
# 1.01 s ± 25.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# tpx = concurrent.futures.ThreadPoolExecutor(3)
# %timeit tedious_sum(10_000_000, tpx.map)
# 8.46 s ± 1.11 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
Can you make a quick comparison of the z_phase_calibration_workflow duration with ThreadPoolExecutor versus a serial run?
If they are comparable, I'd suggest making serial evaluation the default.
```diff
     random_state: cirq.RANDOM_STATE_OR_SEED_LIKE = None,
     atol: float = 1e-3,
-    num_workers_or_pool: int | multiprocessing.pool.Pool = -1,
+    num_workers_or_pool: int | multiprocessing.pool.Pool | cf.Executor = -1,
```
AFAICT, the code later needs only the Pool.map or Executor.map functions.
Would it be possible to change this to accept either an int for a number of workers or a parallel-map function?
```python
assert isinstance(pool, cf.Executor)
pool.shutdown()
```
Consider wrapping the local pool in a contextlib.ExitStack context instead.
Otherwise there is a risk that the pool stays around if the function aborts on an exception.