Some notes on parallelizing Consequences calculations

Hello @raoanirudh, @micheles et al.,

At @jvanulde’s suggestion, I would like to share our experience/anecdote (?) when trying to parallelize consequence calculations with Canada earthquake scenarios that @tieganh and @jeremyrimando were working on.

While observing a [scripts/run_OQStandard.sh](https://github.com/OpenDRR/earthquake-scenarios/blob/master/scripts/run_OQStandard.sh) run, I noticed that OpenQuake itself would happily use all available CPU cores to do calculations in parallel (which is awesome!) and would finish calculation in about 2 hours, but some other processing are single-threaded and for example, each of the two runs of `python3 scripts/consequences-v3.10.0.py` could take over 12 hours.  See more info at:

- OpenDRR/earthquake-scenarios#57

Since @jeremyrimando needed to do complete ~50 scenario calculations within a month or two, it So, I tried my hands on the Python `multiprocessing` package:

- OpenDRR/earthquake-scenarios#58

    - The actual relevant commit is here: OpenDRR/earthquake-scenarios@f675c8364c7c53b3d417daf78b29277c97e7b5ce
        > Use Python multiprocessing package to take advantage of multiple CPU cores for processing multiple realizations simultaneously.
        > This would reduce the total run time of, for example,
        > ```console
        > bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o
        > ```
        > from 23 hours down to 6 hours on a c5a.24xlarge EC2 instance.

    - Credit: <https://gist.github.com/EdwinChan/3c13d3a746bb3ec5082f> and <https://github.com/tqdm/tqdm/blob/master/examples/parallel_bars.py>

All was good, or so I thought, until we noticed that we were getting mysterious, randomly inconsistent results with multiprocessing enabled, see <https://github.com/OpenDRR/earthquake-scenarios/pull/58#issuecomment-1119005844>.

My guess is that there might be bugs in Python itself or in Numpy’s OpenBLAS dot multiplications which might have caused memory corruption or pollution?  I must admit that I don't know how to test or prove that claim, let alone debugging and resolving it.

Luckily, we came across GNU parallel, which we used to run multiple copies of Python simultaneously, and enjoyed the same reduction in calculation time with no data corruption:

- OpenDRR/earthquake-scenarios#61

    - The gist of that pull request is this:
    
    > ```python
    > run_consequences_in_parallel() {
    >     # Use GNU parallel to run consequences processing in parallel.
    >     local calc_id num_rlzs
    >     calc_id="$1"
    >     num_rlzs=$(python3 -c 'import sys; from openquake.baselib import datastore; calc_id = datastore.get_last_calc_id() if sys.argv[1] == "-1" else int(sys.argv[1]); dstore = datastore.read(calc_id); print(len(dstore["weights"]));' "${calc_id}" 2>/dev/null)
    >     parallel --keep-order --tag --ungroup "python3 scripts/consequences-v3.10.0-one-realization.py \"${calc_id}\" {}" ::: $(seq -s ' ' 0 $((num_rlzs - 1)))
    > }
    > ```

Note that the above quite a while ago, from mid-2022, tested with Python 3.8 and OQ 3.11.

@jvandule commented in May 2022:

> @anthonyfok any value in sharing these findings with @micheles at GEM?

though I must have had missed his message somehow, and/or procrastinated and forgotten somehow, until Sep 2023 when I replied:

> @jvanulde Great idea! Sorry for taking so long to get back to you. I'll try to send an email to Micheles et al. today

And I finally 3½ months later, I am finally submitting this issue here.  Sorry for the delay!

Thanks again for your wonderful work on OpenQuake making earthquake modelling possible!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Some notes on parallelizing Consequences calculations #9324

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Some notes on parallelizing Consequences calculations #9324

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions