Description
I have code that creates a pool and an ES and pushes a few ULTs to the pool. It then forks. I discovered that POSIX threads (which Argobots ESs essentially are) do not survive a fork, so in the child process the ULTs no longer execute, and I end up hanging on an ABT_xstream_join.
I would like to solve this problem by using pthread_atfork, with a prepare function that destroys the ES (but not the pool nor the ULTs it contains), and parent and child functions that recreate it. Here is what I have tried so far (tried with 1 ULT in the pool, for simplicity):
- ABT_xstream_join: will not complete until there aren't any ULTs to run.
- ABT_xstream_free: will call ABT_xstream_join implicitly, same problem.
- ABT_xstream_set_main_sched_basic to set the scheduler to one associated with an empty pool: I'm getting the error stream.c:1903: xstream_update_main_sched: Assertion `p_xstream->ctx.state == ABTD_XSTREAM_CONTEXT_STATE_WAITING' failed.
- ABT_xstream_exit called from a ULT that is pushed in the pool of the ES: this requests the ES to exit but doesn't actually make it exit until the ULTs in it have completed.
- Sending a signal to the ULT to ask it to call ABT_self_suspend(): ABT_xstream_join still blocks even though the ULT is now suspended. ABT_pool_get_size gives 0, but ABT_pool_get_total_size is still 1.
- WORKING SOLUTION: Create a new pool, migrate the ULT to it, join the original xstream, then after fork re-create a new ES and associate it with the new pool.
I might be able to generalize this model by calling ABT_pool_print_all_threads with a function that calls ABT_thread_migrate on each ULT in the pool, though I don't know if it iterates over all the ULTs (including running ones) or only the ones that aren't running.
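For reference, the stop-before-fork / restart-after-fork pattern can be sketched with plain pthreads, without any Argobots calls. This is only an illustration under stated assumptions: `pending` is a hypothetical stand-in for the pool, `worker` for the ES, and every function name below is made up for the example. It also assumes that the worker is the only extra thread in the process, so the process is single-threaded at the moment of the fork.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-ins: `pending` plays the pool, `worker` plays the ES. */
static atomic_int pending;
static atomic_int stop_requested;
static pthread_t worker;
static int worker_running; /* only touched by the forking thread */

static void sleep_ms(long ms) {
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Runs one work item at a time; honors a stop request even if work remains. */
static void *worker_main(void *arg) {
    (void)arg;
    while (!atomic_load(&stop_requested)) {
        if (atomic_load(&pending) > 0) {
            sleep_ms(2); /* simulate running one item */
            atomic_fetch_sub(&pending, 1);
        } else {
            sleep_ms(1);
        }
    }
    return NULL;
}

/* Child and parent handler: (re)create the worker if it is not running. */
static void start_worker(void) {
    if (worker_running) return;
    atomic_store(&stop_requested, 0);
    if (pthread_create(&worker, NULL, worker_main, NULL) == 0)
        worker_running = 1;
}

/* Prepare handler: join the worker, leaving `pending` untouched. */
static void stop_worker(void) {
    if (!worker_running) return;
    atomic_store(&stop_requested, 1);
    pthread_join(worker, NULL);
    worker_running = 0;
}

/* Polls until all pending items are done; returns 1 on success, 0 on timeout. */
static int wait_until_drained(int timeout_ms) {
    for (int i = 0; i < timeout_ms && atomic_load(&pending) > 0; i++)
        sleep_ms(1);
    return atomic_load(&pending) == 0;
}

/* Returns 0 if both parent and child finished the pending work after fork. */
int run_fork_demo(int items) {
    static int handlers_installed;
    if (!handlers_installed) {
        if (pthread_atfork(stop_worker, start_worker, start_worker) != 0)
            return -1;
        handlers_installed = 1;
    }
    atomic_store(&pending, items);
    start_worker();

    pid_t pid = fork(); /* handlers stop the worker, then restart it */
    if (pid < 0) {
        stop_worker();
        return -1;
    }
    if (pid == 0)
        _exit(wait_until_drained(5000) ? 0 : 1);

    int status = 0;
    int reaped = (waitpid(pid, &status, 0) == pid);
    int parent_ok = wait_until_drained(5000);
    stop_worker();
    return (reaped && parent_ok && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

Calling `run_fork_demo(20)` from `main` should return 0 when the worker was restarted on both sides of the fork. The key difference from the Argobots case is that this worker can be joined while work is still pending, which is exactly what ABT_xstream_join does not allow.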
Ideally, I would want the following functions:
- ABT_pool_freeze: mark the pool as frozen, allowing any ES that uses it to be joined, as if the pool were empty (in practice I could make a custom pool implementation that "pretends" to be empty if frozen, but for a cleaner API it would still be useful to be able to report the number of ULTs in the pool);
- ABT_pool_unfreeze: unfreeze a pool;
- ABT_pool_is_frozen: check if a pool is frozen.
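To make the intended semantics concrete, here is a toy model in plain C. It is not Argobots code and none of these names exist in Argobots: `toy_pool` and its functions are hypothetical. The point it tries to show is the split between the size a scheduler would see (0 while frozen, so the ES can be joined) and the real number of queued units (still reportable).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy pool; a counter stands in for the queue of ULTs. */
typedef struct {
    size_t count; /* number of queued work units */
    bool frozen;  /* when true, the pool pretends to be empty */
} toy_pool;

void toy_pool_push(toy_pool *p) { p->count++; }

/* Pops one unit; refuses while frozen or when nothing is queued. */
bool toy_pool_pop(toy_pool *p) {
    if (p->frozen || p->count == 0) return false;
    p->count--;
    return true;
}

/* Size as seen by a scheduler deciding whether it may terminate. */
size_t toy_pool_sched_size(const toy_pool *p) {
    return p->frozen ? 0 : p->count;
}

/* Real number of queued units, regardless of the frozen flag. */
size_t toy_pool_total_size(const toy_pool *p) { return p->count; }

void toy_pool_freeze(toy_pool *p) { p->frozen = true; }
void toy_pool_unfreeze(toy_pool *p) { p->frozen = false; }
bool toy_pool_is_frozen(const toy_pool *p) { return p->frozen; }
```

With this split, a prepare handler would freeze the pool and join the ES, and the parent and child handlers would unfreeze it and attach a fresh ES, without moving any ULT between pools.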
Alternatively, what could be useful is an ABT_xstream_stop function that joins the underlying pthread even if the ES's pool isn't empty, combined with ABT_xstream_revive (which already exists) to spin it back up.