Trying to destroy an ES that still has ULTs to run #403

@mdorier

I have code that creates a pool and an ES and pushes a few ULTs to the pool. It then forks. I discovered that POSIX threads (which Argobots ESs essentially are) do not survive a fork: only the thread that calls fork exists in the child process. As a result, the ULTs no longer execute in the child, and I end up hanging on an ABT_xstream_join.
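For anyone who wants to see the underlying behavior without Argobots, here is a minimal sketch using plain POSIX threads. The names `ticker` and `counter_advances_in_child` are made up for this illustration. It starts a thread that keeps incrementing a counter, forks, and has the child check whether its copy of the counter still moves.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static atomic_long ticks;     /* incremented by the background thread */
static atomic_int stop_flag;  /* set to 1 to ask the thread to exit */

static void *ticker(void *arg) {
    (void)arg;
    while (!atomic_load(&stop_flag))
        atomic_fetch_add(&ticks, 1);
    return NULL;
}

/* Hypothetical helper: returns 1 if the counter advanced in the child,
 * 0 if it stayed frozen, and -1 on error. */
int counter_advances_in_child(void) {
    pthread_t tid;
    atomic_store(&ticks, 0);
    atomic_store(&stop_flag, 0);
    if (pthread_create(&tid, NULL, ticker, NULL) != 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only the forking thread exists here, so nothing
         * increments the child's copy of the counter. */
        long before = atomic_load(&ticks);
        struct timespec pause = {0, 50 * 1000 * 1000}; /* 50 ms */
        nanosleep(&pause, NULL);
        long after = atomic_load(&ticks);
        _exit(after != before ? 1 : 0);
    }

    int status = 0;
    int result = -1;
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    /* Parent: the ticker thread is still alive here, so stop and join it. */
    atomic_store(&stop_flag, 1);
    int rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;
    return result;
}
```

Calling `counter_advances_in_child()` from `main` should return 0, since the child inherits the counter's value but not the thread that updates it.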

I would like to solve this problem by using pthread_atfork, with a prepare function that destroys the ES (but not the pool or the ULTs it contains) and parent and child functions that recreate it. Here is what I have tried so far (with 1 ULT in the pool, for simplicity):

  • ABT_xstream_join: will not complete until there aren't any ULTs to run.
  • ABT_xstream_free: will call ABT_xstream_join implicitly, same problem.
  • ABT_xstream_set_main_sched_basic to set the scheduler to one associated with an empty pool: I'm getting the error stream.c:1903: xstream_update_main_sched: Assertion `p_xstream->ctx.state == ABTD_XSTREAM_CONTEXT_STATE_WAITING' failed.
  • ABT_xstream_exit called from a ULT that is pushed in the pool of the ES: this requests the ES to exit but doesn't actually make it exit until the ULTs in it have completed.
  • Sending a signal to the ULT to ask it to call ABT_self_suspend(): ABT_xstream_join still blocks even though the ULT is now suspended. ABT_pool_get_size gives 0, but ABT_pool_get_total_size is still 1.
  • WORKING SOLUTION: Create a new pool, migrate the ULT to it, join the original xstream, then after fork re-create a new ES and associate it with the new pool.
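The pthread_atfork plan described above can be sketched without Argobots, using one plain pthread as a stand-in for the ES and an atomic counter as a stand-in for the pool. Everything named here (`pending`, `start_worker`, `stop_worker`, `child_drains_pending`) is hypothetical. The sketch assumes the worker is the only extra thread, so the process is single-threaded at the moment of the fork and creating a thread in the child handler is reasonable. The prepare handler joins the worker even if work remains, and the parent and child handlers start a fresh one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static atomic_int pending;      /* stand-in for the work items left in the pool */
static atomic_int stop_flag;
static pthread_t worker;        /* stand-in for the ES */
static int worker_running;
static int restart_after_fork;
static pthread_once_t handlers_once = PTHREAD_ONCE_INIT;

static void *worker_main(void *arg) {
    (void)arg;
    while (!atomic_load(&stop_flag)) {
        if (atomic_load(&pending) > 0)
            atomic_fetch_sub(&pending, 1);  /* "run" one work item */
        else
            sched_yield();
    }
    return NULL;
}

static int start_worker(void) {
    atomic_store(&stop_flag, 0);
    if (pthread_create(&worker, NULL, worker_main, NULL) != 0)
        return -1;
    worker_running = 1;
    return 0;
}

/* Joins the worker even if items are still pending; they stay queued. */
static void stop_worker(void) {
    atomic_store(&stop_flag, 1);
    int rc = pthread_join(worker, NULL);
    assert(rc == 0);
    (void)rc;
    worker_running = 0;
}

static void before_fork(void) {
    restart_after_fork = worker_running;
    if (worker_running)
        stop_worker();
}

static void after_fork(void) {  /* used in both parent and child */
    if (restart_after_fork)
        (void)start_worker();
}

static void install_handlers(void) {
    pthread_atfork(before_fork, after_fork, after_fork);
}

/* Hypothetical demo: returns 0 if the child's new worker drained the
 * items, 1 if it timed out, and -1 on error. */
int child_drains_pending(int items) {
    pthread_once(&handlers_once, install_handlers);
    atomic_store(&pending, 0);
    if (start_worker() != 0)
        return -1;
    atomic_fetch_add(&pending, items);

    pid_t pid = fork();
    if (pid == 0) {
        struct timespec tick = {0, 1000000};  /* 1 ms */
        for (int i = 0; i < 2000 && atomic_load(&pending) > 0; i++)
            nanosleep(&tick, NULL);
        _exit(atomic_load(&pending) == 0 ? 0 : 1);
    }

    int status = 0;
    int result = -1;
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);
    if (worker_running)
        stop_worker();
    return result;
}
```

The part that has no direct Argobots equivalent in the attempts listed above is `stop_worker`: it returns as soon as the worker notices the flag, regardless of how much work is still queued.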

I might be able to generalize this model by calling ABT_pool_print_all_threads with a function that calls ABT_thread_migrate on each ULT in the pool, though I don't know if it iterates over all the ULTs (including running ones) or only the ones that aren't running.

Ideally, I would want the following functions:

  • ABT_pool_freeze: mark the pool as frozen, allowing any ES that uses it to be joined as if the pool were empty (in practice I could write a custom pool implementation that "pretends" to be empty when frozen, but for a cleaner API it would still be useful to be able to report the number of ULTs in the pool);
  • ABT_pool_unfreeze: unfreeze a pool;
  • ABT_pool_is_frozen: check if a pool is frozen.
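To make the intended semantics concrete, here is a rough sketch of a pool that "pretends" to be empty while frozen. It is a standalone, single-threaded ring buffer, not an Argobots pool: the type `freezable_pool` and all function names are invented for illustration, and wiring this into Argobots' custom pool interface is not shown.

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_CAPACITY 64

/* Hypothetical pool: a fixed-size FIFO with a "frozen" flag. */
typedef struct {
    void *items[POOL_CAPACITY];
    size_t head;
    size_t count;
    bool frozen;
} freezable_pool;

void pool_init(freezable_pool *p) {
    p->head = 0;
    p->count = 0;
    p->frozen = false;
}

/* Pushing is allowed even while frozen; returns false if the pool is full. */
bool pool_push(freezable_pool *p, void *item) {
    if (p->count == POOL_CAPACITY)
        return false;
    p->items[(p->head + p->count) % POOL_CAPACITY] = item;
    p->count++;
    return true;
}

/* A frozen pool hands out nothing, as if it were empty. */
void *pool_pop(freezable_pool *p) {
    if (p->frozen || p->count == 0)
        return NULL;
    void *item = p->items[p->head];
    p->head = (p->head + 1) % POOL_CAPACITY;
    p->count--;
    return item;
}

/* Size as seen by a scheduler: 0 while frozen. */
size_t pool_size(const freezable_pool *p) {
    return p->frozen ? 0 : p->count;
}

/* Real number of stored items, frozen or not. */
size_t pool_stored(const freezable_pool *p) {
    return p->count;
}

void pool_freeze(freezable_pool *p)          { p->frozen = true; }
void pool_unfreeze(freezable_pool *p)        { p->frozen = false; }
bool pool_is_frozen(const freezable_pool *p) { return p->frozen; }
```

Keeping `pool_size` and `pool_stored` separate is what lets a scheduler see an empty pool while the caller can still ask how many items are actually waiting.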

Alternatively, what could be useful is an ABT_xstream_stop function that joins the underlying pthread even if the ES's pool isn't empty, combined with ABT_xstream_revive (which already exists) to spin it back up.
