Is there a safe way to make UCX work in the child process? #10748

Open
@Swiftie13st

Hello, I’m currently using UCX 1.81.1. Midway through its run, my application calls fork() to load data in a child process. By that point UCX has already been called multiple times, and the program hangs after forking even with the environment variable UCX_IB_FORK_INIT=y set. Reinitializing UCX in the child didn’t resolve the issue either (related: #4325).
The problem can be reproduced with a modified version of the example code: UCX in the child process does not work properly.
Is there a safe way to make UCX work in the child process (using RDMA over InfiniBand)?

Here is the code, modified from /examples/ucp_client_server.c (at line 1130):

/* Client-Server initialization */
if (server_addr == NULL) {
    /* Server side */
    ret = run_server(ucp_context, ucp_worker, listen_addr, send_recv_type);
} else {
    /* Client side */
    ret = run_client(ucp_worker, server_addr, send_recv_type);

    pid = fork();
    if (pid == 0) {
        /* Without reinitializing UCX here, the child crashes:
         * Caught signal 11 (Segmentation fault: address not mapped to object at address 0x55e892795000) */
        ret = init_context(&ucp_context1, &ucp_worker1, send_recv_type);
        if (ret != 0) {
            goto err;
        }
        /* Print the inherited and the newly created context for comparison */
        printf("%p, %p\n", (void *)ucp_context, (void *)ucp_context1);

        /* With the new context and worker, this call hangs */
        ret = run_client(ucp_worker1, server_addr, send_recv_type);
    } else {
        waitpid(pid, NULL, 0);
    }
}
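
The snippet above calls waitpid(pid, NULL, 0), so the parent blocks forever when the child hangs and never learns whether the child crashed. As a diagnostic aid only (it does not fix the UCX behavior), here is a minimal sketch of a fork wrapper that reports whether the child exited, was killed by a signal, or had to be killed after a timeout. It uses only POSIX calls and no UCX. The names run_in_child, child_outcome_t, and the three stand-in child functions are hypothetical and not part of UCX or of ucp_client_server.c. In the real reproducer, the child function would wrap init_context() and run_client().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Outcome of the forked child, as seen by the parent (hypothetical type). */
typedef enum {
    CHILD_EXITED,    /* child exited normally; *detail holds the exit code */
    CHILD_SIGNALED,  /* child died from a signal; *detail holds the signal */
    CHILD_TIMED_OUT, /* child was still running at the deadline and was killed */
    CHILD_ERROR      /* fork() or waitpid() failed */
} child_outcome_t;

typedef int (*child_fn_t)(void *arg);

/* Run fn(arg) in a forked child and wait at most timeout_ms for it. */
static child_outcome_t run_in_child(child_fn_t fn, void *arg, int timeout_ms,
                                    int *detail)
{
    const struct timespec tick = {0, 10 * 1000 * 1000}; /* poll every 10 ms */
    int status = 0, waited_ms = 0, killed = 0;
    pid_t pid, r;

    assert(fn != NULL && detail != NULL && timeout_ms >= 0);

    fflush(NULL); /* do not duplicate buffered stdio output in the child */
    pid = fork();
    if (pid < 0) {
        return CHILD_ERROR;
    }
    if (pid == 0) {
        /* _exit() skips atexit handlers inherited from the parent */
        _exit(fn(arg) & 0xff);
    }

    for (;;) {
        r = waitpid(pid, &status, WNOHANG);
        if (r == pid) {
            break; /* child has terminated */
        }
        if (r < 0) {
            return CHILD_ERROR;
        }
        if (waited_ms >= timeout_ms) {
            /* Deadline reached: kill the child and reap it */
            kill(pid, SIGKILL);
            if (waitpid(pid, &status, 0) != pid) {
                return CHILD_ERROR;
            }
            killed = 1;
            break;
        }
        nanosleep(&tick, NULL);
        waited_ms += 10;
    }

    if (WIFEXITED(status)) {
        *detail = WEXITSTATUS(status);
        return CHILD_EXITED;
    }
    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);
        return (killed && *detail == SIGKILL) ? CHILD_TIMED_OUT : CHILD_SIGNALED;
    }
    return CHILD_ERROR;
}

/* Stand-ins for the real child work (init_context() + run_client()). */
static int child_exit_with(void *arg) { return *(int *)arg; }
static int child_crash(void *arg) { (void)arg; raise(SIGSEGV); return 0; }
static int child_hang(void *arg) { (void)arg; for (;;) { pause(); } }
```

With this wrapper, calling run_in_child(child_hang, NULL, 100, &detail) returns CHILD_TIMED_OUT instead of blocking the parent, and a child that dies from signal 11 is reported as CHILD_SIGNALED with detail set to SIGSEGV. The 10 ms polling loop is a deliberate simplification; a SIGCHLD-based wait would avoid polling but needs more code.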
