
Issue about signal delivery and handling difference between lind and native #447

@qianxichen233


I have two concerns about the below test scenario:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = fork();

    if (pid == 0) {
        // Child
        sleep(1);
        return 0;
    } else {
        // Parent
        wait(NULL);
        printf("Parent detected child finished.\n");
    }

    pid = fork();

    if (pid == 0) {
        // Child
        sleep(1);
    } else {
        // Parent
        int status = -1;
        wait(&status);
        printf("Child exited with status %d\n", status);
    }

    return 0;
}
  1. Currently, in this scenario under lind, the second wait can be interrupted by the SIGCHLD raised when the first child exits. That's because our epoch checking only happens at the beginning of loops or function headers. This test scenario is a good example of how our approach can diverge from native behavior. Since there is no explicit epoch check between the first and the second wait, the second wait is likely to be the one that handles the SIGCHLD from the first child's exit, which makes the wait syscall return -1. In native, this is very unlikely to happen because signals are handled far more frequently, so the SIGCHLD is very likely to be handled before the second wait syscall and the program runs as expected.
  2. The other concern is a real issue that we should fix soon. A syscall can only be interrupted by a signal that has an actual action. If a signal is received but no action is bound to it, it should not interrupt the syscall (for example, SIGCHLD's default action is to ignore the signal, so by default it should not be able to interrupt a syscall). However, in the current waitpid syscall implementation (and possibly in other places where signal interruption is considered), any kind of signal can interrupt the syscall. We should fix this by adding a check, before interrupting the syscall, that the signal has an action. (NOTE: in the test scenario above, the wait syscall should never be interrupted since SIGCHLD has no action bound, but the first issue could still occur if SIGCHLD has an action assigned to it.)
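
The check proposed in item 2 could look roughly like the following. This is only a sketch in C against the POSIX `sigaction` API, not lind's actual code: the names `signal_can_interrupt` and `default_action_is_ignore` are hypothetical, and the list of default-ignored signals is an assumption limited to the common ones (SIGCHLD, SIGURG, and SIGWINCH where defined).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the default action for `signo` is to
 * ignore the signal. Only the common cases are listed (assumption). */
static bool default_action_is_ignore(int signo) {
    if (signo == SIGCHLD || signo == SIGURG) {
        return true;
    }
#ifdef SIGWINCH
    if (signo == SIGWINCH) {
        return true;
    }
#endif
    return false;
}

/* Hypothetical helper: true if delivering `signo` would actually do
 * something (run a handler, or take a non-ignoring default action),
 * so a blocked syscall may be interrupted by it. */
bool signal_can_interrupt(int signo) {
    struct sigaction sa;

    assert(signo > 0);

    /* Passing NULL as the new action only queries the disposition. */
    if (sigaction(signo, NULL, &sa) != 0) {
        return false;
    }
    if (sa.sa_flags & SA_SIGINFO) {
        return true;   /* a three-argument handler is installed */
    }
    if (sa.sa_handler == SIG_IGN) {
        return false;  /* explicitly ignored */
    }
    if (sa.sa_handler == SIG_DFL) {
        return !default_action_is_ignore(signo);
    }
    return true;       /* a regular handler is installed */
}
```

With a check like this, the wait in the test scenario would not return -1 for a pending SIGCHLD unless the program had installed a SIGCHLD handler. It does not address item 1, which is about when the pending signal gets handled rather than whether it may interrupt.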
