
Blocking Calls for Signals #118


Open
rennergade opened this issue Feb 24, 2025 · 8 comments · May be fixed by #228

@rennergade
Contributor

We need to handle the case where a cage receives a signal while it is in a blocking call. There are various ways we can do this, including syscall timeouts.

Some calls I know should block are: read, recv, connect/accept, select/poll, futex...

Let's start by getting a full list of the blocking system calls we currently implement. @ChinmayShringi, can you help us generate this?

@JustinCappos
Member

Is this on the critical path for what we're aiming at now? Is this a "nice to have" that we will need for compatibility later?

@qianxichen233
Contributor

Yes, I believe this is required for getting postgres fully running. At the very least, pgbench tries to interrupt a select syscall with a signal.

@rennergade
Contributor Author

Yeah, this is required to get postgres running in any stable way. That said, select/poll are most likely both the easiest to fix and the most common case.

@rennergade
Contributor Author

Did some quick research on this and wanted to provide some answers to help this along. Of the syscalls we implement, I think these are the ones that need to be interruptible by signals:

  • wait()/waitpid()
  • sleep()/nanosleep() etc
  • futex()
  • read()/write()
  • accept()/connect()
  • send()/recv()
  • poll()/select()/epoll_wait()

Checking in code:
For the wait() family as well as select/poll, we never actually block in the kernel, so we can just add a check in RustPOSIX for a pending signal and return EINTR if one is set. This should be relatively easy.
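
A rough sketch of what that check could look like, assuming a hypothetical `Cage` struct with a `signal_pending` flag (neither name is taken from the RustPOSIX code) and a `ready` closure standing in for whatever readiness test select/poll/wait already performs in user space:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, Instant};

/// Linux errno value for an interrupted system call.
pub const EINTR: i32 = 4;

/// Hypothetical per-cage state: a flag that signal delivery would set.
pub struct Cage {
    pub signal_pending: AtomicBool,
}

/// Spin on `ready` without blocking in the kernel.
/// Returns Ok(true) when ready, Ok(false) on timeout,
/// and Err(EINTR) when a signal is pending for the cage.
pub fn poll_with_signal_check<F: FnMut() -> bool>(
    cage: &Cage,
    mut ready: F,
    timeout: Duration,
) -> Result<bool, i32> {
    let deadline = Instant::now() + timeout;
    loop {
        if ready() {
            return Ok(true);
        }
        // The inserted check: bail out early if a signal arrived.
        if cage.signal_pending.load(Ordering::Acquire) {
            return Err(EINTR);
        }
        if Instant::now() >= deadline {
            return Ok(false);
        }
        std::thread::yield_now();
    }
}
```

Readiness is tested before the signal flag here so that a call with data already available still succeeds; whether that ordering is what we want is an open choice.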

Timeouts:
Sockets (accept, connect, send, recv, and read/write on sockets) and futexes both support timeouts. I think the easiest way to deal with these (and how we did it in RustPOSIX) is to set a relatively short timeout (100 ms? we'll need to investigate what's optimal here) on all sockets/futexes we create, then call the underlying syscall in a loop, checking for a pending signal between calls, until it returns something relevant.
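
To make the loop concrete, here is a minimal sketch using a std `UdpSocket` as a stand-in for a cage socket. The function name `recv_interruptible`, the `signal_pending` flag, and the `slice` parameter are all made up for illustration; the real code would go through our own socket layer rather than std:

```rust
use std::io::{self, ErrorKind};
use std::net::UdpSocket;
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Duration;

/// Linux errno value for an interrupted system call.
pub const EINTR: i32 = 4;

/// Receive with a short kernel timeout, re-checking the (hypothetical)
/// signal flag each time the timeout expires. `slice` must be non-zero.
pub fn recv_interruptible(
    sock: &UdpSocket,
    buf: &mut [u8],
    signal_pending: &AtomicBool,
    slice: Duration,
) -> io::Result<usize> {
    sock.set_read_timeout(Some(slice))?;
    loop {
        match sock.recv_from(buf) {
            Ok((n, _peer)) => return Ok(n),
            // A read timeout surfaces as WouldBlock or TimedOut,
            // depending on the platform.
            Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => {
                if signal_pending.load(Ordering::Acquire) {
                    return Err(io::Error::from_raw_os_error(EINTR));
                }
                // No signal: go around and block for another slice.
            }
            Err(e) => return Err(e),
        }
    }
}
```

The worst-case signal latency is one `slice`, so the timeout value trades responsiveness against wakeup overhead.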

Sleeps:
We'll probably have to do something inventive here to make this return early.

Read/write:
Since reads and writes can go to a number of subsystems, particularly pipes and sockets, this is a bit tricky. We'll need to discuss a more complex game plan here, but it is most likely a mix of timeouts and playing around with O_NONBLOCK.

@JustinCappos
Member

JustinCappos commented Feb 26, 2025 via email

@rennergade
Contributor Author

rennergade commented Feb 26, 2025

This SO answer about creating an interruptible sleep using an mpsc channel could possibly be our best bet for sleeps. Using kernel sleeps is probably much harder to handle correctly.
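
Roughly, the idea would look like the sketch below. This is my own minimal version, not the code from the answer; `interruptible_sleep` and `SleepOutcome` are made-up names, and it assumes signal delivery holds a `Sender<()>` for the sleeping thread:

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum SleepOutcome {
    /// The full duration elapsed.
    Completed,
    /// A wake message arrived first (e.g. sent by signal delivery).
    Interrupted,
}

/// Sleep for `duration`, returning early if anything is sent on the channel.
pub fn interruptible_sleep(wake_rx: &Receiver<()>, duration: Duration) -> SleepOutcome {
    let start = Instant::now();
    match wake_rx.recv_timeout(duration) {
        Ok(()) => SleepOutcome::Interrupted,
        Err(RecvTimeoutError::Timeout) => SleepOutcome::Completed,
        Err(RecvTimeoutError::Disconnected) => {
            // Every sender is gone, so nothing can wake us:
            // fall back to a plain sleep for the remaining time.
            std::thread::sleep(duration.saturating_sub(start.elapsed()));
            SleepOutcome::Completed
        }
    }
}
```

On `Interrupted` the caller would return EINTR, and for nanosleep it would also report the remaining time.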

@JustinCappos
Member

> This SO answer about creating an interruptible sleep using an mpsc channel could possibly be our best bet for sleeps. Using kernel sleeps is probably much harder to handle correctly.

Let's also check how slow this is (does it delay short sleeps?). I think we should only need this for longer sleep periods.

A lot of this can be pushed until later, unless it is causing an immediate problem.

@rennergade
Contributor Author

Had some additional thoughts when talking with Chinmay about futexes. Above I mention that futexes have timeouts and that we could use them to check for signal interrupts, but after thinking it through, I suspect that would cause unintended behavior.

I believe it makes the most sense to store the futex addresses in some sort of list in the cage struct whenever futex() is called with FUTEX_WAIT, and to wake those addresses when the cage receives a signal.
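
A sketch of the bookkeeping, with every name hypothetical (`Cage`, `waiting_futexes`, and the methods below are not from the existing code). The `wake` closure stands in for the actual FUTEX_WAKE call, and a per-address count is kept so that several threads can wait on the same address:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical cage state: futex addresses with threads currently in FUTEX_WAIT,
/// mapped to the number of waiters on each address.
pub struct Cage {
    waiting_futexes: Mutex<HashMap<u64, usize>>,
}

impl Cage {
    pub fn new() -> Self {
        Cage { waiting_futexes: Mutex::new(HashMap::new()) }
    }

    /// Call just before issuing futex(FUTEX_WAIT) on `uaddr`.
    pub fn register_futex_wait(&self, uaddr: u64) {
        *self.waiting_futexes.lock().unwrap().entry(uaddr).or_insert(0) += 1;
    }

    /// Call after the FUTEX_WAIT returns, for any reason.
    pub fn unregister_futex_wait(&self, uaddr: u64) {
        let mut map = self.waiting_futexes.lock().unwrap();
        if let Entry::Occupied(mut e) = map.entry(uaddr) {
            *e.get_mut() -= 1;
            if *e.get() == 0 {
                e.remove();
            }
        }
    }

    /// Call on signal delivery: invoke `wake` once per address that has waiters.
    /// Returns the number of addresses woken.
    pub fn wake_futex_waiters<F: FnMut(u64)>(&self, mut wake: F) -> usize {
        // Snapshot under the lock, then wake without holding it.
        let addrs: Vec<u64> = self.waiting_futexes.lock().unwrap().keys().copied().collect();
        for &addr in &addrs {
            wake(addr);
        }
        addrs.len()
    }
}
```

A woken waiter would then need to check for a pending signal and return EINTR rather than treating the wake as a normal FUTEX_WAKE.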
