Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.posix.accept does not return when interrupted. #19445

Closed
curuvar opened this issue Mar 26, 2024 · 7 comments
Closed

std.posix.accept does not return when interrupted. #19445

curuvar opened this issue Mar 26, 2024 · 7 comments

Comments

@curuvar
Copy link
Contributor

curuvar commented Mar 26, 2024

Zig Version

0.12.0-dev.3352+95cb93944
aarch64 GNU/Linux

Steps to Reproduce and Observed Behavior

I have some code that uses std.net.server.accept to accept connections. If I send the process a SIGTERM (which I handle with my own sigaction) the accept does not return and the process remains blocked. This makes it difficult to gracefully clean things up.

Expected Behavior

the accept function should return an "Interrupted" error.

Other Information

The actual problematic behavior is in the accept function in /std/posix.zig. That function calls the system accept function in a loop whose whole purpose seem to be to cause this behavior.

If this behavior is desired in some cases, perhaps a flag could be added to toggle it on or off.

@curuvar curuvar added the bug Observed behavior contradicts documented or intended behavior label Mar 26, 2024
@rootbeer
Copy link
Contributor

TL;DR: I think this is a duplicate of #6193.

As I understand the current situation, almost all of the Zig POSIXy syscall wrappers retry on EINTR. Not just accept. I think the motivation is that EINTR is very annoying in 99% of code that invokes syscalls, so the hope is to remove that annoyance for Zig programs.

The cost of this approach is that blocking syscalls can become uninterruptible, so Zig programs will have to do something more drastic like close the blocked-on file descriptor (or kill the containing process and let the OS do the cleanup) when they want a blocked thread to give up on an operation. Or use non-blocking APIs. (Or perhaps Zig could support a timeout on blocking syscalls?)

For your use case, does closing the file descriptor work?

Zig could support a flag to say a syscall should be interruptible, but that gets complicated quickly. Its either an extra parameter on every syscall (and every wrapper), or its a global/thread flag that would apply to all syscall invocations (but that would break any invocations not ready for an interruption).

In the short-term, as a work-around I believe you can link against glibc (or musl) to get the C Library's interruptible syscall semantics.

Personally, I'm skeptical of Zig trying to repair broken POSIX semantics like this. So I suspect eventually Zig will be changed to return EINTR to callers (at least in the POSIXy APIs, but maybe not in the higher-level APIs?), but EINTR is a complicated and annoying behavior, so it seems worth a try.

See also #2425.

@marler8997
Copy link
Contributor

Abstractions are never perfect, it's a tradeoff between generality and simplicity. However, in this case the abstraction is opt-in.
Apps always have the option to use the system api's directly and build up their own abstractions when appropriate.

That being said, I don't see much downside to adding an Options struct in Zig's accept abstraction to customize this behavior, and, leave the door open for future options as well.

@GalaxySnail
Copy link

As I understand the current situation, almost all of the Zig POSIXy syscall wrappers retry on EINTR. Not just accept. I think the motivation is that EINTR is very annoying in 99% of code that invokes syscalls, so the hope is to remove that annoyance for Zig programs.

The cost of this approach is that blocking syscalls can become uninterruptible, so Zig programs will have to do something more drastic like close the blocked-on file descriptor (or kill the containing process and let the OS do the cleanup) when they want a blocked thread to give up on an operation. Or use non-blocking APIs. (Or perhaps Zig could support a timeout on blocking syscalls?)

FWIW, Python wrappers also retry on EINTR, but they call the signal handler first, and only retry if the signal handler doesn't raise an exception. For details: PEP 475. I think Zig can do something similar.

@curuvar
Copy link
Contributor Author

curuvar commented Mar 27, 2024

I actually tried closing the stream, but that turns out to be worse. The underlying system call returns with BADF which causes an "unreachable" panic.

As a temporary work around I'm calling std.posix.poll (which also loops on INTR, but does supply a timeout which allow me an escape from the loop) and only do the accept if there is a connection waiting.

I concur this issue is covered by #6193

@rootbeer
Copy link
Contributor

I think the "EBADF causes unreachable" issue is a separate bug. See #6389. Generally patches motivated by real code, as in your case, are acceptable motivations to relax Zig's errno handling assumptions. The EBADF can either be returned or could be mapped to Unexpected. (E.g., see #3880.)

The PEP 475 link is a solid description of the motivations for hiding and handling interruptions. I'd argue something like that belongs in Zig's OS-agnostic infrastructure, and the "posix wrapper" layer shouldn't quite get that involved with dispatching signals, but it really depends on what level of fidelity Zig wants to take for this layer.

@marler8997
Copy link
Contributor

marler8997 commented Mar 28, 2024

I think the "EBADF causes unreachable" issue is a separate bug

In this case this is exposing a bug in the program which is what we want.

I actually tried closing the stream, but that turns out to be worse.

Once you close a file descriptor, that descriptor is no longer valid and attempting to invoke a system call with it is the equivalent to undefined behavior at the OS level. That file descriptor could for example have already been re-used by opening a new file and now you're passing a different file to whatever system call you're invoking that you thought! This means the app now has undefined behavior that's hidden behind a race condition, one of the worst kind of bugs an program can have, and why in this case Zig does whatever it can to help you catch it and discourage its use.

I'm not sure what your code looks like, but, if you're wanting a listen socket to pop, you can call shutdown on it. Or if you want to continue using a signal, you can call std.os.system.accept and handle EINTR yourself.

P.S. it's very difficult to use a signal to handle when to close your listen socket without introducing a race condition. It's very similar to multithreading. You would need to implement some some of synchronization mechanism between the code that uses the listen socket and the signal handler. For example, before you use the listen socket you could set a global "lock_listen_socket" to true and make sure that your signal handler doesn't close it if it's being used, then if the signal handler does execute, add more global state to let the non-signal handling code it's time to close the socket if the signal handler ran while it was being used.

@andrewrk
Copy link
Member

duplicate of #6193

@andrewrk andrewrk closed this as not planned Won't fix, can't repro, duplicate, stale Mar 28, 2024
@andrewrk andrewrk removed the bug Observed behavior contradicts documented or intended behavior label Mar 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants