-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix luv signals (issue #400) #412
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want a thread and not a domain? As I understand it, a thread is attached to a domain and can't run while another thread in that domain is running. So if domain 0 is busy doing CPU stuff then no other domains will be able to get signals I think.
I wouldn't really worry about it being busy, signals are already delayed and not supposed to be realtime. Having it as a Domain would break the consecutive ids if people want to store things per-domain somehow, as the first Domain spawned by EIO would be id 2 and not 1. Another issue is that having one an extra idle Domain, even if completely idle, can slow things considerably (see ocaml-multicore/domainslib#77 (comment)) since everyone has to take part in GC, having more Domains are a nono for now. A third and minor point would be keeping things similar to other backends, the systhread is a hack/deviation of the luv backend, but the things we do in a Domain remain the same in luv/uring/whatever, as in they do heavy bulk. |
Thinking about this more, I don't think the busy CPU thing will prevent the signal being handled, because OCaml already has safe points to handle this in the single-domain case. |
After fixing the compile error,
Are we sure that waitpid will only return after the signal is delivered? Also, I don't think you can use a ref cell here. It needs to be I pushed a commit that fixes it for me. Does that look right to you? |
Good point ! Not at all, the signal can arrive, after or before, I did not consider that. Yeah, since the waitpid doesn't hold, a ref cell only would be racy.
Yep 👍 |
This makes sure we can process signals in luv the same way we do in uring. As stated in ocaml-multicore#400 the main issue is that luv's mainloop will restart on EINTR and we will never unwind back to ocaml land, so even though the process got the signal, the runtime will not see it until something else goes on. The trick here is to abuse POSIX thread semantics and shove all signals into one specific systhread by blocking them in the other threads. Additionally, I've fixed a bug in OCaml 5.0 where the systhreads end up starting with all signals blocked: ocaml/ocaml#11880. This PR works even with/without the bug. Danger Zone ~~~~~~~~~~~ This is tricky ! If you're thinking we can do better than pipes, think again ! Unix.sigsuspend doesn't work on multithreaded, we don't have a real Unix.pause (it's implemented on top of sigsuspend !). The semantics for kill(2) to "self" are different than "from outside" when multithreaded, and the runtime doesn't expose pthread_kill(2). That's why on the test we have to fork and signal back to the parent. I know, it's horrible. Co-authored-by: Thomas Leonard <talex5@gmail.com>
CHANGES: New features: - Add `Eio.Net.run_server` (@bikallem @talex5 ocaml-multicore/eio#408). Runs an accept loop in one or more domains, with cancellation and graceful shutdown, and an optional maximum number of concurrent connections. - Add `Buf_read.BE` and `LE` parsers (@Cjen1 ocaml-multicore/eio#399). Parse numbers in various binary formats. - Add `Eio.Buf_read.uint8` (@talex5 ocaml-multicore/eio#418). Performance: - Make `Eio.Condition` lock-free (@talex5 ocaml-multicore/eio#397 ocaml-multicore/eio#381). In addition to being faster, this allows using conditions in signal handlers. - Make `Eio.Semaphore` lock-free (@talex5 @polytypic ocaml-multicore/eio#398). - Make `Eio.Stream` lock-free when the capacity is zero (@talex5 ocaml-multicore/eio#413 ocaml-multicore/eio#411). - Make `Eio.Promise` lock-free (@talex5 ocaml-multicore/eio#401). Bug fixes: - eio_linux: call `Uring.submit` as needed (@talex5 @bikallem ocaml-multicore/eio#428). Previously, we could fail to submit a job promptly because the SQE queue was full. - Fix luv signals (@haesbaert ocaml-multicore/eio#412). `libuv` automatically retries polling if it gets `EINTR`, without giving OCaml signal handlers a chance to run. - eio_luv: fix some resource leaks (@talex5 @patricoferris ocaml-multicore/eio#421). - eio_luv: fix "unavailable signal" error on Windows (@talex5 ocaml-multicore/eio#420, reported by @nojb). - Fix `Buf_write.BE.uint48` and `LE.uint48` (@adatario ocaml-multicore/eio#418). Documentation: - Add example programs (@talex5 ocaml-multicore/eio#389). - Update network examples to use `run_server` (@talex5 ocaml-multicore/eio#417). - Add a warning to the tutorial about `Fiber.first` (@talex5 ocaml-multicore/eio#394). - Clarify the epoch used for `Eio.Time.now` (@bikallem ocaml-multicore/eio#395). - Describe `secure_random` as an infinite source (@patricoferris ocaml-multicore/eio#426). - Update README for OCaml 5 release (@talex5 ocaml-multicore/eio#384 ocaml-multicore/eio#391 ocaml-multicore/eio#393). Other changes: - Delay setting `SIGPIPE` handler until the `run` function is called (@talex5 ocaml-multicore/eio#420). - Remove debug-level logging (@talex5 ocaml-multicore/eio#403). - eio-luv: improve `process.md` test (@smondet ocaml-multicore/eio#414). - Update to Dune 3 (@talex5 ocaml-multicore/eio#410). - Remove test dependency on Astring (@talex5 ocaml-multicore/eio#402 ocaml-multicore/eio#404). - Simplify cancellation logic (@talex5 ocaml-multicore/eio#396). - time: `Mtime.Spand.to_s` has been deprecated in mtime 2.0.0 (@bikallem ocaml-multicore/eio#385).
CHANGES: New features: - Add `Eio.Net.run_server` (@bikallem @talex5 ocaml-multicore/eio#408). Runs an accept loop in one or more domains, with cancellation and graceful shutdown, and an optional maximum number of concurrent connections. - Add `Buf_read.BE` and `LE` parsers (@Cjen1 ocaml-multicore/eio#399). Parse numbers in various binary formats. - Add `Eio.Buf_read.uint8` (@talex5 ocaml-multicore/eio#418). Performance: - Make `Eio.Condition` lock-free (@talex5 ocaml-multicore/eio#397 ocaml-multicore/eio#381). In addition to being faster, this allows using conditions in signal handlers. - Make `Eio.Semaphore` lock-free (@talex5 @polytypic ocaml-multicore/eio#398). - Make `Eio.Stream` lock-free when the capacity is zero (@talex5 ocaml-multicore/eio#413 ocaml-multicore/eio#411). - Make `Eio.Promise` lock-free (@talex5 ocaml-multicore/eio#401). Bug fixes: - eio_linux: call `Uring.submit` as needed (@talex5 @bikallem ocaml-multicore/eio#428). Previously, we could fail to submit a job promptly because the SQE queue was full. - Fix luv signals (@haesbaert ocaml-multicore/eio#412). `libuv` automatically retries polling if it gets `EINTR`, without giving OCaml signal handlers a chance to run. - eio_luv: fix some resource leaks (@talex5 @patricoferris ocaml-multicore/eio#421). - eio_luv: fix "unavailable signal" error on Windows (@talex5 ocaml-multicore/eio#420, reported by @nojb). - Fix `Buf_write.BE.uint48` and `LE.uint48` (@adatario ocaml-multicore/eio#418). Documentation: - Add example programs (@talex5 ocaml-multicore/eio#389). - Update network examples to use `run_server` (@talex5 ocaml-multicore/eio#417). - Add a warning to the tutorial about `Fiber.first` (@talex5 ocaml-multicore/eio#394). - Clarify the epoch used for `Eio.Time.now` (@bikallem ocaml-multicore/eio#395). - Describe `secure_random` as an infinite source (@patricoferris ocaml-multicore/eio#426). - Update README for OCaml 5 release (@talex5 ocaml-multicore/eio#384 ocaml-multicore/eio#391 ocaml-multicore/eio#393). Other changes: - Delay setting `SIGPIPE` handler until the `run` function is called (@talex5 ocaml-multicore/eio#420). - Remove debug-level logging (@talex5 ocaml-multicore/eio#403). - eio-luv: improve `process.md` test (@smondet ocaml-multicore/eio#414). - Update to Dune 3 (@talex5 ocaml-multicore/eio#410). - Remove test dependency on Astring (@talex5 ocaml-multicore/eio#402 ocaml-multicore/eio#404). - Simplify cancellation logic (@talex5 ocaml-multicore/eio#396). - time: `Mtime.Spand.to_s` has been deprecated in mtime 2.0.0 (@bikallem ocaml-multicore/eio#385).
CHANGES: New features: - Add `Eio.Net.run_server` (@bikallem @talex5 ocaml-multicore/eio#408). Runs an accept loop in one or more domains, with cancellation and graceful shutdown, and an optional maximum number of concurrent connections. - Add `Buf_read.BE` and `LE` parsers (@Cjen1 ocaml-multicore/eio#399). Parse numbers in various binary formats. - Add `Eio.Buf_read.uint8` (@talex5 ocaml-multicore/eio#418). Performance: - Make `Eio.Condition` lock-free (@talex5 ocaml-multicore/eio#397 ocaml-multicore/eio#381). In addition to being faster, this allows using conditions in signal handlers. - Make `Eio.Semaphore` lock-free (@talex5 @polytypic ocaml-multicore/eio#398). - Make `Eio.Stream` lock-free when the capacity is zero (@talex5 ocaml-multicore/eio#413 ocaml-multicore/eio#411). - Make `Eio.Promise` lock-free (@talex5 ocaml-multicore/eio#401). Bug fixes: - eio_linux: call `Uring.submit` as needed (@talex5 @bikallem ocaml-multicore/eio#428). Previously, we could fail to submit a job promptly because the SQE queue was full. - Fix luv signals (@haesbaert ocaml-multicore/eio#412). `libuv` automatically retries polling if it gets `EINTR`, without giving OCaml signal handlers a chance to run. - eio_luv: fix some resource leaks (@talex5 @patricoferris ocaml-multicore/eio#421). - eio_luv: fix "unavailable signal" error on Windows (@talex5 ocaml-multicore/eio#420, reported by @nojb). - Fix `Buf_write.BE.uint48` and `LE.uint48` (@adatario ocaml-multicore/eio#418). Documentation: - Add example programs (@talex5 ocaml-multicore/eio#389). - Update network examples to use `run_server` (@talex5 ocaml-multicore/eio#417). - Add a warning to the tutorial about `Fiber.first` (@talex5 ocaml-multicore/eio#394). - Clarify the epoch used for `Eio.Time.now` (@bikallem ocaml-multicore/eio#395). - Describe `secure_random` as an infinite source (@patricoferris ocaml-multicore/eio#426). - Update README for OCaml 5 release (@talex5 ocaml-multicore/eio#384 ocaml-multicore/eio#391 ocaml-multicore/eio#393). Other changes: - Delay setting `SIGPIPE` handler until the `run` function is called (@talex5 ocaml-multicore/eio#420). - Remove debug-level logging (@talex5 ocaml-multicore/eio#403). - eio-luv: improve `process.md` test (@smondet ocaml-multicore/eio#414). - Update to Dune 3 (@talex5 ocaml-multicore/eio#410). - Remove test dependency on Astring (@talex5 ocaml-multicore/eio#402 ocaml-multicore/eio#404). - Simplify cancellation logic (@talex5 ocaml-multicore/eio#396). - time: `Mtime.Spand.to_s` has been deprecated in mtime 2.0.0 (@bikallem ocaml-multicore/eio#385).
CHANGES: New features: - Add `Eio.Net.run_server` (@bikallem @talex5 ocaml-multicore/eio#408). Runs an accept loop in one or more domains, with cancellation and graceful shutdown, and an optional maximum number of concurrent connections. - Add `Buf_read.BE` and `LE` parsers (@Cjen1 ocaml-multicore/eio#399). Parse numbers in various binary formats. - Add `Eio.Buf_read.uint8` (@talex5 ocaml-multicore/eio#418). Performance: - Make `Eio.Condition` lock-free (@talex5 ocaml-multicore/eio#397 ocaml-multicore/eio#381). In addition to being faster, this allows using conditions in signal handlers. - Make `Eio.Semaphore` lock-free (@talex5 @polytypic ocaml-multicore/eio#398). - Make `Eio.Stream` lock-free when the capacity is zero (@talex5 ocaml-multicore/eio#413 ocaml-multicore/eio#411). - Make `Eio.Promise` lock-free (@talex5 ocaml-multicore/eio#401). Bug fixes: - eio_linux: call `Uring.submit` as needed (@talex5 @bikallem ocaml-multicore/eio#428). Previously, we could fail to submit a job promptly because the SQE queue was full. - Fix luv signals (@haesbaert ocaml-multicore/eio#412). `libuv` automatically retries polling if it gets `EINTR`, without giving OCaml signal handlers a chance to run. - eio_luv: fix some resource leaks (@talex5 @patricoferris ocaml-multicore/eio#421). - eio_luv: fix "unavailable signal" error on Windows (@talex5 ocaml-multicore/eio#420, reported by @nojb). - Fix `Buf_write.BE.uint48` and `LE.uint48` (@adatario ocaml-multicore/eio#418). Documentation: - Add example programs (@talex5 ocaml-multicore/eio#389). - Update network examples to use `run_server` (@talex5 ocaml-multicore/eio#417). - Add a warning to the tutorial about `Fiber.first` (@talex5 ocaml-multicore/eio#394). - Clarify the epoch used for `Eio.Time.now` (@bikallem ocaml-multicore/eio#395). - Describe `secure_random` as an infinite source (@patricoferris ocaml-multicore/eio#426). - Update README for OCaml 5 release (@talex5 ocaml-multicore/eio#384 ocaml-multicore/eio#391 ocaml-multicore/eio#393). Other changes: - Delay setting `SIGPIPE` handler until the `run` function is called (@talex5 ocaml-multicore/eio#420). - Remove debug-level logging (@talex5 ocaml-multicore/eio#403). - eio-luv: improve `process.md` test (@smondet ocaml-multicore/eio#414). - Update to Dune 3 (@talex5 ocaml-multicore/eio#410). - Remove test dependency on Astring (@talex5 ocaml-multicore/eio#402 ocaml-multicore/eio#404). - Simplify cancellation logic (@talex5 ocaml-multicore/eio#396). - time: `Mtime.Spand.to_s` has been deprecated in mtime 2.0.0 (@bikallem ocaml-multicore/eio#385).
This makes sure we can process signals in luv the same way we do in uring. As stated in #400 the main issue is that luv's mainloop will restart on EINTR and we will never unwind back to ocaml land, so even though the process got the signal, the runtime will not see it until something else goes on.
The trick here is to abuse POSIX thread semantics and shove all signals into one specific systhread by blocking them in the other threads.
Additionally, I've fixed a bug in Ocaml 5.0 where the systhreads end up starting with all signals blocked: ocaml/ocaml#11880. This PR works even with/without the bug.
Danger Zone
This is tricky ! If you're thinking we can do better than pipes, think again ! Unix.sigsuspend doesn't work on multithreaded, we don't have a real Unix.pause (it's implemented on top of sigsuspend !).
The semantics for kill(2) to "self" are different than "from outside" when multithreaded, and the runtime doesn't expose pthread_kill(2). That's why on the test we have to fork and signal back to the parent. I know, it's horrible.