Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix luv signals (issue #400) #412

Merged
merged 1 commit into from
Jan 24, 2023
Merged

Conversation

haesbaert
Copy link
Contributor

@haesbaert haesbaert commented Jan 23, 2023

This makes sure we can process signals in luv the same way we do in uring. As stated in #400 the main issue is that luv's mainloop will restart on EINTR and we will never unwind back to ocaml land, so even though the process got the signal, the runtime will not see it until something else goes on.

The trick here is to abuse POSIX thread semantics and shove all signals into one specific systhread by blocking them in the other threads.

Additionally, I've fixed a bug in Ocaml 5.0 where the systhreads end up starting with all signals blocked: ocaml/ocaml#11880. This PR works even with/without the bug.

Danger Zone

This is tricky ! If you're thinking we can do better than pipes, think again ! Unix.sigsuspend doesn't work on multithreaded, we don't have a real Unix.pause (it's implemented on top of sigsuspend !).
The semantics for kill(2) to "self" are different than "from outside" when multithreaded, and the runtime doesn't expose pthread_kill(2). That's why on the test we have to fork and signal back to the parent. I know, it's horrible.

@haesbaert haesbaert requested a review from talex5 January 23, 2023 17:17
Copy link
Collaborator

@talex5 talex5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want a thread and not a domain? As I understand it, a thread is attached to a domain and can't run while another thread in that domain is running. So if domain 0 is busy doing CPU stuff then no other domains will be able to get signals I think.

lib_eio_luv/eio_luv.ml Outdated Show resolved Hide resolved
@haesbaert
Copy link
Contributor Author

haesbaert commented Jan 24, 2023

Do we really want a thread and not a domain? As I understand it, a thread is attached to a domain and can't run while another thread in that domain is running. So if domain 0 is busy doing CPU stuff then no other domains will be able to get signals I think.

I wouldn't really worry about it being busy, signals are already delayed and not supposed to be realtime.

Having it as a Domain would break the consecutive ids if people want to store things per-domain somehow, as the first Domain spawned by EIO would be id 2 and not 1.

Another issue is that having one an extra idle Domain, even if completely idle, can slow things considerably (see ocaml-multicore/domainslib#77 (comment)) since everyone has to take part in GC, having more Domains are a nono for now.

A third and minor point would be keeping things similar to other backends, the systhread is a hack/deviation of the luv backend, but the things we do in a Domain remain the same in luv/uring/whatever, as in they do heavy bulk.

@talex5
Copy link
Collaborator

talex5 commented Jan 24, 2023

Thinking about this more, I don't think the busy CPU thing will prevent the signal being handled, because OCaml already has safe points to handle this in the single-domain case.

@talex5
Copy link
Collaborator

talex5 commented Jan 24, 2023

After fixing the compile error, make test_luv failed:

rm -rf _build
EIO_BACKEND=luv dune runtest
File "tests/dune", line 1, characters 0-54:
1 | (mdx
2 |   (package eio_main)
3 |   (deps (package eio_main)))
Command got signal INT.

Are we sure that waitpid will only return after the signal is delivered? Also, I don't think you can use a ref cell here. It needs to be Atomic.t.

I pushed a commit that fixes it for me. Does that look right to you?

@haesbaert
Copy link
Contributor Author

After fixing the compile error, make test_luv failed:

rm -rf _build
EIO_BACKEND=luv dune runtest
File "tests/dune", line 1, characters 0-54:
1 | (mdx
2 |   (package eio_main)
3 |   (deps (package eio_main)))
Command got signal INT.

Are we sure that waitpid will only return after the signal is delivered? Also, I don't think you can use a ref cell here. It needs to be Atomic.t.

Good point ! Not at all, the signal can arrive, after or before, I did not consider that.
I must have been (un)lucky, I ran make test_luv in a loop for an hour and never triggered it.

Yeah, since the waitpid doesn't hold, a ref cell only would be racy.

I pushed a commit that fixes it for me. Does that look right to you?

Yep 👍

This makes sure we can process signals in luv the same way we do in uring. As
stated in ocaml-multicore#400 the main issue is that luv's mainloop will restart on EINTR and
we will never unwind back to ocaml land, so even though the process got the
signal, the runtime will not see it until something else goes on.

The trick here is to abuse POSIX thread semantics and shove all signals into one
specific systhread by blocking them in the other threads.

Additionally, I've fixed a bug in OCaml 5.0 where the systhreads end up starting
with all signals blocked: ocaml/ocaml#11880. This PR
works even with/without the bug.

Danger Zone
~~~~~~~~~~~
This is tricky ! If you're thinking we can do better than pipes, think again !
Unix.sigsuspend doesn't work on multithreaded, we don't have a real Unix.pause
(it's implemented on top of sigsuspend !).
The semantics for kill(2) to "self" are different than "from outside" when
multithreaded, and the runtime doesn't expose pthread_kill(2). That's why on the
test we have to fork and signal back to the parent. I know, it's horrible.

Co-authored-by: Thomas Leonard <talex5@gmail.com>
@talex5 talex5 merged commit c634c4f into ocaml-multicore:main Jan 24, 2023
talex5 added a commit to talex5/opam-repository that referenced this pull request Feb 1, 2023
CHANGES:

New features:

- Add `Eio.Net.run_server` (@bikallem @talex5 ocaml-multicore/eio#408).
  Runs an accept loop in one or more domains, with cancellation and graceful shutdown,
  and an optional maximum number of concurrent connections.

- Add `Buf_read.BE` and `LE` parsers (@Cjen1 ocaml-multicore/eio#399).
  Parse numbers in various binary formats.

- Add `Eio.Buf_read.uint8` (@talex5 ocaml-multicore/eio#418).

Performance:

- Make `Eio.Condition` lock-free (@talex5 ocaml-multicore/eio#397 ocaml-multicore/eio#381).
  In addition to being faster, this allows using conditions in signal handlers.

- Make `Eio.Semaphore` lock-free (@talex5 @polytypic ocaml-multicore/eio#398).

- Make `Eio.Stream` lock-free when the capacity is zero (@talex5 ocaml-multicore/eio#413 ocaml-multicore/eio#411).

- Make `Eio.Promise` lock-free (@talex5 ocaml-multicore/eio#401).

Bug fixes:

- eio_linux: call `Uring.submit` as needed (@talex5 @bikallem ocaml-multicore/eio#428).
  Previously, we could fail to submit a job promptly because the SQE queue was full.

- Fix luv signals (@haesbaert ocaml-multicore/eio#412).
  `libuv` automatically retries polling if it gets `EINTR`, without giving OCaml signal handlers a chance to run.

- eio_luv: fix some resource leaks (@talex5 @patricoferris ocaml-multicore/eio#421).

- eio_luv: fix "unavailable signal" error on Windows (@talex5 ocaml-multicore/eio#420, reported by @nojb).

- Fix `Buf_write.BE.uint48` and `LE.uint48` (@adatario ocaml-multicore/eio#418).

Documentation:

- Add example programs (@talex5 ocaml-multicore/eio#389).

- Update network examples to use `run_server` (@talex5 ocaml-multicore/eio#417).

- Add a warning to the tutorial about `Fiber.first` (@talex5 ocaml-multicore/eio#394).

- Clarify the epoch used for `Eio.Time.now` (@bikallem ocaml-multicore/eio#395).

- Describe `secure_random` as an infinite source (@patricoferris ocaml-multicore/eio#426).

- Update README for OCaml 5 release (@talex5 ocaml-multicore/eio#384 ocaml-multicore/eio#391 ocaml-multicore/eio#393).

Other changes:

- Delay setting `SIGPIPE` handler until the `run` function is called (@talex5 ocaml-multicore/eio#420).

- Remove debug-level logging (@talex5 ocaml-multicore/eio#403).

- eio-luv: improve `process.md` test (@smondet ocaml-multicore/eio#414).

- Update to Dune 3 (@talex5 ocaml-multicore/eio#410).

- Remove test dependency on Astring (@talex5 ocaml-multicore/eio#402 ocaml-multicore/eio#404).

- Simplify cancellation logic (@talex5 ocaml-multicore/eio#396).

- time: `Mtime.Spand.to_s` has been deprecated in mtime 2.0.0 (@bikallem ocaml-multicore/eio#385).
talex5 added a commit to talex5/opam-repository that referenced this pull request Feb 1, 2023
CHANGES:

New features:

- Add `Eio.Net.run_server` (@bikallem @talex5 ocaml-multicore/eio#408).
  Runs an accept loop in one or more domains, with cancellation and graceful shutdown,
  and an optional maximum number of concurrent connections.

- Add `Buf_read.BE` and `LE` parsers (@Cjen1 ocaml-multicore/eio#399).
  Parse numbers in various binary formats.

- Add `Eio.Buf_read.uint8` (@talex5 ocaml-multicore/eio#418).

Performance:

- Make `Eio.Condition` lock-free (@talex5 ocaml-multicore/eio#397 ocaml-multicore/eio#381).
  In addition to being faster, this allows using conditions in signal handlers.

- Make `Eio.Semaphore` lock-free (@talex5 @polytypic ocaml-multicore/eio#398).

- Make `Eio.Stream` lock-free when the capacity is zero (@talex5 ocaml-multicore/eio#413 ocaml-multicore/eio#411).

- Make `Eio.Promise` lock-free (@talex5 ocaml-multicore/eio#401).

Bug fixes:

- eio_linux: call `Uring.submit` as needed (@talex5 @bikallem ocaml-multicore/eio#428).
  Previously, we could fail to submit a job promptly because the SQE queue was full.

- Fix luv signals (@haesbaert ocaml-multicore/eio#412).
  `libuv` automatically retries polling if it gets `EINTR`, without giving OCaml signal handlers a chance to run.

- eio_luv: fix some resource leaks (@talex5 @patricoferris ocaml-multicore/eio#421).

- eio_luv: fix "unavailable signal" error on Windows (@talex5 ocaml-multicore/eio#420, reported by @nojb).

- Fix `Buf_write.BE.uint48` and `LE.uint48` (@adatario ocaml-multicore/eio#418).

Documentation:

- Add example programs (@talex5 ocaml-multicore/eio#389).

- Update network examples to use `run_server` (@talex5 ocaml-multicore/eio#417).

- Add a warning to the tutorial about `Fiber.first` (@talex5 ocaml-multicore/eio#394).

- Clarify the epoch used for `Eio.Time.now` (@bikallem ocaml-multicore/eio#395).

- Describe `secure_random` as an infinite source (@patricoferris ocaml-multicore/eio#426).

- Update README for OCaml 5 release (@talex5 ocaml-multicore/eio#384 ocaml-multicore/eio#391 ocaml-multicore/eio#393).

Other changes:

- Delay setting `SIGPIPE` handler until the `run` function is called (@talex5 ocaml-multicore/eio#420).

- Remove debug-level logging (@talex5 ocaml-multicore/eio#403).

- eio-luv: improve `process.md` test (@smondet ocaml-multicore/eio#414).

- Update to Dune 3 (@talex5 ocaml-multicore/eio#410).

- Remove test dependency on Astring (@talex5 ocaml-multicore/eio#402 ocaml-multicore/eio#404).

- Simplify cancellation logic (@talex5 ocaml-multicore/eio#396).

- time: `Mtime.Spand.to_s` has been deprecated in mtime 2.0.0 (@bikallem ocaml-multicore/eio#385).
talex5 added a commit to talex5/opam-repository that referenced this pull request Feb 1, 2023
CHANGES:

New features:

- Add `Eio.Net.run_server` (@bikallem @talex5 ocaml-multicore/eio#408).
  Runs an accept loop in one or more domains, with cancellation and graceful shutdown,
  and an optional maximum number of concurrent connections.

- Add `Buf_read.BE` and `LE` parsers (@Cjen1 ocaml-multicore/eio#399).
  Parse numbers in various binary formats.

- Add `Eio.Buf_read.uint8` (@talex5 ocaml-multicore/eio#418).

Performance:

- Make `Eio.Condition` lock-free (@talex5 ocaml-multicore/eio#397 ocaml-multicore/eio#381).
  In addition to being faster, this allows using conditions in signal handlers.

- Make `Eio.Semaphore` lock-free (@talex5 @polytypic ocaml-multicore/eio#398).

- Make `Eio.Stream` lock-free when the capacity is zero (@talex5 ocaml-multicore/eio#413 ocaml-multicore/eio#411).

- Make `Eio.Promise` lock-free (@talex5 ocaml-multicore/eio#401).

Bug fixes:

- eio_linux: call `Uring.submit` as needed (@talex5 @bikallem ocaml-multicore/eio#428).
  Previously, we could fail to submit a job promptly because the SQE queue was full.

- Fix luv signals (@haesbaert ocaml-multicore/eio#412).
  `libuv` automatically retries polling if it gets `EINTR`, without giving OCaml signal handlers a chance to run.

- eio_luv: fix some resource leaks (@talex5 @patricoferris ocaml-multicore/eio#421).

- eio_luv: fix "unavailable signal" error on Windows (@talex5 ocaml-multicore/eio#420, reported by @nojb).

- Fix `Buf_write.BE.uint48` and `LE.uint48` (@adatario ocaml-multicore/eio#418).

Documentation:

- Add example programs (@talex5 ocaml-multicore/eio#389).

- Update network examples to use `run_server` (@talex5 ocaml-multicore/eio#417).

- Add a warning to the tutorial about `Fiber.first` (@talex5 ocaml-multicore/eio#394).

- Clarify the epoch used for `Eio.Time.now` (@bikallem ocaml-multicore/eio#395).

- Describe `secure_random` as an infinite source (@patricoferris ocaml-multicore/eio#426).

- Update README for OCaml 5 release (@talex5 ocaml-multicore/eio#384 ocaml-multicore/eio#391 ocaml-multicore/eio#393).

Other changes:

- Delay setting `SIGPIPE` handler until the `run` function is called (@talex5 ocaml-multicore/eio#420).

- Remove debug-level logging (@talex5 ocaml-multicore/eio#403).

- eio-luv: improve `process.md` test (@smondet ocaml-multicore/eio#414).

- Update to Dune 3 (@talex5 ocaml-multicore/eio#410).

- Remove test dependency on Astring (@talex5 ocaml-multicore/eio#402 ocaml-multicore/eio#404).

- Simplify cancellation logic (@talex5 ocaml-multicore/eio#396).

- time: `Mtime.Spand.to_s` has been deprecated in mtime 2.0.0 (@bikallem ocaml-multicore/eio#385).
talex5 added a commit to talex5/opam-repository that referenced this pull request Feb 1, 2023
CHANGES:

New features:

- Add `Eio.Net.run_server` (@bikallem @talex5 ocaml-multicore/eio#408).
  Runs an accept loop in one or more domains, with cancellation and graceful shutdown,
  and an optional maximum number of concurrent connections.

- Add `Buf_read.BE` and `LE` parsers (@Cjen1 ocaml-multicore/eio#399).
  Parse numbers in various binary formats.

- Add `Eio.Buf_read.uint8` (@talex5 ocaml-multicore/eio#418).

Performance:

- Make `Eio.Condition` lock-free (@talex5 ocaml-multicore/eio#397 ocaml-multicore/eio#381).
  In addition to being faster, this allows using conditions in signal handlers.

- Make `Eio.Semaphore` lock-free (@talex5 @polytypic ocaml-multicore/eio#398).

- Make `Eio.Stream` lock-free when the capacity is zero (@talex5 ocaml-multicore/eio#413 ocaml-multicore/eio#411).

- Make `Eio.Promise` lock-free (@talex5 ocaml-multicore/eio#401).

Bug fixes:

- eio_linux: call `Uring.submit` as needed (@talex5 @bikallem ocaml-multicore/eio#428).
  Previously, we could fail to submit a job promptly because the SQE queue was full.

- Fix luv signals (@haesbaert ocaml-multicore/eio#412).
  `libuv` automatically retries polling if it gets `EINTR`, without giving OCaml signal handlers a chance to run.

- eio_luv: fix some resource leaks (@talex5 @patricoferris ocaml-multicore/eio#421).

- eio_luv: fix "unavailable signal" error on Windows (@talex5 ocaml-multicore/eio#420, reported by @nojb).

- Fix `Buf_write.BE.uint48` and `LE.uint48` (@adatario ocaml-multicore/eio#418).

Documentation:

- Add example programs (@talex5 ocaml-multicore/eio#389).

- Update network examples to use `run_server` (@talex5 ocaml-multicore/eio#417).

- Add a warning to the tutorial about `Fiber.first` (@talex5 ocaml-multicore/eio#394).

- Clarify the epoch used for `Eio.Time.now` (@bikallem ocaml-multicore/eio#395).

- Describe `secure_random` as an infinite source (@patricoferris ocaml-multicore/eio#426).

- Update README for OCaml 5 release (@talex5 ocaml-multicore/eio#384 ocaml-multicore/eio#391 ocaml-multicore/eio#393).

Other changes:

- Delay setting `SIGPIPE` handler until the `run` function is called (@talex5 ocaml-multicore/eio#420).

- Remove debug-level logging (@talex5 ocaml-multicore/eio#403).

- eio-luv: improve `process.md` test (@smondet ocaml-multicore/eio#414).

- Update to Dune 3 (@talex5 ocaml-multicore/eio#410).

- Remove test dependency on Astring (@talex5 ocaml-multicore/eio#402 ocaml-multicore/eio#404).

- Simplify cancellation logic (@talex5 ocaml-multicore/eio#396).

- time: `Mtime.Spand.to_s` has been deprecated in mtime 2.0.0 (@bikallem ocaml-multicore/eio#385).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants