
Join worker threads on pool shutdown #3794

Merged: 3 commits into typelevel:series/3.x, Sep 26, 2023

Conversation

armanbilge
Member

This (should) fix an issue that @antoniojimeneznieto and I discovered in armanbilge/fs2-io_uring#78.

We were observing segfaults in the io_uring native calls when e.g. terminating an http4s Ember server via Ctrl-C, crashing the entire JVM.

The problem is that during shutdown of the WorkStealingThreadPool, after interrupting each worker thread it immediately proceeds to close that thread's poller, even though the thread may still be actively using it. The proposed solution is to join() each thread to await its termination before closing its resources. This should work in happy paths, but there are some caveats in pathological scenarios ... inline comments incoming.

build.sbt Outdated

```scala
Test / fork := true,
fork := true,
```
Member Author

For easily running the IOAppSpec example apps individually.

Comment on lines 704 to 713

```diff
       while (i < threadCount) {
-        workerThreads(i).interrupt()
+        val workerThread = workerThreads(i)
+        if (workerThread ne currentThread) {
+          workerThread.interrupt()
+          workerThread.join()
+          // wait to stop before closing pollers
+        }
         system.closePoller(pollers(i))
         i += 1
       }
```
Member Author

We might consider putting a timeout on this join, in case the thread is really stuck (e.g. a non-interruptible blocking call on the compute pool, a deadlock, or a spinloop). But if we do that, we should probably only close the poller if the thread is known to have terminated, and otherwise just leak it? Better than a segfault.
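Roughly, a hedged sketch of what that could look like (standalone, not the actual WorkStealingThreadPool code; the helper name and timeout parameter are illustrative):

```scala
// Hedged sketch, not the actual implementation: join each worker with a bounded
// timeout and only close its poller once the thread is known to have terminated;
// otherwise deliberately leak the poller rather than risk a use-after-free.
def shutdownSketch[Poller](
    workerThreads: Array[Thread],
    pollers: Array[Poller],
    closePoller: Poller => Unit,
    joinTimeoutMillis: Long // illustrative; could reuse the existing shutdown timeout
): Unit = {
  val currentThread = Thread.currentThread()
  var i = 0
  while (i < workerThreads.length) {
    val workerThread = workerThreads(i)
    if (workerThread ne currentThread) {
      workerThread.interrupt()
      workerThread.join(joinTimeoutMillis) // bounded wait, in case the worker is stuck
    }
    if ((workerThread eq currentThread) || !workerThread.isAlive())
      closePoller(pollers(i)) // safe: nothing can still be using this poller
    // else: leak the poller; better than a segfault
    i += 1
  }
}
```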

Contributor

Yeah, I think a final timeout would be good. About leaking vs. segfault, I'm not sure (see below).

Comment on lines 39 to +40

```diff
   val run: IO[Unit] =
-    IO(System.exit(0)).uncancelable
+    IO.blocking(System.exit(0)).uncancelable
```
Member Author

This is an example of a deadlock that came up in our test suite. Actually, System.exit(...) is technically blocking, because it blocks waiting for shutdown hooks to complete.

If we run it on the compute pool, then it gets stuck waiting for the runtime to shut down, which is stuck waiting for this thread to complete, ... deadlock.
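A tiny standalone reproduction of the same shape of deadlock, outside of cats-effect (the hook and thread here are illustrative, not the actual IOApp machinery):

```scala
// Hedged sketch of the deadlock shape: a shutdown hook waits for a thread that
// is itself blocked inside System.exit waiting for the shutdown hooks to finish.
object ExitDeadlockSketch {
  def main(args: Array[String]): Unit = {
    val worker = new Thread(() => System.exit(0), "worker") // exit blocks on hooks ...
    // ... and this hook blocks on the worker, which never leaves exit: deadlock
    Runtime.getRuntime.addShutdownHook(new Thread(() => worker.join()))
    worker.start()
    worker.join()
  }
}
```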

@armanbilge
Member Author

Can confirm this fixes the segfault. Here's the fixed version; to reproduce the error, remove the explicit CE dependency.

```scala
//> using repository https://s01.oss.sonatype.org/content/repositories/releases/
//> using repository https://s01.oss.sonatype.org/content/repositories/snapshots/
//> using dep org.typelevel::cats-effect::3.6-8a37117
//> using dep com.armanbilge::fs2-io_uring::0.3-4e332b9-SNAPSHOT
//> using dep org.http4s::http4s-ember-server::0.23.23

import cats.effect.*
import org.http4s.ember.server.EmberServerBuilder
import fs2.io.uring.net.UringSocketGroup
import fs2.io.uring.unsafe.UringSystem

object App extends IOApp.Simple:
  override protected def pollingSystem = UringSystem
  def run = EmberServerBuilder.default[IO].withSocketGroup(UringSocketGroup[IO]).build.useForever
```

@durban
Contributor

durban commented Sep 5, 2023

It's not entirely clear to me, what is segfaulting here. The JVM itself? Is that a JVM bug?

@armanbilge
Member Author

It's not entirely clear to me, what is segfaulting here. The JVM itself? Is that a JVM bug?

Ah sorry, should have clarified. In fs2-io_uring we are calling native code via JNI; that is what is segfaulting. It's not a JVM bug.

@durban
Contributor

durban commented Sep 5, 2023

Hm, okay. So, in my opinion, segfault: very bad, should be very much avoided. Leaking the poller: still bad, but probably less so. Waiting forever for a thread: also pretty bad. Overall, it seems to me that a generous timeout for a join and then leaking the poller (if not joined) would be the least bad solution. (I should actually read the fs2-io_uring code, because I have no idea what leaking a poller actually means. Is it some kind of io_uring data structure allocated by native code?)

However, I'm somewhat concerned that it's so easy to get a segfault here. (On the JVM we're generally well protected from these very bad things. Are we giving this up?) But it may be that this is a practically unavoidable consequence of using this Linux API... I should really read that code :-)

@armanbilge
Member Author

So, in my opinion, segfault: very bad, should be very much avoided. Leaking the poller: still bad, but probably less so. ... Overall, it seems to me that a generous timeout for a join and then leaking the poller (if not joined) would be the least bad solution.

Yep, agreed. We already have a timeout for shutdown (i.e. how long to wait for cancelation of the main fiber to complete). I wonder if we could re-use that, or whether it deserves its own configuration. Also, since we have to wait for multiple threads, should we keep track of the accumulated shutdown duration and give each thread less and less time to finish up?


I have no idea what leaking a poller actually means. Is it some kind of io_uring data structure allocated by native code

Yes, exactly. This includes data structures at both the user and kernel levels (e.g. every io_uring is backed by a file descriptor). Similarly for epoll and kqueue.

So besides leaving some memory uncleared and unfreed, it also means not closing an fd. /shrug


However, I'm somewhat concerned that it's so easy to get a segfault here. (On the JVM we're generally well protected from these very bad things. Are we giving this up?)

The JVM also calls native APIs, e.g. Selector uses epoll. So it's a fair question to ask: how is the JVM protecting us from this bad thing? The answer is simple: locks and synchronization. By synchronizing on the Selector and its methods, we get a guarantee that a Selector won't be closed while somebody is in the middle of using it, and we can check a boolean flag for whether it's open or closed before attempting the native call that would segfault if it's indeed closed.
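A hedged sketch of that pattern (illustrative, not the JDK's actual Selector internals): every use and close go through the same lock and check an open flag, so the native resource can never be released mid-call.

```scala
// Hedged sketch of the Selector-style protection, not the JDK's actual code:
// native calls and close() share one lock and check an `open` flag, so the
// underlying native resource is never freed while a call is in flight.
final class GuardedPoller(
    nativeHandle: Long,        // illustrative stand-in for the native resource
    pollNative: Long => Unit,  // illustrative native call
    closeNative: Long => Unit  // illustrative native cleanup
) {
  private[this] val lock = new Object
  private[this] var open = true

  def poll(): Unit = lock.synchronized {
    if (!open) throw new java.nio.channels.ClosedSelectorException
    pollNative(nativeHandle) // safe: close() cannot run concurrently with this
  }

  def close(): Unit = lock.synchronized {
    if (open) {
      open = false
      closeNative(nativeHandle)
    }
  }
}
```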

However, adding this sort of synchronization undoes all the optimizations we are going for with this I/O-integrated runtime 😁 so segfaults come with the territory.

@durban
Contributor

durban commented Sep 7, 2023

segfaults come with the territory

This sounds terrifying to me, but I'm not gonna take more of your time until I've actually looked at the code in question... :-)

@armanbilge
Member Author

As a concrete example:

You manually allocate some (off-heap) memory when creating a poller. Thus, closing the poller requires freeing that memory.

Then consider a situation where:

  • the thread that owns that poller/memory is writing to that memory e.g. submitting an I/O event
  • in parallel, that thread is being interrupted during runtime shutdown, the poller closed, and the memory freed.

Writing to freed memory can cause a segfault. This is pretty fundamental, and the only way to avoid it is to use some form of synchronization to guarantee that memory cannot be freed while it is in use.
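For illustration only, the same failure mode can be reproduced in isolation with off-heap memory (sun.misc.Unsafe here is just a stand-in for the native allocations a poller makes; a crash is not guaranteed, since it's undefined behavior):

```scala
// Hedged sketch: writing to memory after freeing it is undefined behavior and
// can segfault the whole JVM (or appear to work, or corrupt unrelated memory).
import sun.misc.Unsafe

object UseAfterFreeSketch {
  def main(args: Array[String]): Unit = {
    val field = classOf[Unsafe].getDeclaredField("theUnsafe")
    field.setAccessible(true)
    val unsafe = field.get(null).asInstanceOf[Unsafe]

    val addr = unsafe.allocateMemory(8L) // like allocating a poller's native buffers
    unsafe.putLong(addr, 42L)            // normal use by the owning thread
    unsafe.freeMemory(addr)              // "closing the poller"
    unsafe.putLong(addr, 43L)            // use-after-free: may crash the process
  }
}
```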

@durban
Contributor

durban commented Sep 20, 2023

Thank you for the explanation; I understand (more or less) now (I've also looked at the PollingSystem).

I think giving each thread less and less time only makes sense if you interrupt all of them first and then join them one by one.

Leaking a file descriptor is not ideal, but it's still less wrong than using freed memory or other segfault/undefined-behavior situations like that.

This is pretty fundamental and the only way to avoid it is to use some form of synchronization to guarantee that memory cannot be freed while it is in use.

Yeah. In fact, this PR does use some form of synchronization: Thread#join. (If I understand correctly, the Pollers are only used by one worker thread + the one doing the close.) Actually, not closing a poller while it is in use should be in the contract of PollingSystem, right?

About segfaults being terrible: what I was trying to say is that while this PR fixes this specific segfault, I'm concerned about other similar cases. (Although, now that I've looked at the API, I'm less concerned. The thread-locality of pollers should help.)

@armanbilge
Member Author

I think giving each thread less and less time only makes sense if you interrupt all of them first and then join them one by one.

That sounds right.
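Something along these lines, perhaps (a hedged sketch with illustrative names, not a real implementation): interrupt every worker up front, then join them one by one against a single shared deadline, so threads that finish quickly leave more of the budget for the stragglers.

```scala
// Hedged sketch: interrupt all workers first, then join each one against a
// shared deadline derived from a single overall shutdown budget.
def interruptThenJoinAll(workers: Array[Thread], totalBudgetMillis: Long): Unit = {
  val self = Thread.currentThread()
  workers.foreach(w => if (w ne self) w.interrupt())

  val deadline = System.nanoTime() + totalBudgetMillis * 1000000L
  workers.foreach { w =>
    if (w ne self) {
      val remainingMillis = (deadline - System.nanoTime()) / 1000000L
      if (remainingMillis > 0) w.join(remainingMillis)
      // if w is still alive at this point, its poller should be leaked, not closed
    }
  }
}
```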

Yeah. In fact, this PR does use some form of synchronization: Thread#join.

Exactly my point :)

Actually, not closing a poller while it is in use should be in the contract of PollingSystem right?

Yes, I suppose so. Or more generally: the PollingSystem API makes no promises of thread-safety; it's up to the caller to synchronize where necessary. This is one of those cases.

I'm concerned about other similar cases. (Although, now that I've looked at the API, I'm less concerned. The thread-locality of pollers should help.)

Fair enough, and yeah, we're just going to have to be careful. That's why for now we're only shipping SelectorSystem in CE core and incubating io_uring and stuff externally.

Thanks for taking a deeper look at these issues :)

@djspiewak
Member

Merge with head?

@djspiewak djspiewak merged commit e5f5880 into typelevel:series/3.x Sep 26, 2023
29 of 33 checks passed