
Thread local fallback weak bag #2844

Merged

merged 7 commits into typelevel:series/3.3.x from thread-local-weak-bag on Feb 27, 2022
Conversation

@vasilmkd (Member)

Somewhat addresses #2663.

@armanbilge (Member)

I know benchmarks are incoming 😉 but does this have the same basic caveat as #2508 (comment)?

ThreadLocal is very expensive, and often exceeds the cost of contention, which is why you don't see it used too often. It also can have some complex GC implications which have their own performance costs.

@vasilmkd (Member Author)

I wanted to update the original comment, but I'll answer here.

Benchmarks are not coming because I'm not sure what to measure exactly.

The code that this replaces is literally a ThreadLocalRandom, which is, you guessed it, a ThreadLocal. And in #2663, it was the locking contention that showed up, instead of the thread local usage.

I would like to bring @yanns into the conversation, if they would be willing to test this change with their workflow and measurements in Mission Control. Thanks in advance.
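
For readers without #2663 open, here is a loose illustration (hypothetical; not the actual cats-effect code) of the striped, lock-guarded pattern that indexing with ThreadLocalRandom implies:

```scala
import java.util.concurrent.ThreadLocalRandom
import scala.collection.mutable

// Illustrative only: ThreadLocalRandom picks a stripe, but every insert
// still takes that stripe's lock. Under enough threads it is the locking,
// not the ThreadLocalRandom lookup, that contends (as observed in #2663).
final class StripedBags[A](stripes: Int) {
  private[this] val bags = Array.fill(stripes)(mutable.Set.empty[A])

  def insert(a: A): Unit = {
    val bag = bags(ThreadLocalRandom.current().nextInt(stripes))
    bag.synchronized(bag += a) // the contention point
  }
}
```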

@vasilmkd (Member Author) commented Feb 27, 2022

Managed to come up with a benchmark.

series/3.3.x:

Benchmark                        (size)   Mode  Cnt    Score   Error  Units
ThreadLocalBenchmark.contention    2000  thrpt   20  287.505 ± 3.640  ops/s

This PR:

Benchmark                        (size)   Mode  Cnt    Score    Error  Units
ThreadLocalBenchmark.contention    2000  thrpt   20  306.480 ± 22.891  ops/s

In reality, the improvement is a bit misleading given the larger margin of error, so it may be a toss-up, but at the very least thread locals are not strictly slower.
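
For reference, a minimal sketch of what a contention benchmark of this shape might look like (hypothetical: this is not the PR's ThreadLocalBenchmark, and the per-thread container is a stand-in for WeakBag):

```scala
import java.util.ArrayDeque
import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
class ThreadLocalContentionSketch {

  @Param(Array("2000"))
  var size: Int = _

  // One mutable bag per thread; withInitial creates it on first get().
  private[this] val bags: ThreadLocal[ArrayDeque[AnyRef]] =
    ThreadLocal.withInitial(() => new ArrayDeque[AnyRef]())

  @Benchmark
  @Threads(8)
  def contention(): Unit = {
    val bag = bags.get() // thread-local lookup, no shared lock
    var i = 0
    while (i < size) {
      bag.add(new AnyRef)
      i += 1
    }
    bag.clear()
  }
}
```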

@vasilmkd (Member Author)

Another run is more or less the same:

This PR:

Benchmark                        (size)   Mode  Cnt    Score    Error  Units
ThreadLocalBenchmark.contention    2000  thrpt   20  312.366 ± 30.404  ops/s

@vasilmkd vasilmkd marked this pull request as ready for review February 27, 2022 00:48
@durban (Contributor) commented Feb 27, 2022

It's unclear to me what guarantees that the bag.toSet call in def foreignFibers() "sees" the fibers inserted by monitorFallback, since monitorFallback accesses a bag directly, without any synchronization (and WeakBag itself does not seem to be thread-safe).
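
To make the concern concrete, a hypothetical reduction of the hazard (simplified names, not the PR's code): a plain field written by the owning thread and read by another establishes no happens-before edge, so the reader may observe stale or partially updated contents.

```scala
// Hypothetical reduction: an unsynchronized bag shared across threads.
final class UnsafeBag[A] {
  private[this] var elems: List[A] = Nil // plain field: no volatile, no lock

  def insert(a: A): Unit = elems = a :: elems // called by the owning thread

  def toSet: Set[A] = elems.toSet // called by a different, reader thread
}
```

Publishing the bag itself through a concurrent queue (as the PR does by offering a WeakReference to BagReferences) safely publishes the bag reference, but insertions made after that publication still race with toSet, which is the gap being pointed out here.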

@vasilmkd (Member Author)

That's great input @durban. Thank you.

Another benchmark run with the latest changes:
This PR:

Benchmark                        (size)   Mode  Cnt    Score    Error  Units
ThreadLocalBenchmark.contention    2000  thrpt   20  327.031 ± 20.432  ops/s

- If certain thread pools or executors cycle their threads, keeping weak references to each bag lets those bags be eligible for GC when their associated thread exits.
@djspiewak djspiewak merged commit 90b0205 into typelevel:series/3.3.x Feb 27, 2022
@vasilmkd vasilmkd deleted the thread-local-weak-bag branch February 27, 2022 18:27
private[FiberMonitor] final val Bags: ThreadLocal[WeakBag[IOFiber[_]]] =
  ThreadLocal.withInitial { () =>
    val bag = new WeakBag[IOFiber[_]]()
    BagReferences.offer(new WeakReference(bag))
    bag
  }
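
For context, a hedged sketch of how the reading side might traverse these references, assuming BagReferences is a concurrent queue of WeakReference[WeakBag[IOFiber[_]]] (WeakBag and IOFiber are the PR's types; the method body is illustrative, not the PR's actual foreignFibers):

```scala
import java.lang.ref.WeakReference
import java.util.concurrent.ConcurrentLinkedQueue

// Illustrative traversal: a cleared reference means the owning thread (and
// its bag) were collected, so that entry is simply skipped.
def foreignFibersSketch(
    bagReferences: ConcurrentLinkedQueue[WeakReference[WeakBag[IOFiber[_]]]])
    : Set[IOFiber[_]] = {
  val foreign = Set.newBuilder[IOFiber[_]]
  val it = bagReferences.iterator()
  while (it.hasNext()) {
    val bag = it.next().get() // null once the bag has been collected
    if (bag ne null)
      foreign ++= bag.toSet // unsynchronized read; see durban's comment above
  }
  foreign.result()
}
```
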
Member

@vasilmkd sorry, I had a follow-up question about this change.

Is it possible that we could lose track of suspended fibers, if the threads that they were suspended from no longer exist? Is that even a realistic situation 😆

Member Author

That's a possibility, yes. The change was made on the premise that letting an already inaccurate reporting mechanism stay inaccurate is better than a memory leak. If people disagree, PRs are welcome.

Member

That's fair, thanks.

Probably over-complicated but I wonder if we could use a PhantomReference to "evacuate" the contents of the bag when its owning thread gets GCed.

Btw, since the WSTP also dynamically adds/removes threads, how is this problem handled there?
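
A rough sketch of what that evacuation could look like (entirely hypothetical, just to make the idea concrete; WeakBag and IOFiber are the PR's types, everything else is illustrative): a PhantomReference per worker Thread, whose enqueueing triggers copying the bag's contents into a global set.

```scala
import java.lang.ref.{PhantomReference, ReferenceQueue}
import scala.collection.mutable

object BagEvacuation {
  private val queue = new ReferenceQueue[Thread]

  // The phantom reference itself must stay strongly reachable until
  // processed, hence the `refs` registry below.
  private final class BagRef(thread: Thread, val bag: WeakBag[IOFiber[_]])
      extends PhantomReference[Thread](thread, queue)

  private val refs = mutable.Set.empty[BagRef]
  private val evacuated = mutable.Set.empty[IOFiber[_]]

  def register(thread: Thread, bag: WeakBag[IOFiber[_]]): Unit =
    refs.synchronized(refs += new BagRef(thread, bag))

  // Drain the queue, copying the contents of bags whose threads were
  // collected so their fibers remain visible to the fiber dump.
  def drain(): Unit = {
    var ref = queue.poll()
    while (ref ne null) {
      val bagRef = ref.asInstanceOf[BagRef]
      evacuated.synchronized(evacuated ++= bagRef.bag.toSet)
      refs.synchronized(refs -= bagRef)
      ref = queue.poll()
    }
  }
}
```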

Member Author

The WSTP does not use this code path. I'm open to exploring Phantom References.

Member

> The WSTP does not use this code path.

Right :) but it still uses a thread-local fiber bag right? And the threads may be added/removed as the WSTP resizes itself? So it seems like it's a very similar problem.

Member

Yeah, I wasn't sure if it's worth it :) Instead of a dedicated thread, is this something we can schedule on the runtime itself?

@vasilmkd (Member Author) Feb 28, 2022

That's what we had before. It requires solving the mapping of threads to bags, which was previously done using locking. If we come up with a concurrent weak bag/hash map, then sure, but not even JCTools has that afaik. It's a big undertaking.

Edit: I misunderstood your comment and answered something completely different.

@vasilmkd (Member Author) Feb 28, 2022

Scheduling on the runtime requires answering how often to run it, which to me doesn't seem like a good strategy for something considered to be memory beneficial/critical. And ReferenceQueue is not too smart of an interface either. You can poll it in a non-blocking way, but when it returns null, when do you try again? The proper way IMO is to block on it and run cleanup on each expiry.
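
A minimal sketch of the blocking strategy described here (hypothetical helper, not code from this PR): a dedicated daemon thread parks on remove() and runs cleanup on each expiry, so there is no polling interval to tune.

```scala
import java.lang.ref.{Reference, ReferenceQueue}

object ReferenceQueueCleaner {

  def start(queue: ReferenceQueue[AnyRef])(
      cleanup: Reference[_ <: AnyRef] => Unit): Thread = {
    val cleaner = new Thread(() => {
      while (true)
        cleanup(queue.remove()) // blocks until the GC enqueues a reference
    })
    cleaner.setDaemon(true) // must not keep the application alive on its own
    cleaner.setName("reference-queue-cleaner")
    cleaner.start()
    cleaner
  }

  // The non-blocking alternative has exactly the problem described above:
  //   val ref = queue.poll()          // null when nothing is enqueued
  //   if (ref ne null) cleanup(ref)   // ...otherwise retry after how long?
}
```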

Member

No, it doesn't seem very elegant :) I feel like in practice there must be some reasonable rate at which we can check the ReferenceQueue; if an application is adding/removing threads too fast, it seems like its performance would be bounded by other factors anyway. But I don't really know about such things :)

After thinking about this more, it seems like it could be important. A deadlock seems like exactly the situation in which a dynamically resizing thread pool would start culling threads due to lack of work, which could cause GC of the fiber bag holding the very fibers that would help diagnose the deadlock.

Member Author

@djspiewak 👆🏻
