Remove the coordinator (controller) thread #1062
Conversation
Found a case where workers may sleep forever. Needs to be fixed later.
Mutators no longer trigger GC by adding `ScheduleCollection` directly.
I ran lusearch at 2.5x min heap size for 40 invocations. The following plot is normalised to build1:

The following plot is not normalised:

From the plots, it looks like this PR reduces the STW time when the absolute (not normalised) number of GCs is large. There is a 1% to 2% STW time reduction for GenCopy, GenImmix, SemiSpace and StickyImmix. For Immix, the STW time increased by 0.3%, but the error bar is much larger than the difference. From the individual data, it looks like some outlier invocations triggered more GCs than others. The following plot shows the number of GCs executed in each invocation when running Immix, sorted and grouped by build. It shows that the distribution of the number of GCs is the same, but "build2" (yellow) contains some outliers at the right end of the curve.

We see this PR makes GC slightly faster, and it mainly impacts the plans that trigger GC often (mainly generational plans). I think the main reason is that this PR removes one context switch every time all workers have stopped (because there is no need to switch to the coordinator). If a plan does more GCs, but less work in each GC (mostly nursery GCs), the synchronisation overhead will be significant. It is hard to explain why SemiSpace becomes faster this way, though, because it is not generational. Maybe SemiSpace simply triggered GC too many times at this heap size.
The same setting (lusearch, 2.5x min heap size, 40 invocations) was run on skunk.moma. This Coffee Lake machine does not have small ("efficiency") cores, so it is expected to have smaller variance. STW times are reduced for all plans, and Mutator (other) times are increased for all plans.
The total time reflects the actual speed-up. There is a 0.3% slowdown for Immix, likely due to a slightly increased number of GCs caused by random effects. The same data without normalisation: At this heap size, GenCopy and GenImmix spent more time in STW than in mutator (other).
Here are results from running more benchmarks: 3.0x min heap size, 20 invocations. I split the tests across two machines with identical hardware and identical binaries.
From the geometric means of the total time, this PR speeds up all the plans tested, but only by 0.1% to 0.3%.
I added a […]. Results on skunk: all the same settings as before, except adding […].
I am closing this PR in favor of #1067 (Remove coordinator and support forking). The reason is that after we remove the coordinator thread, we need a way for workers to coordinate themselves, and the concrete way they coordinate themselves is dictated by the requirement of supporting forking. That is, we must implement a coordination mechanism that (1) does not need a dedicated coordinator thread, and (2) makes it easy to bring down all GC workers when forking. The code in this PR accomplishes (1), but it is clearly not suitable for (2), and #1067 has already done non-trivial refactoring over this PR. Therefore, it would be a waste of effort to merge this PR before applying #1067.
DRAFT: Need polishing. Need performance evaluation. Need to update bindings.
This PR removes the coordinator (a.k.a. controller) thread. GC worker threads now coordinate themselves.
Major changes include:
- `GCController` is removed. All synchronization mechanisms for the communication between the controller and workers are removed, too.
- `WorkerMonitorSync::should_schedule_gc` is added as the communication channel. To trigger a GC, a mutator shall acquire the mutex `WorkerMonitor::sync`, set `sync.should_schedule_gc` to `true`, and notify one worker. The last parked worker always checks this variable before blocking, and adds the `ScheduleCollection` work packet when a GC is requested. (See the sketch after this list.)
- `EndOfGC` is removed. The code is now executed by the last parked worker when GC finishes.
- `GlobalState::gc_status` is now `Atomic`, and it is read by the last parked worker to decide whether GC is in progress. Previously, it was not used in Rust MMTk, and was only used for assertions in the MMTk in JikesRVM.
- Added `GlobalState::gc_start_time` to record the start time of a GC. It was previously a local variable in `GCController`.
- Added `WorkBucket::add_no_notify` to add work packets while holding the mutex `WorkerMonitor::sync`.

Fixes: #1053
Closes: #1061
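For illustration, here is a minimal sketch of the trigger/park handshake described in the change list, built on `std::sync::{Mutex, Condvar}`. The type and field names `WorkerMonitor`, `WorkerMonitorSync` and `should_schedule_gc` mirror the list above, but the methods (`request_gc`, `park_until_gc_pending`), the `parked_workers` counter, and the overall shape are simplifying assumptions for this sketch, not mmtk-core's actual API:

```rust
use std::sync::{Condvar, Mutex};

/// State guarded by the monitor mutex (simplified; `parked_workers`
/// is an assumed field for this sketch).
struct WorkerMonitorSync {
    should_schedule_gc: bool, // set by a mutator to request a GC
    parked_workers: usize,    // number of workers currently parked
}

struct WorkerMonitor {
    sync: Mutex<WorkerMonitorSync>,
    cond: Condvar,
    num_workers: usize,
}

impl WorkerMonitor {
    /// Mutator side: request a GC without a coordinator thread.
    /// Acquire the mutex, raise the flag, and wake one parked worker.
    fn request_gc(&self) {
        let mut sync = self.sync.lock().unwrap();
        sync.should_schedule_gc = true;
        self.cond.notify_one();
    }

    /// Worker side: park when out of work. Returns once this worker
    /// has become responsible for scheduling a pending GC, i.e. it is
    /// the last parked worker and `should_schedule_gc` is set.
    fn park_until_gc_pending(&self) {
        let mut sync = self.sync.lock().unwrap();
        sync.parked_workers += 1;
        // The last worker to park checks the flag *before* blocking,
        // so a request raised while all workers sleep is never lost.
        while !(sync.parked_workers == self.num_workers && sync.should_schedule_gc) {
            sync = self.cond.wait(sync).unwrap();
            // (A real worker would also re-check its work queues here;
            // this sketch models only the GC-request path.)
        }
        sync.should_schedule_gc = false;
        sync.parked_workers -= 1;
        // At this point the real PR adds the `ScheduleCollection`
        // packet while still holding the mutex (hence something like
        // `WorkBucket::add_no_notify`), then wakes the other workers.
        self.cond.notify_all();
    }
}
```

Because the last parked worker both schedules the collection when `should_schedule_gc` is set and, per the list above, runs the former `EndOfGC` code when `GlobalState::gc_status` indicates a GC in progress, no context switch to a separate coordinator thread is needed, which is the effect the STW measurements above try to quantify.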