All workers parked when some open buckets are not empty. #792
I am seeing this as well with sticky immix in Julia (running on a GitHub CI runner).
Apply the following patch:
diff --git a/src/scheduler/scheduler.rs b/src/scheduler/scheduler.rs
index 8dbf8ba39..8e49bcf57 100644
--- a/src/scheduler/scheduler.rs
+++ b/src/scheduler/scheduler.rs
@@ -138,6 +138,11 @@ impl<VM: VMBinding> GCWorkScheduler<VM> {
// Prepare global/collectors/mutators
self.work_buckets[WorkBucketStage::Prepare].add(Prepare::<C>::new(plan));
+ warn!("Prepare added! Sleeping...");
+ let timeout = std::time::Duration::from_millis(100);
+ std::thread::sleep(timeout);
+ warn!("Waken. Oh! I have just added Prepare! Continuing...");
+
// Release global/collectors/mutators
self.work_buckets[WorkBucketStage::Release].add(Release::<C>::new(plan));
@@ -388,6 +393,10 @@ impl<VM: VMBinding> GCWorkScheduler<VM> {
let all_parked = sync.inc_parked_workers();
if all_parked {
+ warn!("Worker {} observed all parked. Sleeping...", worker.ordinal);
+ let timeout = std::time::Duration::from_millis(100);
+ std::thread::sleep(timeout);
+ warn!("Worker {} waken! Oh I observed all parked. Continuing...", worker.ordinal);
// If all workers are parked, enter "group sleeping" and notify controller.
sync.group_sleep = true;
debug!("Entered group-sleeping state");
It doesn't need a generational plan. Using the Immix plan to run lusearch is enough. I'll analyse the bug later.
I think this is what happened. Assume there is only one worker. (Yes. It can be reproduced with
The immediate cause of this problem is that the coordinator received two consecutive "all workers parked" messages, even though the message is carried by a boolean variable. The worker sent the message twice because of the "notify_all" after executing
Step 2 is the culprit.
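To illustrate how a message carried by a single boolean can still be received twice, here is a minimal single-threaded sketch. It is a toy model, not mmtk-core code: `ToySync`, `worker_park`, `worker_wake`, `coordinator_take` and `count_all_parked_messages` are hypothetical names, and only `inc_parked_workers` echoes a name from the patch above. The model assumes one worker that is woken by a "notify_all" with no new work available and then parks again after the coordinator has already consumed the flag.

```rust
/// Toy model only (an assumption for illustration, not the mmtk-core types).
#[derive(Default)]
struct ToySync {
    parked_workers: usize,
    /// The "all workers parked" message, carried by a boolean.
    all_parked_flag: bool,
}

/// The scenario assumes a single worker.
const NUM_WORKERS: usize = 1;

impl ToySync {
    /// Count one more parked worker; returns true if all workers are now parked.
    fn inc_parked_workers(&mut self) -> bool {
        self.parked_workers += 1;
        self.parked_workers == NUM_WORKERS
    }

    /// The worker finds no work and parks. If it is the last one, it sets the flag.
    fn worker_park(&mut self) {
        if self.inc_parked_workers() {
            self.all_parked_flag = true;
        }
    }

    /// The worker is woken (for example by a notify_all) and unparks to look for work.
    fn worker_wake(&mut self) {
        self.parked_workers -= 1;
    }

    /// The coordinator consumes the flag and reports whether a message was pending.
    fn coordinator_take(&mut self) -> bool {
        std::mem::replace(&mut self.all_parked_flag, false)
    }
}

/// Replays the scenario and counts how many "all workers parked" messages
/// the coordinator observes.
fn count_all_parked_messages(wake_without_work: bool) -> usize {
    let mut sync = ToySync::default();
    let mut received = 0;

    sync.worker_park();
    if sync.coordinator_take() {
        received += 1;
    }

    if wake_without_work {
        // The worker is woken although nothing new is available, so it parks again
        // and sets the flag a second time.
        sync.worker_wake();
        sync.worker_park();
    }

    if sync.coordinator_take() {
        received += 1;
    }
    received
}
```

In this model the boolean only prevents duplicates while the flag is still set. Once the coordinator has cleared it, a worker that wakes and re-parks produces a second, indistinguishable message.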
PR #782 may still have a synchronisation issue.
In rare cases, the coordinator may hit an assertion failure when all workers are parked.
Link: https://github.com/mmtk/mmtk-openjdk/actions/runs/4696973080/jobs/8327558441#step:4:7314
It may be related to generational GC, but I cannot reproduce it locally.