
Improve yield handling and add yield-based livelock detection. #1651

Closed

Conversation

JCTyblaidd
Contributor

This improves on #1388; however, it requires that programs are annotated with yields (either spin_loop_hint or thread_yield) in order to work. Additional configuration with a liveness guarantee might be worth adding later, to fully solve the issue when yield statements are not present.

This works by using yields to dynamically generate watch sets of concurrent objects and atomic memory locations that must change in order for the yielding thread to make progress, and blocking the thread until that state is updated. A configurable but limited number of spurious wake events from yields is allowed, for cases where a yield loop has a finite number of iterations and a fallback that must be explored, or where the yield is present as an optimization rather than in a blocking loop. A thread performing any concurrently visible state change resets the counter.

The current default value for the yield iterations is 1 (0 means infinite); I am unsure whether we should leave this as the default or change it.

@JCTyblaidd
Contributor Author

OK, I think I have narrowed the CI run-forever down to the MIPS architecture in the macOS job: spin_loop_hint only calls the yield code in Miri on x86. This should ideally be changed on the rustc side to call the new Miri-exported shim under cfg(miri).

@RalfJung - the first two CI instances need to be manually terminated since they don't seem to auto-timeout.

@RalfJung
Member

Wow, there's a lot going on here! Thanks a ton.

However, unfortunately I don't really understand what is happening. Is there a high-level description of the problem this is solving, and how it is being solved? I am somewhat worried about adding all this code that I will have no way of maintaining since I have no clue what it does.^^

it requires that programs are annotated with yields (either spin_loop_hint or thread_yield) in order to work

This seems pretty similar to the status quo? We have to add yields to avoid spinning in a single loop forever?

The introduction makes it sound like this is a fix to the scheduler, to avoid scheduling a spin loop forever. Later it sounds more like this is an extension of deadlock detection. Which of the two is true? Are there any programs that looped forever before, that now execute properly?

This works by using yields to dynamically generate watch sets of concurrent objects and atomic memory locations that must change in order for the yielding thread to make progress, and blocking the thread until that state is updated.

This is all very vague... how exactly do yields define a watch set? Does the location really have to change or is a non-changing store enough? What about acquire reads that also change the local state?

There are many ways to define "livelock" and "progress"; how exactly do you define these notions here?


// Note: should switch to spin_loop_hint once it calls the Miri yield intrinsic under cfg(miri).
//std::sync::atomic::spin_loop_hint();
unsafe { miri_yield_thread(); }
Member

What about using std::thread::yield_now?

Contributor Author

It would work, but I would ideally like to replace this with spin_loop_hint once it works correctly.

Member

Sure, but then we can avoid adding miri_yield_thread in the first place.^^

@JCTyblaidd
Contributor Author

Wow, there's a lot going on here! Thanks a ton.

However, unfortunately I don't really understand what is happening. Is there a high-level description of the problem this is solving, and how it is being solved? I am somewhat worried about adding all this code that I will have no way of maintaining since I have no clue what it does.^^

The main motivation for this change is model checking, at the cost of requiring the programmer to annotate all forms of spin loops or similar structures with a yield operation. This ensures that the loop is executed only once, and further loop executions only occur to check for progress once the values the loop checked may have changed.

The code dynamically generates what the RCMC paper describes as assume statements.
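For concreteness, a minimal sketch of the kind of annotated loop this targets (the names and orderings are illustrative, not taken from the PR's tests):

use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

// A spin loop annotated with a yield: the first iteration records `flag` as
// the only atomic location read, so the yielding thread can then be blocked
// until another thread stores to `flag`, instead of re-running the loop.
fn wait_for(flag: &AtomicBool) {
    while !flag.load(Ordering::Acquire) {
        thread::yield_now();
    }
}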

it requires that programs are annotated with yields (either spin_loop_hint or thread_yield) in order to work

This seems pretty similar to the status quo? We have to add yields to avoid spinning in a single loop forever?

The introduction makes it sound like this is a fix to the scheduler, to avoid scheduling a spin loop forever. Later it sounds more like this is an extension of deadlock detection. Which of the two is true? Are there any programs that looped forever before, that now execute properly?

This now fixes some programs with multiple waiting spin loops, given yield annotations, where the old yield implementation would eternally bounce between the two threads and never terminate.

This works by using yields to dynamically generate watch sets of concurrent objects and atomic memory locations that must change in order for the yielding thread to make progress, and blocking the thread until that state is updated.

This is all very vague... how exactly do yields define a watch set? Does the location really have to change or is a non-changing store enough? What about acquire reads that also change the local state?

Currently any change is detected, even if it changes the value to the same contents; it could be refined to only consider actual changes. Are we ever considering load-link, store-conditional intrinsics for the architectures that support them?

Acquire reads or fences that change the local happens-before vector clock are not considered. This is a good point and would probably need to be handled once weak memory support is added, since it could change the set of values that can be read from; I need to think about this.

There are many ways to define "livelock" and "progress"; how exactly do you define these notions here?

A thread is considered livelocked, or blocked on yield, if it executes a yield a given number of times and changes no state visible to other threads. A program is considered livelocked if all threads are either blocked on a sync primitive or blocked on a yield with no remaining spurious wakes.
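As an illustration (my own sketch, not a test from the PR), a program like the following would be reported as livelocked under that definition, since both threads end up blocked on a yield and no thread ever changes the watched flag:

use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

static FLAG: AtomicBool = AtomicBool::new(false);

fn main() {
    let t = thread::spawn(|| {
        // Yield-blocks: FLAG is in this thread's watch set and never changes.
        while !FLAG.load(Ordering::Acquire) {
            thread::yield_now();
        }
    });
    // The main thread spins on the same flag, which is never written.
    while !FLAG.load(Ordering::Acquire) {
        thread::yield_now();
    }
    t.join().unwrap();
}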

@JCTyblaidd
Contributor Author

Acquire reads or fences that change the local happens-before vector clock are not considered. This is a good point and would probably need to be handled once weak memory support is added, since it could change the set of values that can be read from; I need to think about this.

Thought about this some more: the loop is executed once to generate the set of atomic memory locations and sync objects to watch for changes. I think any extra loop executions in the absence of changes would result in the same thread-local data-race state. The only case where it might not is an acquire fence followed by a relaxed load, which would require two loop iterations for the correct final vector clock.

But I think any extra executions of the yield loop could only reduce the size of the reads-from set for atomic loads, so missing loop executions would not miss any potential global executions.
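For concreteness, the loop shape I have in mind here is something like this (a sketch, not code from the PR):

use std::sync::atomic::{fence, AtomicBool, Ordering};
use std::thread;

fn wait_for(flag: &AtomicBool) {
    loop {
        // An acquire fence only synchronizes via relaxed loads sequenced
        // *before* it, so the relaxed load at the end of iteration N only
        // takes effect on the vector clock through the fence in iteration
        // N + 1 - hence two iterations for the correct final clock.
        fence(Ordering::Acquire);
        if flag.load(Ordering::Relaxed) {
            break;
        }
        thread::yield_now();
    }
}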

@RalfJung
Member

The code dynamically generates what the RCMC paper describes as assume statements.

Which paper is this?

Currently any change is detected even if it changes the value to the same contents, it could be refined to only consider changes

No, "any store" makes sense, please just fix the PR description. :)

are we ever considering load-link, store-conditional intrinsics for the architectures that support them?

I have no idea what this question means.

A thread is considered livelocked, or blocked on yield, if it executes a yield a given number of times and changes no state visible to other threads. A program is considered livelocked if all threads are either blocked on a sync primitive or blocked on a yield with no remaining spurious wakes.

Okay, that makes sense, thanks. Please put these definitions in comments in appropriate places. :)

Based on this and the few bits of the code that I could make sense of, I now imagine the algorithm to roughly work as follows:

  • When a thread yields twice and, between the yields, "changes no state visible to other threads", block this thread.
  • When some other thread does a "change visible to other threads", unblock all yield-blocked threads.
  • When all threads are yield-blocked, raise an error.

Is this an accurate high-level picture? If not, could you correct it?

How do you detect if a yield "changes no state visible to other threads"?

@JCTyblaidd
Contributor Author

The code dynamically generates what the RCMC paper describes as assume statements.

Which paper is this?

Paper - section 2.6
Though the implementation here differs in that, instead of waiting for a specific desired value, it waits for the value to change, mostly to support more complex spin-loop-style constructs and to remove the need for more complex program analysis. But it is a similar principle of preventing the exploration of useless states.

Currently any change is detected even if it changes the value to the same contents, it could be refined to only consider changes

No, "any store" makes sense, please just fix the PR description. :)

are we ever considering load-link, store-conditional intrinsics for the architectures that support them?

I have no idea what this question means.

See: https://en.wikipedia.org/wiki/Load-link/store-conditional

A thread is considered livelocked, or blocked on yield, if it executes a yield a given number of times and changes no state visible to other threads. A program is considered livelocked if all threads are either blocked on a sync primitive or blocked on a yield with no remaining spurious wakes.

Okay, that makes sense, thanks. Please put these definitions in comments in appropriate places. :)

Based on this and the few bits of the code that I could make sense of, I now imagine the algorithm to roughly work as follows:

* When a thread yields twice and, between the yields, "changes no state visible to other threads", block this thread.

* When some other thread _does_ a "change visible to other threads", unblock all yield-blocked threads.

* When all threads are yield-blocked, raise an error.

Is this an accurate high-level picture? If not, could you correct it?
How do you detect if a yield "changes no state visible to other threads"?

  • When a thread yields for the first time after previously changing state visible to another thread, it temporarily yields; once awake again it starts recording all sync objects and atomic memory locations it reads.

  • For subsequent loop iterations, if a different memory value or sync object is read than in the first iteration, the thread is considered to have made progress and the counter is reset.

  • On changing a sync object or atomic memory location, wake all other threads that have that location in their set of values to watch for changes, reset the current thread's state to enabled, and forget its set of locations and sync objects to watch. Note: this could potentially be improved to not mark the thread as making progress in some cases with no other enabled threads and no other temporarily yield-blocked threads, but the current implementation considers any atomic store/RMW/sync-object mutation as visible progress.

  • A thread may spuriously wake from a yield without making progress up to N times, where N is a configurable value >= 1 (or infinitely often if the configuration is set to 0). This is currently implemented by waking all temporarily yield-blocked threads when there are no enabled threads free.

  • If all threads are either yielded with no spurious wakes remaining in the counter or otherwise blocked on a sync object, and at least one is blocked on a yield, then report a livelock.

The spurious yield wake is mainly for functions of the form:

for _i in 0..X {
    if atomic_cas_or_similar() {
        break;
    } else {
        std::thread::yield_now();
    }
}
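Putting the above together, here is a rough pseudo-Rust sketch of the scheduling rule (all type and field names are invented for this sketch and do not match the PR's actual code):

#[derive(PartialEq)]
enum State {
    Enabled,
    /// Temporarily yielded; may still be woken spuriously.
    DelayOnYield,
    /// Blocked on a mutex, condvar, join, etc.
    BlockedOnSync,
    /// Fully yielded; only a write to a watched location wakes it.
    BlockedOnYield,
}

struct Thread {
    state: State,
    spurious_wakes_left: u32,
}

struct Livelock;

fn schedule(threads: &mut [Thread]) -> Result<usize, Livelock> {
    loop {
        // Prefer any thread that can still make progress normally.
        if let Some(id) = threads.iter().position(|t| t.state == State::Enabled) {
            return Ok(id);
        }
        // No enabled thread left: spend one spurious wake on every
        // temporarily yield-blocked thread that still has wakes remaining.
        let mut woke_any = false;
        for t in threads.iter_mut() {
            if t.state == State::DelayOnYield && t.spurious_wakes_left > 0 {
                t.spurious_wakes_left -= 1;
                t.state = State::Enabled;
                woke_any = true;
            }
        }
        if !woke_any {
            // Every thread is blocked on a sync object or on a yield with no
            // spurious wakes left: report a livelock.
            return Err(Livelock);
        }
    }
}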

@RalfJung
Member

Sorry for the long delay. This MR is of the kind that's rather hard to understand and review, so when I only have little bits and pieces of time here and there for Rust stuff, that's just not enough... I need larger blocks of continuous time for this which is harder to find, and then it is competing with me finally also doing some proper coding myself again which I haven't done in a while^^. Fragmentation is real. ;)

Though the implementation here differs in that, instead of waiting for a specific desired value, it waits for the value to change, mostly to support more complex spin-loop-style constructs and to remove the need for more complex program analysis. But it is a similar principle of preventing the exploration of useless states.

Could you add a link to the paper in a comment in some appropriate place in the code?

once awake again it starts recording all sync objects and atomic memory locations it reads.

So I guess those are what you later call the "objects to watch"?

For subsequent loop iterations, if a different memory value or sync object is read than in the first iteration, the thread is considered to have made progress and the counter is reset.

So does it somehow recognize if it is the same yield point again? That would seem strange, but OTOH I am not sure what a "loop iteration" is from the perspective of the interpreter -- a loop and an unrolled loop should look and behave the same, no?

It would be good to put this high-level structure into a comment somewhere. Any ideas what might be a good place?

The spurious yield wake is mainly for functions of the form:

I don't understand why those functions need a "spurious yield" mechanism.

@@ -282,6 +282,16 @@ fn main() {
};
miri_config.tracked_alloc_id = Some(miri::AllocId(id));
}
arg if arg.starts_with("-Zmiri-max-yield-iterations=") => {
Member

New options should come with a documentation in the README. That's also a good opportunity to try and give a short but precise description of what this does. ;)

"Atomic RMW",
// For yields the atomic write overrides all effects of the atomic read
// so it is treated as an atomic write.
true,
Member

Using bool here makes it hard to figure out what true and false even mean... would it make sense to introduce a 2-variant enum for this purpose?
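For example, something along these lines (name and variants are just a suggestion):

// A possible replacement for the bare bool; per the comment in the diff,
// atomic RMWs would map to Write.
enum AtomicAccessKind {
    Read,
    Write,
}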

let place_ptr = place.ptr.assert_ptr();
let size = place.layout.size;
if write {
this.thread_yield_atomic_wake(place_ptr.alloc_id, place_ptr.offset,size);
Member

Instead of passing alloc_id and offset separately, what about passing the entire Pointer?

@@ -222,6 +222,12 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriEvalContextExt<'mir, 'tcx
"miri_resolve_frame" => {
this.handle_miri_resolve_frame(args, dest)?;
}
// Performs a thread yield.
// Exists since a thread yield operation may not be available on a given platform.
"miri_yield_thread" => {
Member

As noted before, std::thread::yield_now is available on all platforms, so I do not understand why this new operation is needed.

/// available.
DelayOnYield,
/// The thread has fully yielded, signalling that it requires another thread
/// perform an action visible to it in order to make progress.
Member

Suggested change
/// perform an action visible to it in order to make progress.
/// to perform an action visible to it in order to make progress.

@@ -212,6 +414,11 @@ pub struct ThreadManager<'mir, 'tcx> {
///
/// Note that this vector also contains terminated threads.
threads: IndexVec<ThreadId, Thread<'mir, 'tcx>>,
/// Set of threads that are currently yielding.
Member

"Yield" seems like an atomic operation to me, it's what happens on a single call of yield_now... do you mean threads that are currently recording a watch set?

Member

Is there some kind of invariant that all threads in this set are in state DelayOnYield or so?

/// Set of threads that are currently yielding.
yielding_thread_set: FxHashSet<ThreadId>,
/// The maximum number of yields making no progress required
/// on all threads to report a live-lock.
Member

Wait, I thought there's some per-thread thing for how many times we go around a yield loop before blocking... but this seems to be something else, since the comment says it is global ("all threads")? What is this about?

self.yielding_thread_set.drain_filter(move |&thread_id| {
let thread = &mut threads[thread_id];
if thread.yield_state.should_wake_atomic(alloc_id, offset, len) {
thread.state = ThreadState::Enabled;
Member

A thread that's currently recording might or might not be yield-blocked, right? So this could be enabling a thread that isn't yield-blocked? Couldn't this be a problem if the thread is currently blocked on something else, like a mutex? I am not sure if all our scheduling primitives are robust wrt. spurious wakeups...

}

/// Starts the next yield iteration
fn start_iteration(&mut self) {
Member

This is only called by the scheduler, in case when there was no "regular" thread to make progress with any more and we woke up a DelayOnYield thread, right? If so, please say that in the doc comment; it took me a while to realize this.

self.state = ThreadState::BlockedOnYield;
} else {
log::trace!("Thread entered standard yield with iteration {:?}", iteration_count);
self.state = ThreadState::DelayOnYield;
Member

So the purpose of this state seems to be to make the thread sleep until there is no other thread to make progress with any more, then it gets woken up again, and then we do start_iteration -- right?

What I do not understand is: this seems very heavy-handed for a yield. A yielded thread will never be scheduled again if the rest of the program just keeps making progress somewhere. That's very different from the old yield behavior. Couldn't this be a source of problems? Or am I misunderstanding what happens?

@bors
Contributor

bors commented Jan 28, 2021

☔ The latest upstream changes (presumably #1686) made this pull request unmergeable. Please resolve the merge conflicts.

@RalfJung
Member

RalfJung commented Apr 6, 2021

@JCTyblaidd I haven't heard from you in a while, do you still plan to get back to this PR?

@JCTyblaidd
Contributor Author

@RalfJung Yes, I will be busy for a month and a bit, but plan to return to this later. Should I close and re-open when I have time in the future?

@RalfJung
Member

Yes, I will be busy for a month and a bit, but plan to return to this later.

That is great to hear. :-)

Should I close and re-open when I have time in the future?

Yes, I think that would be better, thanks.
