
Improve yield handling and add yield-based livelock detection. #1651

Closed

Conversation

JCTyblaidd
Contributor

This improves on #1388; however, it requires that programs are annotated with yields (either spin_loop_hint or thread_yield) in order to work. Additional configuration with a liveness guarantee might be worth adding later, to fully solve the issue when yield statements are not present.

This works by using yields to dynamically generate watch sets of concurrent objects and atomic memory locations that must change in order for the yielding thread to make progress, and blocking the thread until that state is updated. A configurable but limited number of spurious wake events from yields is allowed, for cases where a yield loop has a finite number of iterations and a fallback that must be explored, or where the yield is present as an optimization rather than in a blocking loop. A thread performing any concurrently visible state change resets the counter.

The current default value for the yield iterations is 1 (0 means infinite); I am unsure whether we should leave this as the default or change it.

@JCTyblaidd
Contributor Author

OK, I think I have narrowed the CI run-forever down to the MIPS architecture in the macOS job: spin_loop_hint only calls the yield code in Miri on x86. This should ideally be changed on the rustc side to call the new Miri-exported shim under cfg(miri).

@RalfJung - the first two CI instances need to be manually terminated since they don't seem to auto-timeout.

@RalfJung
Member

Wow, there's a lot going on here! Thanks a ton.

However, unfortunately I don't really understand what is happening. Is there a high-level description of the problem this is solving, and how it is being solved? I am somewhat worried about adding all this code that I will have no way of maintaining since I have no clue what it does.^^

it requires that programs are annotated with yields (either spin_loop_hint or thread_yield) in order to work

This seems pretty similar to the status quo? We have to add yields to avoid spinning in a single loop forever?

The introduction makes it sound like this is a fix to the scheduler, to avoid scheduling a spin loop forever. Later it sounds more like this is an extension of deadlock detection. Which of the two is true? Are there any programs that looped forever before, that now execute properly?

This works by using yields to dynamically generate watch sets of concurrent objects and atomic memory locations that must change in order for the yielding thread to make progress, and blocking the thread until that state is updated.

This is all very vague... how exactly do yields define a watch set? Does the location really have to change or is a non-changing store enough? What about acquire reads that also change the local state?

There are many ways to define "livelock" and "progress"; how exactly do you define these notions here?


// Note: should switch to spin_loop_hint once it calls the Miri yield intrinsic under cfg(miri).
//std::sync::atomic::spin_loop_hint();
unsafe { miri_yield_thread(); }
Member

What about using std::thread::yield_now?

Contributor Author

It would work, but I would ideally like to replace this with spin_loop_hint once it works correctly.

Member

Sure, but then we can avoid adding miri_yield_thread in the first place.^^

@JCTyblaidd
Contributor Author

Wow, there's a lot going on here! Thanks a ton.

However, unfortunately I don't really understand what is happening. Is there a high-level description of the problem this is solving, and how it is being solved? I am somewhat worried about adding all this code that I will have no way of maintaining since I have no clue what it does.^^

The main motivation for this change is model checking, at the cost of requiring the programmer to annotate all forms of spin loops or similar structures with a yield operation. This ensures that the loop is executed only once, and further loop executions only occur to check for progress once the values the loop checked may have changed.

The code dynamically generates what the RCMC paper describes as assume statements.
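For concreteness, a minimal sketch of the kind of annotated loop this targets (the names and orderings are illustrative, not taken from the PR's tests):

use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

// A spin loop annotated with a yield: the first iteration records `flag` as
// the only atomic location read, so the yielding thread can then be blocked
// until another thread stores to `flag`, instead of re-running the loop.
fn wait_for(flag: &AtomicBool) {
    while !flag.load(Ordering::Acquire) {
        thread::yield_now();
    }
}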

it requires that programs are annotated with yields (either spin_loop_hint or thread_yield) in order to work

This seems pretty similar to the status quo? We have to add yields to avoid spinning in a single loop forever?

The introduction makes it sound like this is a fix to the scheduler, to avoid scheduling a spin loop forever. Later it sounds more like this is an extension of deadlock detection. Which of the two is true? Are there any programs that looped forever before, that now execute properly?

This now fixes some programs with multiple waiting spin loops, given yield annotations, where the old yield implementation would eternally bounce between the two threads and never terminate.

This works by using yields to dynamically generate watch sets of concurrent objects and atomic memory locations that must change in order for the yielding thread to make progress, and blocking the thread until that state is updated.

This is all very vague... how exactly do yields define a watch set? Does the location really have to change or is a non-changing store enough? What about acquire reads that also change the local state?

Currently any change is detected, even if it changes the value to the same contents; it could be refined to only consider actual changes. Are we ever considering load-link, store-conditional intrinsics for the architectures that support them?

Acquire reads or fences that change the local happens-before vector clock are not considered. This is a good point and would probably need to be handled once weak memory support is added, since it could change the set of values that can be read from; I need to think about this.

There are many ways to define "livelock" and "progress"; how exactly do you define these notions here?

A thread is considered livelocked, or blocked on yield, if it executes a yield a given number of times and changes no state visible to other threads. A program is considered livelocked if all threads are either blocked on a sync primitive or blocked on a yield with no remaining spurious wakes.
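As an illustration (my own sketch, not a test from the PR), a program like the following would be reported as livelocked under that definition, since both threads end up blocked on a yield and no thread ever changes the watched flag:

use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

static FLAG: AtomicBool = AtomicBool::new(false);

fn main() {
    let t = thread::spawn(|| {
        // Yield-blocks: FLAG is in this thread's watch set and never changes.
        while !FLAG.load(Ordering::Acquire) {
            thread::yield_now();
        }
    });
    // The main thread spins on the same flag, which is never written.
    while !FLAG.load(Ordering::Acquire) {
        thread::yield_now();
    }
    t.join().unwrap();
}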

@JCTyblaidd
Contributor Author

Acquire reads or fences that change the local happens-before vector clock are not considered. This is a good point and would probably need to be handled once weak memory support is added, since it could change the set of values that can be read from; I need to think about this.

Thought about this some more: the loop is executed once to generate the set of atomic memory locations and sync objects to watch for changes. I think any extra loop executions in the absence of changes would result in the same thread-local data-race state. The only case where it might not is an acquire fence followed by a relaxed load, which would require two loop iterations for the correct final vector clock.

But I think any extra executions of the yield loop could only reduce the size of the reads-from set for atomic loads, so missing loop executions would not miss any potential global executions.
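For concreteness, the loop shape I have in mind here is something like this (a sketch, not code from the PR):

use std::sync::atomic::{fence, AtomicBool, Ordering};
use std::thread;

fn wait_for(flag: &AtomicBool) {
    loop {
        // An acquire fence only synchronizes via relaxed loads sequenced
        // *before* it, so the relaxed load at the end of iteration N only
        // takes effect on the vector clock through the fence in iteration
        // N + 1 - hence two iterations for the correct final clock.
        fence(Ordering::Acquire);
        if flag.load(Ordering::Relaxed) {
            break;
        }
        thread::yield_now();
    }
}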

@RalfJung
Member

The code dynamically generates what the RCMC paper describes as assume statements.

Which paper is this?

Currently any change is detected even if it changes the value to the same contents, it could be refined to only consider changes

No, "any store" makes sense, please just fix the PR description. :)

are we ever considering load-link, store-conditional intrinsics for the architectures that support them?

I have no idea what this question means.

A thread is considered livelocked, or blocked on yield, if it executes a yield a given number of times and changes no state visible to other threads. A program is considered livelocked if all threads are either blocked on a sync primitive or blocked on a yield with no remaining spurious wakes.

Okay, that makes sense, thanks. Please put these definitions in comments in appropriate places. :)

Based on this and the few bits of the code that I could make sense of, I now imagine the algorithm to roughly work as follows:

  • When a thread yields twice and, between the yields, "changes no state visible to other threads", block this thread.
  • When some other thread does a "change visible to other threads", unblock all yield-blocked threads.
  • When all threads are yield-blocked, raise an error.

Is this an accurate high-level picture? If not, could you correct it?

How do you detect if a yield "changes no state visible to other threads"?

@JCTyblaidd
Contributor Author

The code dynamically generates what the RCMC paper describes as assume statements.

Which paper is this?

Paper - section 2.6
Though the implementation here differs in that, instead of waiting for a specific desired value, it waits for the value to change, mostly to support more complex spin-loop-style constructs and to remove the need for more complex program analysis. But it is a similar principle of preventing the exploration of useless states.

Currently any change is detected even if it changes the value to the same contents, it could be refined to only consider changes

No, "any store" makes sense, please just fix the PR description. :)

are we ever considering load-link, store-conditional intrinsics for the architectures that support them?

I have no idea what this question means.

See: https://en.wikipedia.org/wiki/Load-link/store-conditional

A thread is considered livelocked, or blocked on yield, if it executes a yield a given number of times and changes no state visible to other threads. A program is considered livelocked if all threads are either blocked on a sync primitive or blocked on a yield with no remaining spurious wakes.

Okay, that makes sense, thanks. Please put these definitions in comments in appropriate places. :)

Based on this and the few bits of the code that I could make sense of, I now imagine the algorithm to roughly work as follows:

* When a thread yields twice and, between the yields, "changes no state visible to other threads", block this thread.

* When some other thread _does_ a "change visible to other threads", unblock all yield-blocked threads.

* When all threads are yield-blocked, raise an error.

Is this an accurate high-level picture? If not, could you correct it?
How do you detect if a yield "changes no state visible to other threads"?

  • When a thread yields for the first time after previously changing state visible to another thread, it temporarily yields; once awake again it starts recording all sync objects and atomic memory locations it reads.

  • For subsequent loop iterations, if a different memory value or sync object is read than in the first iteration, the thread is considered to have made progress and the counter is reset.

  • On changing a sync object or atomic memory location, wake all other threads that have that location in their set of values to watch for changes, reset the current thread's state to enabled, and forget its set of locations and sync objects to watch. Note: this could potentially be improved to not mark the thread as making progress in some cases with no other enabled threads and no other temporarily yield-blocked threads, but the current implementation considers any atomic store/RMW/sync-object mutation as visible progress.

  • A thread may spuriously wake from a yield without making progress up to N times, where N is a configurable value >= 1 (or infinitely often if the configuration is set to 0). This is currently implemented by waking all temporarily yield-blocked threads when there are no enabled threads free.

  • If all threads are either yielded with no spurious wakes remaining in the counter or otherwise blocked on a sync object, and at least one is blocked on a yield, then report a livelock.

The spurious yield wake is mainly for functions of the form:

for _i in 0..X {
    if atomic_cas_or_similar() {
        break;
    } else {
        std::thread::yield_now();
    }
}
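Putting the above together, here is a rough pseudo-Rust sketch of the scheduling rule (all type and field names are invented for this sketch and do not match the PR's actual code):

#[derive(PartialEq)]
enum State {
    Enabled,
    /// Temporarily yielded; may still be woken spuriously.
    DelayOnYield,
    /// Blocked on a mutex, condvar, join, etc.
    BlockedOnSync,
    /// Fully yielded; only a write to a watched location wakes it.
    BlockedOnYield,
}

struct Thread {
    state: State,
    spurious_wakes_left: u32,
}

struct Livelock;

fn schedule(threads: &mut [Thread]) -> Result<usize, Livelock> {
    loop {
        // Prefer any thread that can still make progress normally.
        if let Some(id) = threads.iter().position(|t| t.state == State::Enabled) {
            return Ok(id);
        }
        // No enabled thread left: spend one spurious wake on every
        // temporarily yield-blocked thread that still has wakes remaining.
        let mut woke_any = false;
        for t in threads.iter_mut() {
            if t.state == State::DelayOnYield && t.spurious_wakes_left > 0 {
                t.spurious_wakes_left -= 1;
                t.state = State::Enabled;
                woke_any = true;
            }
        }
        if !woke_any {
            // Every thread is blocked on a sync object or on a yield with no
            // spurious wakes left: report a livelock.
            return Err(Livelock);
        }
    }
}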

@RalfJung
Member

Sorry for the long delay. This MR is of the kind that's rather hard to understand and review, so when I only have little bits and pieces of time here and there for Rust stuff, that's just not enough... I need larger blocks of continuous time for this which is harder to find, and then it is competing with me finally also doing some proper coding myself again which I haven't done in a while^^. Fragmentation is real. ;)

Though the implementation here differs in that, instead of waiting for a specific desired value, it waits for the value to change, mostly to support more complex spin-loop-style constructs and to remove the need for more complex program analysis. But it is a similar principle of preventing the exploration of useless states.

Could you add a link to the paper in a comment in some appropriate place in the code?

once awake again it starts recording all sync objects and atomic memory locations it reads.

So I guess those are what you later call the "objects to watch"?

For subsequent loop iterations, if a different memory value or sync object is read than in the first iteration, the thread is considered to have made progress and the counter is reset.

So does it somehow recognize if it is the same yield point again? That would seem strange, but OTOH I am not sure what a "loop iteration" is from the perspective of the interpreter -- a loop and an unrolled loop should look and behave the same, no?

It would be good to put this high-level structure into a comment somewhere. Any ideas what might be a good place?

The spurious yield wake is mainly for functions of the form:

I don't understand why those functions need a "spurious yield" mechanism.

@@ -282,6 +282,16 @@ fn main() {
};
miri_config.tracked_alloc_id = Some(miri::AllocId(id));
}
arg if arg.starts_with("-Zmiri-max-yield-iterations=") => {
Member

New options should come with a documentation in the README. That's also a good opportunity to try and give a short but precise description of what this does. ;)

"Atomic RMW",
// For yields the atomic write overrides all effects of the atomic read
// so it is treated as an atomic write.
true,
Member

Using bool here makes it hard to figure out what true and false even mean... would it make sense to introduce a 2-variant enum for this purpose?
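For example, something along these lines (name and variants are just a suggestion):

// A possible replacement for the bare bool; per the comment in the diff,
// atomic RMWs would map to Write.
enum AtomicAccessKind {
    Read,
    Write,
}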

let place_ptr = place.ptr.assert_ptr();
let size = place.layout.size;
if write {
this.thread_yield_atomic_wake(place_ptr.alloc_id, place_ptr.offset,size);
Member

Instead of passing alloc_id and offset separately, what about passing the entire Pointer?

@@ -222,6 +222,12 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriEvalContextExt<'mir, 'tcx
"miri_resolve_frame" => {
this.handle_miri_resolve_frame(args, dest)?;
}
// Performs a thread yield.
// Exists since a thread yield operation may not be available on a given platform.
"miri_yield_thread" => {
Member

As noted before, std::thread::yield_now is available on all platforms, so I do not understand why this new operation is needed.

/// available.
DelayOnYield,
/// The thread has fully yielded, signalling that it requires another thread
/// perform an action visible to it in order to make progress.
Member

Suggested change
/// perform an action visible to it in order to make progress.
/// to perform an action visible to it in order to make progress.

@@ -212,6 +414,11 @@ pub struct ThreadManager<'mir, 'tcx> {
///
/// Note that this vector also contains terminated threads.
threads: IndexVec<ThreadId, Thread<'mir, 'tcx>>,
/// Set of threads that are currently yielding.
Member

"Yield" seems like an atomic operation to me, it's what happens on a single call of yield_now... do you mean threads that are currently recording a watch set?

Member

Is there some kind of invariant that all threads in this set are in state DelayOnYield or so?

/// Set of threads that are currently yielding.
yielding_thread_set: FxHashSet<ThreadId>,
/// The maximum number of yields making no progress required
/// on all threads to report a live-lock.
Member

Wait, I thought there's some per-thread thing for how many times we go around a yield loop before blocking... but this seems to be something else, since the comment says it is global ("all threads")? What is this about?

self.yielding_thread_set.drain_filter(move |&thread_id| {
let thread = &mut threads[thread_id];
if thread.yield_state.should_wake_atomic(alloc_id, offset, len) {
thread.state = ThreadState::Enabled;
Member

A thread that's currently recording might or might not be yield-blocked, right? So this could be enabling a thread that isn't yield-blocked? Couldn't this be a problem if the thread is currently blocked on something else, like a mutex? I am not sure if all our scheduling primitives are robust wrt. spurious wakeups...

}

/// Starts the next yield iteration
fn start_iteration(&mut self) {
Member

This is only called by the scheduler, in case when there was no "regular" thread to make progress with any more and we woke up a DelayOnYield thread, right? If so, please say that in the doc comment; it took me a while to realize this.

self.state = ThreadState::BlockedOnYield;
} else {
log::trace!("Thread entered standard yield with iteration {:?}", iteration_count);
self.state = ThreadState::DelayOnYield;
Member

So the purpose of this state seems to be to make the thread sleep until there is no other thread to make progress with any more, then it gets woken up again, and then we do start_iteration -- right?

What I do not understand is: this seems very heavy-handed for a yield. A yielded thread will never be scheduled again if the rest of the program just keeps making progress somewhere. That's very different from the old yield behavior. Couldn't this be a source of problems? Or am I misunderstanding what happens?

@bors
Contributor

bors commented Jan 28, 2021

☔ The latest upstream changes (presumably #1686) made this pull request unmergeable. Please resolve the merge conflicts.

@RalfJung
Member

RalfJung commented Apr 6, 2021

@JCTyblaidd I haven't heard from you in a while, do you still plan to get back to this PR?

@JCTyblaidd
Contributor Author

@RalfJung Yes, I will be busy for a month and a bit, but plan to return to this later. Should I close and re-open when I have time in the future?

@RalfJung
Member

Yes, I will be busy for a month and a bit, but plan to return to this later.

That is great to hear. :-)

Should I close and re-open when I have time in the future?

Yes, I think that would be better, thanks.
