Improve yield handling and add yield-based livelock detection. #1651
Conversation
Ok, I think I have narrowed the CI run-forever down to the MIPS architecture on macOS. Since spin_loop_hint only calls the yield code in Miri on x86, this should ideally be changed on the rustc side to call the new Miri-exported shim under cfg(miri). @RalfJung - the first two CI instances need to be manually terminated since they don't seem to auto-timeout.
Wow, there's a lot going on here! Thanks a ton. However, unfortunately I don't really understand what is happening. Is there a high-level description of the problem this is solving, and how it is being solved? I am somewhat worried about adding all this code that I will have no way of maintaining since I have no clue what it does.^^
This seems pretty similar to the status quo? We have to add yields to avoid spinning in a single loop forever? The introduction makes it sound like this is a fix to the scheduler, to avoid scheduling a spin loop forever. Later it sounds more like this is an extension of deadlock detection. Which of the two is true? Are there any programs that looped forever before, that now execute properly?
This is all very vague... how exactly do yields define a watch set? Does the location really have to change or is a non-changing store enough? What about acquire reads that also change the local state? There are many ways to define "livelock" and "progress"; how exactly do you define these notions here?
// Note: should change to use this once cfg(miri) to use intrinsic.
//std::sync::atomic::spin_loop_hint();
unsafe { miri_yield_thread(); }
What about using std::thread::yield_now?
It would work, but I would ideally like to replace this with spin_loop_hint once it works correctly.
Sure, but then we can avoid adding miri_yield_thread in the first place.^^
The main motivation for this change is model checking. At the cost of requiring the programmer to annotate all forms of spin loop (or similar structures) with a yield operation, this ensures that the loop body is executed only once, and further iterations occur only to re-check for progress when the values the loop inspected may have changed. The code dynamically generates what the RCMC paper describes as assume statements.
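For illustration, this is roughly the kind of yield-annotated loop the mechanism acts on (a sketch of my own, not code from this PR; the flag and function names are made up, and yield_now stands in for whichever yield shim is ultimately used):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static FLAG: AtomicBool = AtomicBool::new(false);

/// Illustrative spin loop: the yield marks the point where this thread cannot
/// make progress until another thread changes something the loop has read.
/// After one full iteration, the atomic locations read in the loop body (here
/// just FLAG) form the watch set, and the thread is only rescheduled once one
/// of them is written.
fn wait_for_flag() {
    while !FLAG.load(Ordering::Acquire) {
        std::thread::yield_now(); // or spin_loop_hint() once cfg(miri) routes it to the yield shim
    }
}
```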
With yield annotations, this now fixes some programs with multiple waiting spin loops where the old yield implementation would bounce between the two threads forever and never terminate.
Currently any store is detected as a change, even one that writes back the same value; this could be refined to only consider genuine changes. Are we ever going to consider load-link/store-conditional intrinsics for the architectures that support them? Acquire reads or fences that change the local happens-before vector clock are not considered. That is a good point and would probably need to be handled once weak memory support is added, since such operations could change the set of values that can be read; I need to think about this.
A thread is considered livelocked, or blocked on yield, if it executes a yield a given number of times while changing no state visible to other threads. A program is considered livelocked if all threads are either blocked on a sync primitive or blocked on a yield with no remaining spurious wakes.
Thought about this some more - the loop is executed once to generate the set of atomic memory locations + sync objects to watch for changes. I think any extra loop executions in the absence of changes would result in the same thread-local data-race fence. The only case where it might not is an acquire fence followed by a relaxed load, which would require two iterations to reach the correct final vector clock. But I think any extra executions of the yield loop would only potentially reduce the size of the reads-from set for atomic loads, so missing loop executions would not miss any potential global executions.
Which paper is this?
No, "any store" makes sense, please just fix the PR description. :)
I have no idea what this question means.
Okay, that makes sense, thanks. Please put these definitions in comments in appropriate places. :) Based on this and the few bits of the code that I could make sense of, I now imagine the algorithm to roughly work as follows:
Is this an accurate high-level picture? If not, could you correct it? How do you detect if a yield "changes no state visible to other threads"?
Paper - section 2.6
See: https://en.wikipedia.org/wiki/Load-link/store-conditional
The spurious yield wake is mainly for functions of the form:
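The original snippet is not reproduced here, so the following is a hypothetical reconstruction of the general pattern: a bounded spin used as an optimization, with a fallback path that must still be explored even when nothing in the watch set changes.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

const SPIN_LIMIT: usize = 100;

/// Hypothetical try-then-fallback lock: the spin is bounded and is not the
/// only way forward, so the limited spurious yield wakes let the model
/// checker reach the fallback even if no watched location is ever written.
fn lock(flag: &AtomicBool, park: impl FnOnce()) {
    for _ in 0..SPIN_LIMIT {
        if flag
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            return;
        }
        std::thread::yield_now();
    }
    // Bounded spin exhausted: fall back to blocking on some slower primitive.
    park();
}
```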
Sorry for the long delay. This MR is of the kind that's rather hard to understand and review, so when I only have little bits and pieces of time here and there for Rust stuff, that's just not enough... I need larger blocks of continuous time for this which is harder to find, and then it is competing with me finally also doing some proper coding myself again which I haven't done in a while^^. Fragmentation is real. ;)
Could you add a link to the paper in a comment in some appropriate place in the code?
So I guess those are what you later call the "objects to watch"?
So does it somehow recognize if it is the same yield point again? That would seem strange, but OTOH I am not sure what a "loop iteration" is from the perspective of the interpreter -- a loop and an unrolled loop should look and behave the same, no? It would be good to put this high-level structure into a comment somewhere. Any ideas what might be a good place?
I don't understand why those functions need a "spurious yield" mechanism.
@@ -282,6 +282,16 @@ fn main() {
};
miri_config.tracked_alloc_id = Some(miri::AllocId(id));
}
arg if arg.starts_with("-Zmiri-max-yield-iterations=") => {
New options should come with a documentation in the README. That's also a good opportunity to try and give a short but precise description of what this does. ;)
"Atomic RMW", | ||
// For yields the atomic write overrides all effects of the atomic read | ||
// so it is treated as an atomic write. | ||
true, |
Using bool here makes it hard to figure out what true and false even mean... would it make sense to introduce a 2-variant enum for this purpose?
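Something along these lines, perhaps (a sketch; the enum and variant names are illustrative, not identifiers from this PR):

```rust
/// Illustrative two-variant enum replacing the bare bool, so call sites read
/// as AtomicAccessKind::Write instead of an unexplained `true`.
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
enum AtomicAccessKind {
    Read,
    Write,
}
```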
let place_ptr = place.ptr.assert_ptr();
let size = place.layout.size;
if write {
    this.thread_yield_atomic_wake(place_ptr.alloc_id, place_ptr.offset, size);
Instead of passing alloc_id and offset separately, what about passing the entire Pointer?
@@ -222,6 +222,12 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriEvalContextExt<'mir, 'tcx
"miri_resolve_frame" => {
    this.handle_miri_resolve_frame(args, dest)?;
}
// Performs a thread yield.
// Exists since a thread yield operation may not be available on a given platform.
"miri_yield_thread" => {
As noted before, std::thread::yield_now is available on all platforms, so I do not understand why this new operation is needed.
/// available.
DelayOnYield,
/// The thread has fully yielded, signalling that it requires another thread
/// perform an action visible to it in order to make progress.
Suggested change:
- /// perform an action visible to it in order to make progress.
+ /// to perform an action visible to it in order to make progress.
@@ -212,6 +414,11 @@ pub struct ThreadManager<'mir, 'tcx> {
///
/// Note that this vector also contains terminated threads.
threads: IndexVec<ThreadId, Thread<'mir, 'tcx>>,
/// Set of threads that are currently yielding.
"Yield" seems like an atomic operation to me, it's what happens on a single call of yield_now
... do you mean threads that are currently recording a watch set?
Is there some kind of invariant that all threads in this set are in state DelayOnYield or so?
/// Set of threads that are currently yielding.
yielding_thread_set: FxHashSet<ThreadId>,
/// The maximum number of yields making no progress required
/// on all threads to report a live-lock.
Wait, I thought there's some per-thread thing for how many times we go around a yield loop before blocking... but this seems to be something else, since the comment says it is global ("all threads")? What is this about?
self.yielding_thread_set.drain_filter(move |&thread_id| {
    let thread = &mut threads[thread_id];
    if thread.yield_state.should_wake_atomic(alloc_id, offset, len) {
        thread.state = ThreadState::Enabled;
A thread that's currently recording might or might not be yield-blocked, right? So this could be enabling a thread that isn't yield-blocked? Couldn't this be a problem if the thread is currently blocked on something else, like a mutex? I am not sure if all our scheduling primitives are robust wrt. spurious wakeups...
}

/// Starts the next yield iteration
fn start_iteration(&mut self) {
This is only called by the scheduler, in the case when there was no "regular" thread to make progress with any more and we woke up a DelayOnYield thread, right? If so, please say that in the doc comment; it took me a while to realize this.
self.state = ThreadState::BlockedOnYield;
} else {
    log::trace!("Thread entered standard yield with iteration {:?}", iteration_count);
    self.state = ThreadState::DelayOnYield;
So the purpose of this state seems to be to make the thread sleep until there is no other thread to make progress with any more, then it gets woken up again, and then we do start_iteration -- right?
What I do not understand is: this seems very heavy-handed for a yield. A yielded thread will never be scheduled again if the rest of the program just keeps making progress somewhere. That's very different from the old yield behavior. Couldn't this be a source of problems? Or am I misunderstanding what happens?
☔ The latest upstream changes (presumably #1686) made this pull request unmergeable. Please resolve the merge conflicts.
@JCTyblaidd I haven't heard from you in a while, do you still plan to get back to this PR?
@RalfJung Yes, I will be busy for a month and a bit, but plan to return to this later. Should I close and re-open when I have time in the future?
That is great to hear. :-)
Yes, I think that would be better, thanks.
This improves on #1388; however, it requires that programs are annotated with yields (either spin_loop_hint or thread_yield) in order to work. Additional configuration with a liveness guarantee might be worth adding later, to fully solve the issue when yield statements are not present.
This works by using yields to dynamically generate watch sets of concurrent objects and atomic memory locations that must change in order for the yielding thread to make progress, and then blocking the thread until that state is updated. A configurable but limited number of spurious wake events from yields is allowed, for cases where the yield loop has a finite number of iterations and a fallback that must be explored, or where the yield is present as an optimization rather than in a blocking loop. A thread performing any concurrently visible state change resets the counter.
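To make that concrete, here is a sketch (my own, not taken from this PR's tests) of a two-spin-loop handshake of the kind discussed above; with the yield annotations, each waiting thread blocks on its watch set and is woken by the other thread's store rather than being rescheduled over and over:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

static A_READY: AtomicBool = AtomicBool::new(false);
static B_READY: AtomicBool = AtomicBool::new(false);

fn main() {
    let a = thread::spawn(|| {
        A_READY.store(true, Ordering::Release);
        // Watch set after one iteration: { B_READY }; this thread blocks here
        // until thread B performs its store.
        while !B_READY.load(Ordering::Acquire) {
            thread::yield_now();
        }
    });
    let b = thread::spawn(|| {
        B_READY.store(true, Ordering::Release);
        while !A_READY.load(Ordering::Acquire) {
            thread::yield_now();
        }
    });
    a.join().unwrap();
    b.join().unwrap();
}
```

If either store were missing, both threads would eventually be blocked on their yields with no spurious wakes remaining, which is exactly the livelock condition described above.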
The current default for the yield iteration limit is 1 (0 means infinite); I am unsure whether we should keep this as the default or change it.