Unclear "Concurrent load and mut accesses" with an atomic counter #260
Comments
Hmm, this certainly looks off... that error should only be emitted if the atomic is accessed mutably (via `with_mut`) concurrently with a load.
@hawkw could you elaborate on how this should ideally work? It's not clear to me what Loom is supposed to do here. On one hand, this is an atomic doing atomic ops; it's totally thread safe. On the other hand, […]. I hope Loom follows the second line of thought. In this case, the current modelling for atomics is too simplistic: the code simply considers […]. Perhaps Loom needs to be taught to see that […].
Sorry if my first comment wasn't clear! What I meant is: […] In particular, a causality violation can only occur when a value is accessed non-atomically. In the case of an […]
Loom also helps you catch bugs other than data races. It does this by simulating all the possible flows through the program where the atomic may have different values. If a race condition on an atomic (i.e., not a data race) would cause incorrect behavior, it would be caught by assertions in the test or in the code under test, by Loom's deadlock and memory-leak checking, or by a causality violation if the atomic controls access to an `UnsafeCell`. Therefore, since you only call `fetch_add` here, this error shouldn't be emitted. Does that help to clarify what I meant?
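To illustrate the point about catching races through assertions, here is a minimal, hypothetical sketch (the test name and setup are illustrative; it relies only on loom's standard `model`, `thread`, and atomic APIs): two threads perform a read-modify-write as separate `load`/`store` steps, and loom explores the interleaving where one increment is lost, so the final assertion fails even though no data race exists.

```rust
use loom::sync::atomic::{AtomicUsize, Ordering::SeqCst};
use loom::sync::Arc;
use loom::thread;

#[test]
#[should_panic] // loom finds the interleaving where one increment is lost
fn lost_update_is_caught_by_the_assertion() {
    loom::model(|| {
        let counter = Arc::new(AtomicUsize::new(0));
        let c1 = counter.clone();

        let t = thread::spawn(move || {
            // A read-modify-write split into two atomic ops: no data race,
            // but a race condition on the value.
            let v = c1.load(SeqCst);
            c1.store(v + 1, SeqCst);
        });

        let v = counter.load(SeqCst);
        counter.store(v + 1, SeqCst);

        t.join().unwrap();

        // Fails on the schedule where both threads read 0.
        assert_eq!(counter.load(SeqCst), 2);
    });
}
```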
Also, looking at the code, the only place where an atomic will print the specific error message "concurrent load and mut accesses" is here (lines 568 to 582 in 6d15996):
which is only called inside the `with_mut` method. In addition, the error message should include the location of the `with_mut` call that caused the mutable access. So it really looks like this particular error was emitted incorrectly.
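For context, a small hypothetical sketch of a legitimate `with_mut` use, assuming the current loom API where `with_mut` takes `&mut self` plus a closure: exclusive access is enforced by the borrow checker, so in safe code the error above should not be reachable while another thread is loading the same atomic.

```rust
use loom::sync::atomic::{AtomicUsize, Ordering::SeqCst};

#[test]
fn with_mut_requires_exclusive_access() {
    loom::model(|| {
        let mut a = AtomicUsize::new(0);

        // `with_mut` takes `&mut self`, so no other thread can touch `a`
        // here; the "concurrent load and mut accesses" error should only
        // be reachable through unsafe aliasing of the atomic.
        a.with_mut(|v| *v = 42);

        assert_eq!(a.load(SeqCst), 42);
    });
}
```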
@hawkw please accept my greatest appreciation for the detail in which you described the issue. I'll try to debug it a bit more and see where Loom may behave erroneously.
So […]. From the code standpoint, then, it doesn't look like an oversight but rather an architectural bug/feature.
Hmm, it's strange that your panic is missing a location for the mutable access.
As a side note, I recently hit the same error in some code of my own. I think what's happening here is that loom is (perhaps incorrectly?) treating the creation of the atomic as a mutable access?
Oh, okay, I've figured something out. It looks like one of the two locations is not being printed because no location was captured for a write on that thread... which means that thread never actually started a write. But, somehow, the model believes that thread has accessed the atomic mutably. I think there's definitely some kind of bug.
I'm running into this bug as well; I'm not using […].
Has anyone found a workaround for this problem?
I tried to put together a minimal example for my case:

```rust
#[test]
fn loom_260() {
    use loom::{
        sync::{
            atomic::{AtomicPtr, AtomicU32, Ordering::*},
            Arc, Mutex,
        },
        thread,
    };

    // A naive concurrent slab
    struct Slots {
        head: AtomicU32,
        slots: AtomicPtr<Vec</* next */ AtomicU32>>, // lazily allocated
        lock: Mutex<()>,
    }

    impl Slots {
        fn push(&self) -> u32 {
            let mut slots = self.slots.load(Relaxed); // `Acquire` fixes the issue <<<
            if slots.is_null() {
                self.try_alloc();
                slots = self.slots.load(Acquire);
            }
            let slots = unsafe { slots.as_ref().unwrap() };
            let mut head = self.head.load(Acquire);
            loop {
                let next_head = slots[head as usize].load(Acquire);
                match self.head.compare_exchange(head, next_head, AcqRel, Acquire) {
                    Ok(_) => return head,
                    Err(new) => head = new,
                }
            }
        }

        fn try_alloc(&self) {
            let _guard = self.lock.lock().unwrap();
            if !self.slots.load(Acquire).is_null() {
                return;
            }
            let slots = (0..4).map(|i| AtomicU32::new(i + 1)).collect();
            let ptr = Box::into_raw(Box::new(slots));
            self.slots.store(ptr, Release);
        }
    }

    loom::model(|| {
        let slots = Arc::new(Slots {
            head: AtomicU32::new(0),
            slots: AtomicPtr::new(std::ptr::null_mut()),
            lock: Mutex::new(()),
        });
        let slots1 = slots.clone();
        let t1 = thread::spawn(move || slots1.push());
        slots.push();
        t1.join().unwrap();
    });
}
```

Although the message is unclear, Loom actually does "detect" something in this case. I wonder why this code is considered incorrect. Is this an imperfection of Loom, or is it a real problem according to the atomic model?
I think I'll try my hand at this, since there hasn't been any response in months. I can reproduce the bug on […]. Let me see if I can figure out where loom is going wrong 😄 NOTE: adding […]
Ignore this; see my next comment. It was based on the assumption that the example above was correct, which it isn't. I think tracking […] (line 711 in 6a03bf0) […]
Removing that one line, all of loom's tests still pass... but that doesn't seem right, since this is basically just disabling the check altogether.
@loyd

```rust
let mut slots = self.slots.load(Relaxed); // `Acquire` fixes the issue <<<
```

This load does need to be `Acquire`. The load under the mutex can be `Relaxed`, but the one outside the mutex must be `Acquire`. Not sure why I didn't see that sooner 🙃
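To spell out why that `Acquire` matters, here is a hypothetical, self-contained sketch of the same publish/consume pattern (names and values are made up, not taken from the example above): a pointer to freshly initialized data is published with a `Release` store, and the consumer must use an `Acquire` load before dereferencing it. A `Relaxed` load can observe the non-null pointer without synchronizing with the initialization of the pointee, which is the kind of unsynchronized access loom was objecting to.

```rust
use loom::sync::atomic::{AtomicPtr, AtomicU32, Ordering::{Acquire, Release}};
use loom::sync::Arc;
use loom::thread;

#[test]
fn publishing_a_pointer_needs_acquire_on_the_consumer() {
    loom::model(|| {
        let slot: Arc<AtomicPtr<AtomicU32>> =
            Arc::new(AtomicPtr::new(std::ptr::null_mut()));
        let publisher = slot.clone();

        let t = thread::spawn(move || {
            // Initialize the pointee, then publish it.
            let ptr = Box::into_raw(Box::new(AtomicU32::new(42)));
            publisher.store(ptr, Release);
        });

        // Must be `Acquire`: with `Relaxed`, seeing a non-null pointer does
        // not establish happens-before with the pointee's initialization.
        let p = slot.load(Acquire);
        if !p.is_null() {
            assert_eq!(unsafe { (*p).load(Acquire) }, 42);
        }

        t.join().unwrap();

        // Reclaim the allocation if it was published.
        let p = slot.load(Acquire);
        if !p.is_null() {
            drop(unsafe { Box::from_raw(p) });
        }
    });
}
```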
@RustyYato, thanks for the quick response, and I apologize for the time spent. I'm not sure why this is the case or how it could actually break the correctness of the code, but apparently this is a real bug, so thanks to Loom for catching it. Anyway, it would be nice to have a better message instead of only "Concurrent load and mut accesses".
Disclaimer: I think this project is amazingly cool! I'd love to ask this question in a discussion or in a Matrix room, but there is no link to either in the project, so I'm filing it here.
Loom complains about this line - https://github.com/kvark/choir/blob/0c7af40b88e3613520eba52e4993000ee5ca2fae/tests/basic.rs#L34
The first problem is that it lists the places where the atomic was "created" and "load"ed, but not where it got the "mut" access referenced in the error.
I assume it doesn't like this `fetch_add`, but I don't know why. It's just an atomic counter.
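As a point of comparison, here is a hypothetical counter test in the shape the issue describes (this is not the choir test, just the pattern of two threads calling `fetch_add` on a shared loom atomic), which should pass under loom since no mutable access is ever involved:

```rust
use loom::sync::atomic::{AtomicUsize, Ordering::SeqCst};
use loom::sync::Arc;
use loom::thread;

#[test]
fn plain_fetch_add_counter() {
    loom::model(|| {
        let counter = Arc::new(AtomicUsize::new(0));
        let c1 = counter.clone();

        let t = thread::spawn(move || {
            c1.fetch_add(1, SeqCst);
        });
        counter.fetch_add(1, SeqCst);

        t.join().unwrap();

        // `fetch_add` is a single atomic RMW, so no increment can be lost
        // and no `with_mut`-style mutable access is ever performed.
        assert_eq!(counter.load(SeqCst), 2);
    });
}
```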