Unclear "Concurrent load and mut accesses" with an atomic counter #260
Comments
Hmm, this certainly looks off... that error should only be emitted if the atomic is accessed mutably (via `with_mut`) concurrently with a load.
@hawkw could you elaborate on how this should ideally work? It's not clear to me what Loom is supposed to do here. On one hand, this is an atomic doing atomic ops; it's totally thread safe. On the other hand, […]. I hope Loom follows the second line of thought. In this case, the current modelling for atomics is too simplistic: the code simply considers […]. Perhaps Loom needs to be taught to see that […].
Sorry if my first comment wasn't clear! What I meant is: […] In particular, a causality violation can only occur when a value is accessed non-atomically. In the case of an […]
Loom also helps you catch bugs other than data races. It does this by simulating all the possible flows through the program where the atomic may have different values. If a race condition on an atomic (i.e., not a data race) would cause incorrect behavior, it would be caught by assertions in the test or in the code under test, by Loom's deadlock and memory-leak checking, or by a causality violation if the atomic controls access to an `UnsafeCell`. Therefore, since you only call `fetch_add` here, this error shouldn't be emitted. Does that help to clarify what I meant?
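To illustrate the point about catching races through assertions, here is a minimal, hypothetical sketch (the test name and setup are illustrative; it relies only on loom's standard `model`, `thread`, and atomic APIs): two threads perform a read-modify-write as separate `load`/`store` steps, and loom explores the interleaving where one increment is lost, so the final assertion fails even though no data race exists.

```rust
use loom::sync::atomic::{AtomicUsize, Ordering::SeqCst};
use loom::sync::Arc;
use loom::thread;

#[test]
#[should_panic] // loom finds the interleaving where one increment is lost
fn lost_update_is_caught_by_the_assertion() {
    loom::model(|| {
        let counter = Arc::new(AtomicUsize::new(0));
        let c1 = counter.clone();

        let t = thread::spawn(move || {
            // A read-modify-write split into two atomic ops: no data race,
            // but a race condition on the value.
            let v = c1.load(SeqCst);
            c1.store(v + 1, SeqCst);
        });

        let v = counter.load(SeqCst);
        counter.store(v + 1, SeqCst);

        t.join().unwrap();

        // Fails on the schedule where both threads read 0.
        assert_eq!(counter.load(SeqCst), 2);
    });
}
```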
Also, looking at the code, the only place where an atomic will print the specific error message "concurrent load and mut accesses" is here (lines 568 to 582 in 6d15996):
which is only called inside the `with_mut` method. In addition, the error message should include the location of the `with_mut` call that caused the mutable access. So it really looks like this particular error was emitted incorrectly.
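For context, a small hypothetical sketch of a legitimate `with_mut` use, assuming the current loom API where `with_mut` takes `&mut self` plus a closure: exclusive access is enforced by the borrow checker, so in safe code the error above should not be reachable while another thread is loading the same atomic.

```rust
use loom::sync::atomic::{AtomicUsize, Ordering::SeqCst};

#[test]
fn with_mut_requires_exclusive_access() {
    loom::model(|| {
        let mut a = AtomicUsize::new(0);

        // `with_mut` takes `&mut self`, so no other thread can touch `a`
        // here; the "concurrent load and mut accesses" error should only
        // be reachable through unsafe aliasing of the atomic.
        a.with_mut(|v| *v = 42);

        assert_eq!(a.load(SeqCst), 42);
    });
}
```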
@hawkw please accept my greatest appreciation for the detail in which you described the issue. I'll try to debug it a bit more and see where Loom may behave erroneously.
So […]. From the code standpoint, then, it doesn't look like an oversight but rather an architectural bug/feature.
Hmm, it's strange that your panic is missing a location for the mutable access.
As a side note, I recently hit the same error in some code of my own. I think what's happening here is that loom is (perhaps incorrectly?) treating the creation of the atomic as a mutable access?
Oh, okay, I've figured something out. It looks like one of the two locations is not being printed because no location was captured for a write on that thread... which means that thread never actually started a write. But, somehow, the model believes that thread has accessed the atomic mutably. I think there's definitely some kind of bug.
I'm running into this bug as well; I'm not using […].
Has anyone found a workaround for this problem?
I tried to put together a minimal example for my case:

```rust
#[test]
fn loom_260() {
    use loom::{
        sync::{
            atomic::{AtomicPtr, AtomicU32, Ordering::*},
            Arc, Mutex,
        },
        thread,
    };

    // A naive concurrent slab
    struct Slots {
        head: AtomicU32,
        slots: AtomicPtr<Vec</* next */ AtomicU32>>, // lazily allocated
        lock: Mutex<()>,
    }

    impl Slots {
        fn push(&self) -> u32 {
            let mut slots = self.slots.load(Relaxed); // `Acquire` fixes the issue <<<
            if slots.is_null() {
                self.try_alloc();
                slots = self.slots.load(Acquire);
            }
            let slots = unsafe { slots.as_ref().unwrap() };
            let mut head = self.head.load(Acquire);
            loop {
                let next_head = slots[head as usize].load(Acquire);
                match self.head.compare_exchange(head, next_head, AcqRel, Acquire) {
                    Ok(_) => return head,
                    Err(new) => head = new,
                }
            }
        }

        fn try_alloc(&self) {
            let _guard = self.lock.lock().unwrap();
            if !self.slots.load(Acquire).is_null() {
                return;
            }
            let slots = (0..4).map(|i| AtomicU32::new(i + 1)).collect();
            let ptr = Box::into_raw(Box::new(slots));
            self.slots.store(ptr, Release);
        }
    }

    loom::model(|| {
        let slots = Arc::new(Slots {
            head: AtomicU32::new(0),
            slots: AtomicPtr::new(std::ptr::null_mut()),
            lock: Mutex::new(()),
        });
        let slots1 = slots.clone();
        let t1 = thread::spawn(move || slots1.push());
        slots.push();
        t1.join().unwrap();
    });
}
```

Although the message is unclear, Loom actually does "detect" something in this case. I wonder why this code is considered incorrect. Is this an imperfection of Loom, or is it a real problem according to the atomic model?
I think I'll try my hand at this, since there hasn't been any response in months. I can reproduce the bug on […]. Let me see if I can figure out where loom is going wrong 😄 NOTE: adding […]
Ignore this; see my next comment. It was based on the assumption that the example above was correct, which it isn't. I think tracking […] (line 711 in 6a03bf0) […]
Removing that one line, all of loom's tests still pass... but that doesn't seem right, since this is basically just disabling the check altogether.
@loyd

```rust
let mut slots = self.slots.load(Relaxed); // `Acquire` fixes the issue <<<
```

This load does need to be `Acquire`. The load under the mutex can be `Relaxed`, but the one outside the mutex must be `Acquire`. Not sure why I didn't see that sooner 🙃
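To spell out why that `Acquire` matters, here is a hypothetical, self-contained sketch of the same publish/consume pattern (names and values are made up, not taken from the example above): a pointer to freshly initialized data is published with a `Release` store, and the consumer must use an `Acquire` load before dereferencing it. A `Relaxed` load can observe the non-null pointer without synchronizing with the initialization of the pointee, which is the kind of unsynchronized access loom was objecting to.

```rust
use loom::sync::atomic::{AtomicPtr, AtomicU32, Ordering::{Acquire, Release}};
use loom::sync::Arc;
use loom::thread;

#[test]
fn publishing_a_pointer_needs_acquire_on_the_consumer() {
    loom::model(|| {
        let slot: Arc<AtomicPtr<AtomicU32>> =
            Arc::new(AtomicPtr::new(std::ptr::null_mut()));
        let publisher = slot.clone();

        let t = thread::spawn(move || {
            // Initialize the pointee, then publish it.
            let ptr = Box::into_raw(Box::new(AtomicU32::new(42)));
            publisher.store(ptr, Release);
        });

        // Must be `Acquire`: with `Relaxed`, seeing a non-null pointer does
        // not establish happens-before with the pointee's initialization.
        let p = slot.load(Acquire);
        if !p.is_null() {
            assert_eq!(unsafe { (*p).load(Acquire) }, 42);
        }

        t.join().unwrap();

        // Reclaim the allocation if it was published.
        let p = slot.load(Acquire);
        if !p.is_null() {
            drop(unsafe { Box::from_raw(p) });
        }
    });
}
```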
@RustyYato, thanks for the quick response, and I apologize for the time spent. I'm not sure why this is the case or how it could actually break the correctness of the code, but apparently this is a real bug, so thanks to Loom for catching it. Anyway, it would be nice to have a better message instead of only "Concurrent load and mut accesses".
Disclaimer: I think this project is amazingly cool! I'd love to ask this question in a discussion or in a Matrix room, but there is no link to either in the project, so I'm filing it here.
Loom complains about this line - https://github.com/kvark/choir/blob/0c7af40b88e3613520eba52e4993000ee5ca2fae/tests/basic.rs#L34
The first problem is that it lists the places where the atomic was "created" and "load"ed, but not where it got the "mut" access referenced in the error.
I assume it doesn't like this `fetch_add`, but I don't know why. It's just an atomic counter.
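As a point of comparison, here is a hypothetical counter test in the shape the issue describes (this is not the choir test, just the pattern of two threads calling `fetch_add` on a shared loom atomic), which should pass under loom since no mutable access is ever involved:

```rust
use loom::sync::atomic::{AtomicUsize, Ordering::SeqCst};
use loom::sync::Arc;
use loom::thread;

#[test]
fn plain_fetch_add_counter() {
    loom::model(|| {
        let counter = Arc::new(AtomicUsize::new(0));
        let c1 = counter.clone();

        let t = thread::spawn(move || {
            c1.fetch_add(1, SeqCst);
        });
        counter.fetch_add(1, SeqCst);

        t.join().unwrap();

        // `fetch_add` is a single atomic RMW, so no increment can be lost
        // and no `with_mut`-style mutable access is ever performed.
        assert_eq!(counter.load(SeqCst), 2);
    });
}
```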