
UB due to lack of a full fence in full_fence (on x86) #70

Closed
RalfJung opened this issue Aug 13, 2023 · 5 comments · Fixed by #71

Comments

@RalfJung

I was recently made aware of this code:

use core::sync::atomic::{self, AtomicUsize, Ordering};

// HACK(stjepang): On x86 architectures there are two different ways of executing
// a `SeqCst` fence.
//
// 1. `atomic::fence(SeqCst)`, which compiles into a `mfence` instruction.
// 2. `_.compare_exchange(_, _, SeqCst, SeqCst)`, which compiles into a `lock cmpxchg` instruction.
//
// Both instructions have the effect of a full barrier, but empirical benchmarks have shown
// that the second one is sometimes a bit faster.
//
// The ideal solution here would be to use inline assembly, but we're instead creating a
// temporary atomic variable and compare-and-exchanging its value. No sane compiler to
// x86 platforms is going to optimize this away.
atomic::compiler_fence(Ordering::SeqCst);
let a = AtomicUsize::new(0);
let _ = a.compare_exchange(0, 1, Ordering::SeqCst, Ordering::SeqCst);
atomic::compiler_fence(Ordering::SeqCst);

As the comment says, this is UB -- and the idea that no sane compiler is going to optimize atomics is a myth. However, I admit I don't know enough about this specific case to say whether this is a risk for this crate.

What I am wondering is: why doesn't this use inline assembly? That is the intended mechanism for forcing the compiler to generate code in a particular way, and it is clearly sound here. The alternative is even mentioned in the comment, but unfortunately without an explanation of why it was discarded.
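For reference, here is a minimal sketch of what the inline-assembly route could look like on x86_64 (my illustration, not the crate's actual fix; it assumes asm!, which has been stable since Rust 1.59):

// Sketch: a full fence via inline assembly on x86_64. Any `lock`-prefixed
// read-modify-write is a full memory barrier on x86, like `mfence` but
// often a bit cheaper on common microarchitectures.
#[cfg(target_arch = "x86_64")]
fn full_fence() {
    // SAFETY: or-ing 0 into an existing stack slot leaves memory unchanged;
    // the `lock` prefix supplies the barrier. An asm! block without
    // `nomem`/`readonly` options is also a compiler-level barrier.
    unsafe {
        core::arch::asm!("lock or qword ptr [rsp], 0");
    }
}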

@notgull
Member

notgull commented Aug 13, 2023

The MSRV of the smol-rs project previously did not allow for inline assembly. See smol-rs/smol#244. But now it does, so we should probably bump the MSRV so we can directly use the code that we want.

@taiki-e
Collaborator

taiki-e commented Aug 13, 2023

Filed #71 to resolve this.

why doesn't this use inline assembly?

When this crate was first released, inline assembly was not stable. And for a long time after that, our MSRV was older than 1.59, the release that stabilized inline assembly.

Since we have recently raised the MSRV of some smol-rs crates to 1.63, I think this is an appropriate time to address this issue.

and the idea that no sane compiler is going to optimize atomics is a myth. However, I admit I don't know enough about this specific case to say whether this is a risk for this crate.

The C++20 memory model states that SC operations "establish a single total modification order of all atomic operations that are so tagged". https://en.cppreference.com/w/cpp/atomic/memory_order#Sequentially-consistent_ordering

While it may be possible to "merge two atomic operations when there are only pure operations in the middle", as described in the article you linked, I don't think it is allowed to completely remove independent SC operations (unless the compiler can prove, by inspecting the whole program, that there is never another SC operation that could synchronize with them). (It may be possible to replace the SC CAS with an SC fence or another SC RMW, but that doesn't lose the property we need here.)
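As a concrete illustration of the kind of merging the article describes (my example, not taken from it), two adjacent SC stores with nothing observable in between may be coalesced; that is different from removing an operation outright:

use std::sync::atomic::{AtomicUsize, Ordering};

// The first store may be folded into the second, because no thread is
// guaranteed to observe the intermediate value 1.
fn merged(x: &AtomicUsize) {
    x.store(1, Ordering::SeqCst);
    x.store(2, Ordering::SeqCst);
}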

So, unless the C++ memory model changes the property mentioned above in the future, I expect that no wrong code will be generated here. That said, inline assembly is the "definitely correct" choice, and I agree that using it is preferable.

@RalfJung
Author

RalfJung commented Aug 13, 2023

The MSRV of the smol-rs project previously did not allow for inline assembly. See smol-rs/smol#244. But now it does, so we should probably bump the MSRV so we can directly use the code that we want.

Ah, if it's just an MSRV issue that is great! And thanks @taiki-e for getting the ball rolling with that PR. :)

If the lock cmpxchg can be faster than mfence, it might also be worth trying to get LLVM to generate the former instead of the latter for SC fences -- then all code would benefit.

The C++20 memory model states that SC operations "establish a single total modification order of all atomic operations that are so tagged". https://en.cppreference.com/w/cpp/atomic/memory_order#Sequentially-consistent_ordering

Yeah, but the way that order interacts with the other orders isn't always entirely how one would expect, I think... so it's certainly not obvious to me that compilers are not allowed to drop the access. In particular, this order AFAIK does not participate in determining synchronizes-with edges -- but an SC fence has the effects of a release-acquire fence and can hence establish synchronizes-with edges.
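As a concrete (simplified) sketch of that last point, assuming the usual fence-to-fence synchronization rule:

use std::sync::atomic::{fence, AtomicBool, AtomicUsize, Ordering};

static DATA: AtomicUsize = AtomicUsize::new(0);
static FLAG: AtomicBool = AtomicBool::new(false);

// Thread A
fn producer() {
    DATA.store(42, Ordering::Relaxed);
    fence(Ordering::SeqCst); // also acts as a release fence
    FLAG.store(true, Ordering::Relaxed);
}

// Thread B
fn consumer() {
    if FLAG.load(Ordering::Relaxed) {
        fence(Ordering::SeqCst); // also acts as an acquire fence
        // If the load above saw `true`, the two fences synchronize and
        // this read is guaranteed to see 42.
        assert_eq!(DATA.load(Ordering::Relaxed), 42);
    }
}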

You have to study the formalizations of this model; I don't think the English prose can be precise enough for subtle questions like this.

@RalfJung
Author

RalfJung commented Aug 13, 2023

I don't think it is allowed to completely remove independent SC operations

LLVM certainly does not agree with your reasoning: this becomes a NOP.

use std::sync::atomic::{AtomicUsize, Ordering};

pub fn test() {
    let x = AtomicUsize::new(0);
    // A dead SeqCst load on an atomic that never escapes: LLVM compiles
    // this whole function down to a plain return.
    x.load(Ordering::SeqCst);
}

Stores also get removed. I think they just didn't implement the pass that removes dead RMWs.
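For instance, the analogous dead store (my variant of the example above) also compiles down to a plain return:

use std::sync::atomic::{AtomicUsize, Ordering};

pub fn test_store() {
    let x = AtomicUsize::new(0);
    // Dead SeqCst store on a non-escaping atomic: also removed by LLVM.
    x.store(1, Ordering::SeqCst);
}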

@taiki-e
Collaborator

taiki-e commented Aug 13, 2023

Oh, that is interesting. I can reproduce it, even with a very old rustc, and even with compiler_fence around the SC load/store: https://godbolt.org/z/xsvG5Ke58
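Roughly the variant from that godbolt link (reconstructed here, so the exact code may differ): even bracketed by compiler fences, the dead SC load still disappears.

use std::sync::atomic::{compiler_fence, AtomicUsize, Ordering};

pub fn test() {
    compiler_fence(Ordering::SeqCst);
    let x = AtomicUsize::new(0);
    x.load(Ordering::SeqCst);
    compiler_fence(Ordering::SeqCst);
}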
