Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync::Once use release-acquire access modes #52349

Merged
merged 1 commit into from
Jul 20, 2018
Merged

Conversation

RalfJung
Copy link
Member

Nothing here makes a case distinction like "this happened before OR after that". All we need is to get happens-before edges whenever we see that the state/signal has been changed. Release-acquire is good enough for that.

@rust-highfive
Copy link
Collaborator

r? @sfackler

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 13, 2018
@@ -323,7 +323,7 @@ impl Once {
POISONED |
INCOMPLETE => {
let old = self.state.compare_and_swap(state, RUNNING,
Ordering::SeqCst);
Ordering::AcqRel);
Copy link
Member

@nagisa nagisa Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could use compare_exchange and a Ordering::Relaxed for the failure ordering, I believe. Do you have a way to verify/check that?

Copy link
Member Author

@RalfJung RalfJung Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could try to bribe Hai or @jhjourdan into proving it. ;) But that's likely not going to happen soon. Personally I will not use Relaxed anywhere without being really sure.

@gereeter
Copy link
Contributor

gereeter commented Jul 14, 2018

cc #27610 and previous attempts #31650 and #44331.

The conclusions there were to keep everything SeqCst since SeqCst loads (all that occur on the fast path) happen to be fast on most architectures and it is more likely to be correct. It may be more palatable now, however, in the face of the weak memory RustBelt work.

I personally strongly disagree with this decision. I think that Acquire/Release are more readable since they can just be ignored if necessary for intuition and act as a guide for a more formal proof of correctness. I find the safety gain of SeqCst meaningless because ensuring correctness requires careful reasoning, which is generally best done using transfers of ownership effected by Acquire/Release pairs, not by considering possible interleavings. Basically, I find @RalfJung's comments here and here spot on.

@sfackler
Copy link
Member

r? @alexcrichton

@RalfJung
Copy link
Member Author

It may be more palatable now, however, in the face of the weak memory RustBelt work.

Notice that I am not directly involved in this work, and they have not looked at sync::Once (yet).

@alexcrichton
Copy link
Member

FWIW I'm at least personally still a novice with atomic orderings which (as mentioned by @gereeter) in previous threads is why I've proposed for leaving this as SeqCst. When dealing with concurrent code I'm at least personally trained to think about interleavings, and that to me at least does seem like the least common denominator when working with concurrent code. As @gereeter also mentioned, this I think is a reason to use SeqCst as aggressively as possible. You simply can't go wrong if you're already doing the baseling work of considering interleavings.

@RalfJung you mentioned on Reddit that SeqCst is more complicated than acquire/release, but that's along the lines of "proving you need SeqCst", right? For just using them, though, do you agree that SeqCst is the "safest choice"?

One question I'd have about this patch as-is is that it seems to differ from #44331 in a few orderings. Specifically in Drop for Finish this PR uses AcqRel while #44331 uses Release. Do we know which is correct? Was Release incorrect?

I completely agree that SeqCst is overly powerful in cases like this. This is why previously I've asked for benchmarks or information about how this provides a faster primitive. If it's significant winnings then I definitely think it's worth the tradeoff of harder-to-understand code. In particular with the case of Once there's only really one atomic ordering that matters which is the initial load as all the others are on the slow path.

Now if we literally have an academic proof for the orderings being correct I'm all for switching to those orderings! Before that, though, I'm personally hesitant to do so without motivation in terms of benchmarks. I certainly trust @RalfJung to get concurrent orderings more correct than myself though!

@nagisa
Copy link
Member

nagisa commented Jul 15, 2018

Specifically in Drop for Finish this PR uses AcqRel while #44331 uses Release. Do we know which is correct? Was Release incorrect?

AcqRel will pick either Acquire or Release whichever is appropriate for the operation (load or store, respectively). So both changes are equivalent in that particular snippet of code.

@RalfJung
Copy link
Member Author

RalfJung commented Jul 15, 2018

I just noticed Jehoon did that previous PR. He certainly knows more about weak memory than I do.^^

AcqRel will pick either Acquire or Release whichever is appropriate for the operation (load or store, respectively). So both changes are equivalent in that particular snippet of code.

I don't think that is correct, e.g. AcqRel is invalid for loads:

AcqRel => panic!("there is no such thing as an acquire/release load"),

AcqRel is used for read-modify-write operations to make the read acquire, and the write release. That other PR uses a weaker mode that just makes the write release, but makes the read relaxed.

However, I wish what you said was true because then one could just use AcqRel for all operations and would not have to worry about these details...

@nagisa
Copy link
Member

nagisa commented Jul 15, 2018

Oh, I was looking at the wrong op (the store instead of the swap). So yeah, I’m wrong about the equivalency here.

Yeah… AcqRel on regular stores and loads would be great.

@RalfJung
Copy link
Member Author

RalfJung commented Jul 15, 2018

Yeah… AcqRel on regular stores and loads would be great.

Would that need an RFC or is a PR enough? It'd be insta-stable... so I'd guess it needs an RFC.

@nagisa
Copy link
Member

nagisa commented Jul 15, 2018

Since both C++ and LLVM do not permit it, there might be a good reason for that. I would be somewhat reluctant to land such change with just a PR.

@RalfJung
Copy link
Member Author

@alexcrichton

@RalfJung you mentioned on Reddit that SeqCst is more complicated than acquire/release, but that's along the lines of "proving you need SeqCst", right? For just using them, though, do you agree that SeqCst is the "safest choice"?

Certainly, whenever release-acquire is correct, then SeqCst is also correct. So yes, it is the "safest choice".

Specifically in Drop for Finish this PR uses AcqRel while #44331 uses Release. Do we know which is correct? Was Release incorrect?

I trust @jeehoonkang got this right. Release here means that in the following code, we cannot assume to actually see writes that happened before whatever write the swap reads from.

In particular with the case of Once there's only really one atomic ordering that matters which is the initial load as all the others are on the slow path.

Shall I change this PR to only make that an Acquire, and add a comment that we're just not bothering with the rest?

Now if we literally have an academic proof for the orderings being correct I'm all for switching to those orderings! Before that, though, I'm personally hesitant to do so without motivation in terms of benchmarks.

Fair enough. Maybe we can get @jhjourdan or Hai (don't know his GitHub handle) to do a proof. ;)

@jeehoonkang
Copy link
Contributor

jeehoonkang commented Jul 16, 2018

I think there are two different schools in concurrent programming: what I call the "bullet-proof" school and the "logical proof" school.

In the bullet-proof school, people know that concurrent programming is hard, and in order to make it right, they don't optimize things when not 100% sure. The meaning of "100%" may vary among people: someone thinks a lot of tests; another thinks no fine-grained concurrency; maybe SeqCst-only fine-grained concurrency; and even sometimes 100% means "never." So different projects adopt aggressive, fine-grained concurrency in different degrees, depending on the stability and performance requirement of the projects.

On the other hand, in the logical proof school, people know that concurrent programming is hard, but thanks to the recent advances of reasoning principles, it is no longer a beast. Iris and LambdaRust from @drdreyer's group (where @RalfJung is!) are notable examples of such reasoning principles, on which you can prove why certain fine-grained concurrency patterns are correct. Once an implementation is proved, we can believe the result and fearlessly deploy it.

I'm personally an advocate for the logical proof school, and I believe so is @RalfJung. That's why we're constantly proposing to lower orderings in Rust std: e.g. we believe we "know" the logical proof of Once, and it relies on release/acquire synchronization, not the interleaving nature of SeqCst. So using release/acquire is not only more performant (at least in some hardware) but also is conveying the designer's intent more clearly. Of course you may disagree with us if you don't believe the proof; anyway, we're talking about a very core foundation library of a real-world language, and we cannot be too cautious!

Probably this disagreement comes from the fact that the logical proof school has not finished its job yet: while we at academia have produced quite solid results so far, based on which I'm quite confident that using release/acquire is safe in sync::Once in particular, we still don't have a good concurrency semantics for C/C++; and we still don't have a good reasoning principle (e.g. program logic) that scales up to real-world C/C++ data structures (I know there are some unpublished results, but.. :) I don't want to ask the Rust community to be an early adopter of those results, and that's why I agreed to close #44331. And unfortunately, I think the situation has not changed so far. But thanks to the recent progress, I think we are almost "getting there" 😄

@RalfJung
Copy link
Member Author

RalfJung commented Jul 16, 2018

Well put.

I was not aware that there had been previous proposals to change the synchronization mode when I opened this PR -- so it seems likely to at least be a good idea to add a comment that prevents more people from repeating the same thing in the future. ;)

@alexcrichton
Copy link
Member

Indeed well put! And to be clear, I'd be totally comfortable with a proof. @RalfJung's work and all have already found multiple bugs in libstd through proofs, so I have a good deal of confidence in them! I'm under the impression though that one doesn't exist for Once yet, though, which is why I'm hesitant to change this here.

@jeehoonkang heh I think I would definitely classify myself in the "bullet-proof" school in that case! I definitely do agree that using release/acquire can convey intent more clearly, but my worry is that I will convey myself incorrectly because I don't fully understand release/acquire. In that sense I know I can't miscommuniate if I always use SeqCst, I just communicate too much :)

@RalfJung I'd be totally down with some comments in the code about "usage of SeqCst is way too strong here, pending more work to change to release/acquire". Additionally if y'all feel that Acquire is indeed 100% correct for the first initial load I would be ok changing that. Note that, tested long ago, there is no difference in codegen between Acquire and SeqCst on either x86, arm, or aarch64. Only on powerpc was it found to generate a difference. This is definitely my biggest hesitation in the sense that in my mind we're trading "only worry about interleavings" for "worry about interleavings and atomic orderings, and it'll be faster on platforms that don't see much benchmarking anyway". (in that the possible gains for the increase in difficulty seem small as measured so far)

@hanna-kruppe
Copy link
Contributor

Note that, tested long ago, there is no difference in codegen between Acquire and SeqCst on either x86, arm, or aarch64.

Release stores, however, do have different codegen even on x86. From my understanding, this is unlikely to matter for Once since there's no stores on the hot path, but it's still useful to keep in mind. No architecture of interest provides SC for free, not even those with the relatively strong TSO model.

@RalfJung
Copy link
Member Author

I'm under the impression though that one doesn't exist for Once yet, though, which is why I'm hesitant to change this here.

Correct, there is no proof for Once.

I have sense then browser some more placed in libstd and seen that SeqCst is indeed used consistently and not necessary in most cases. Even Arc contains some SeqCst that probably nobody bothered to optimize. That makes me wonder whether Once is the right place for such a comment. OTOH, given that this is the third PR doing the same thing, that seems like an indication. ;)

Note that, tested long ago, there is no difference in codegen between Acquire and SeqCst on either x86, arm, or aarch64. Only on powerpc was it found to generate a difference.

I am really surprised that there is no codegen difference for the ARM architectures -- but indeed, this website lists "ldr; dmb ish" as codegen for both SeqCst and Acquire loads. (There's an alternative for acquire loads, but it seems LLVM does not consider that worth it.)

@drdreyer
Copy link

Correct, there is no proof for Once.

Actually, Hai and I talked about it today. We will see if it is possible to do a proof for Once without too much effort. The main technical issue has to do with the fact that there are comparisons of pointer equality, which means we have to show that any pointer we could possibly compare for equality has not been deallocated. Should be doable, but may be tricky. I don't think it has anything to do with weak memory.

@RalfJung
Copy link
Member Author

we have to show that any pointer we could possibly compare for equality has not been deallocated

So it is not enough to rely on the non-determinism in case the pointer has been deallocated?

@RalfJung
Copy link
Member Author

Meanwhile, I have reduced this PR to just do Acquire on the hot path and explain why we don't bother for the rest of the code.

@drdreyer
Copy link

So it is not enough to rely on the non-determinism in case the pointer has been deallocated?

Well, the problem is that if you do a CAS on a location that might be storing a deallocated pointer, then it might succeed even though the things being compared are not equal, and I don't know how you would reason about that. So I think you need to arrange for the location's invariant to store some knowledge that its contents are not a deallocated pointer.

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Jul 18, 2018

📌 Commit 3e1254d has been approved by alexcrichton

@bors bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 18, 2018
@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Jul 18, 2018
@RalfJung
Copy link
Member Author

@nagisa

Yeah… AcqRel on regular stores and loads would be great.

RFC submitted: rust-lang/rfcs#2503

@bors
Copy link
Contributor

bors commented Jul 20, 2018

⌛ Testing commit 3e1254d with merge bc14d71...

bors added a commit that referenced this pull request Jul 20, 2018
sync::Once use release-acquire access modes

Nothing here makes a case distinction like "this happened before OR after that". All we need is to get happens-before edges whenever we see that the state/signal has been changed. Release-acquire is good enough for that.
@bors
Copy link
Contributor

bors commented Jul 20, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing bc14d71 to master...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants