-
Notifications
You must be signed in to change notification settings - Fork 473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ArcCell: wait-free get() #81
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very nice, maybe we should extend it with a cas-like function too.
debug_assert_eq!(self.0.load(Ordering::SeqCst), 0); | ||
self.0.store(unsafe { mem::transmute(t) }, Ordering::Release); | ||
/// Create a new `ArcCell` from the given `Arc` interior. | ||
pub fn with_val(v: T) -> ArcCell<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no reason to shorten value
to val
.
I was looking at this again today and was wondering if it makes sense to set the high bit if the writer can't acquire the lock after a while. A reader seeing that should backoff. |
If you already have an Arc then isn't what you want to implement https://en.wikipedia.org/wiki/Read-copy-update ? |
@sstewartgallus RCU requires a memory management scheme to go along with the read-copy-update part. Imagine that you load the Arc ptr, stall, somebody drops the arc, and then resume trying to acquire the pointer. The Linux kernel scheme for doing so is often referred to RCU as a whole. Crossbeam EBR could also be used to manage the memory. This is managing it independently of the EBR with what's essentially a second refcounting scheme. The orderings here are subtly incorrect. In get, the SeqCst load doesn't prevent the refcount increment from reordering past it, see generated code here. SeqCst on a load is essentially an acquire load that's also ordered with respect to all other SeqCst operations and observe a global order of all SeqCst operations. In an ideal world an Acquire on the fetch_add should work, but on aarch64 the load could still reorder behind the store part of the ll/sc pair (it's basically load-linked-acquire->store-conditional). aarch64 rmw orderings are extremely screwy and a major stretch of the c++11 atomic memory model, but oh well. I guess acquire only applies to the load ordering technically. I'm not sure of a performance-friendly way to do this without platform specific code. |
old | ||
unsafe { | ||
let t: usize = mem::transmute(t); | ||
let old: Arc<T> = mem::transmute(self.ptr.swap(t, Ordering::Acquire)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ordering is subtly incorrect, see my comment in the main thread for why. There's a crate atomic_utilities that has a fence/ordering combo that will create an optimal rmw setup that is ordered-before future loads and stores. I'm adding docs to it now but there's an example in the code.
@schets I was checking this thread, what's wrong with the linked code? I get
I know little about arm64 assembly but it doesn't look unordered to me. |
@arthurprs ldar prevents memory accesses after the load from happening before it but doesn't do anything about memory accesses before. Arm also allows loads to speculatively move before branches and loads are allowed to move before an stxr (it can return true before the store is finished, meaning speculative execution isn't required for this problem to manifest). Given all that, it's possible for the ldar instruction to be ordered-before the stxr and I'm pretty sure the ldxr instruction, so it's possible for the SeqCst load to happen partially or completely before the fetch_add/swap. That's definitely a problem in get(), and I believe a problem in set as well (less clear on that one). |
Ahh I misunderstood the "reorder", you meant inside the execution units. Just like I say, my arm64 skills suck. You suggest using this, right? https://docs.rs/crate/atomic_utilities/0.5.0/source/src/fence_rmw.rs |
That's what I've used for similar cases. It basically does:
|
More general types have been proposed in crossbeam-rs/rfcs#28. |
As an idea I got from @schets in #80, I wrote a slightly less locking implementation of
ArcCell
. In this version:get()
is always wait-free.set()
is wait-free with respect to otherset()
s. Whenever anyget()
arrives, theset()
will block and have to wait until no concurrentget()
s are active, behaving like the original implementation would in the degenerate case of the spinlock always going toget()
s.Since this will scale better for many readers, but most likely worse for many writers, I may also PR this as a new type, even though the change is backwards-compatible in terms of behaviour.