This repository has been archived by the owner on Nov 5, 2018. It is now read-only.

Add AtomicCell #13

Merged
11 commits merged into crossbeam-rs:master from atomic-cell on Jul 29, 2018

Conversation


@ghost ghost commented Mar 31, 2018

This type is a thread-safe version of Cell.

It is a good replacement for AtomicOption and is more general. The new equivalent of AtomicOption<T> would be AtomicCell<Option<Box<T>>>.

AtomicCell is also very similar to LockCell in rustc, except it uses atomic instructions whenever possible and doesn't embed any additional data in the object for locking.

Contributor

@Amanieu Amanieu left a comment

Have you thought about adding these functions:

  • Logical operations: and, or, xor
  • Min/max: min, max (support for these will be added to Rust soon)

In a more general sense, do we want to have both AtomicCell<T> and Atomic<T> (from atomic-rs)? The main difference that I can see is that AtomicCell<T> supports non-Copy types, but only supports SeqCst orderings.

atomic_compare_and_swap(self.value.get(), current, new)
};

if byte_eq(&previous, &current) {
Contributor

Rather than using byte_eq here, you could just use the Result returned by compare_exchange_weak.
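The suggestion can be sketched like this (a minimal illustration against `AtomicUsize`, not the PR's actual code): the `Err` variant of `compare_exchange_weak` already carries the value that was actually in memory, so no separate `byte_eq` pass is needed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Sketch only: a CAS loop that uses the Result from compare_exchange_weak
// instead of re-reading the location and comparing bytes. On failure,
// Err(prev) hands us the value currently stored, so we retry with it directly.
fn fetch_double(a: &AtomicUsize) -> usize {
    let mut current = a.load(Ordering::SeqCst);
    loop {
        match a.compare_exchange_weak(current, current * 2, Ordering::SeqCst, Ordering::SeqCst) {
            Ok(prev) => return prev,     // success: `prev` was replaced by `prev * 2`
            Err(prev) => current = prev, // failure: retry with the fresh value
        }
    }
}
```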

];

// If the modulus is a constant number, the compiler will use crazy math to transform this into
// a sequence of cheap arithmetic operations rather than using the slow modulo instruction.
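As an illustration of the striping scheme this comment describes (identifiers here are made up, not the PR's actual ones): each object's address selects one of a fixed number of locks via a modulo with a compile-time constant, which the compiler strength-reduces into multiply-and-shift arithmetic instead of a division instruction.

```rust
// Hypothetical sketch of address-based lock striping. LEN mirrors the
// constant under discussion (499); because it is a constant, `addr % LEN`
// compiles down to cheap multiplies and shifts rather than a `div`.
const LEN: usize = 499;

fn lock_index<T>(ptr: *const T) -> usize {
    (ptr as usize) % LEN
}
```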
Contributor

Did you check that the number you used (499) actually produces such a cheap instruction sequence?

Author

This is the instruction sequence: https://godbolt.org/g/Pc6qoC

Author

@ghost ghost Apr 6, 2018

Also, FWIW, changing LEN to 512 doesn't affect the benchmark numbers at all.

Collaborator

Just wanted to say that it is... crazy and beautiful.


let mut step = 0usize;

while lock.compare_and_swap(false, true, Ordering::Acquire) {
Contributor

Using compare_exchange_weak here will produce slightly more efficient code on ARM. Also the failure ordering can be Relaxed.
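A sketch of the suggested form (illustrative, not the PR's exact code): the weak variant maps to a single LL/SC attempt on ARM, and since a failed attempt just loops again, Relaxed is enough on the failure path.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Spin until we flip the flag from false to true. Only the successful
// exchange needs Acquire; failures are simply retried, so Relaxed suffices
// as the failure ordering.
fn lock(flag: &AtomicBool) {
    while flag
        .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_err()
    {
        std::hint::spin_loop();
    }
}
```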

{
a = &*(dst as *const _ as *const _);
mem::transmute_copy(
&a.compare_and_swap(
Contributor

I would recommend using the explicit compare_exchange or compare_exchange_weak operations rather than compare_and_swap. They return a Result which avoids the need to do a byte_eq afterwards.

/// If this value can be transmuted into a primitive atomic type, it will be treated as such.
/// Otherwise, all potentially concurrent operations on this data will be protected by a global
/// lock.
value: UnsafeCell<T>,
Contributor

Ideally we would use alignment to round this up to the size of the next atomic type. This is what _Atomic in C11 does. However there is currently no way to express this in Rust.

Collaborator

@Amanieu by "next atomic type" what do you mean?

Author

Probably that alignment/size of _Atomic char[3] is the same as alignment/size of _Atomic uint64_t (because that's how it gets represented internally).
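For concreteness, the size/alignment test that decides between the atomic and the locked path might look like this (a guess at the shape of `can_transmute`, not the PR's exact code):

```rust
use std::mem;
use std::sync::atomic::AtomicUsize;

// `T` can be operated on through atomic type `A` only if the two types have
// the same size and `A` is at least as aligned as `T`. Rust has no way to
// round a 3-byte payload up to a 4-byte atomic the way C11's `_Atomic` does,
// so such types fall back to the locked path.
fn can_transmute<T, A>() -> bool {
    mem::size_of::<T>() == mem::size_of::<A>() && mem::align_of::<A>() >= mem::align_of::<T>()
}
```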

/// ```
pub fn update<F>(&self, mut f: F) -> T
where
F: FnMut(T) -> T,
Contributor

Have you considered making the function return an Option<T>? I find that this is very useful in practice. See this comment.

};

if byte_eq(&previous, &current) {
return new;
Contributor

The general convention for atomic operations is to return the previous value.

Contributor

However, if you change the return value of F to Option<T> then it would make more sense to return a bool here.

///
/// This type is equivalent to [`Cell`], except it can also be shared among multiple threads.
///
/// Operations on `AtomicCell`s use atomic instructions whenever possible, and synchronize using
Collaborator

I'm just curious -- can we avoid global locks? If so, do we prefer global locks to per-object locks?

I'd like to ask if we document a little bit more on this design choice.

Author

I don't think we can do that without sticking an AtomicBool next to absolutely every object, even if they don't use locking. Note that std::atomic in C++ is implemented in a similar manner.

In case an object does need locking, I'm not really sure whether we prefer global locks over per-object locks. It's a tradeoff between memory consumption and performance.

pub fn add(&self, val: $t) -> $t {
if can_transmute::<$t, atomic::AtomicUsize>() {
let a = unsafe { &*(self.value.get() as *const atomic::AtomicUsize) };
a.fetch_add(val as usize, Ordering::SeqCst).wrapping_add(val as usize) as $t
Collaborator

I'm not a fan of using SeqCst in loads and stores. I described the reason here: rust-lang/rust#31650 (comment) In short, contrary to common belief, using SeqCst for loads, stores, and RMWs doesn't give you a useful guarantee over Release/Acquire semantics. In particular, it has never been proved that using SeqCst for a location gives you a perfect ordering for that particular location.

For AtomicCell, it seems Release and Acquire will be enough, I think.

Author

@ghost ghost Apr 2, 2018

I feel I still haven't fully grasped the consequences of the broken C/C++11 model. :( Maybe we should take a look at an example. Here's one with 4 threads:

// THREAD 1
a.set(1);

// THREAD 2
b.set(1);

// THREAD 3
a.get(); // 1
b.get(); // 0

// THREAD 4
b.get(); // 1
a.get(); // 0

Now, two questions:

  1. Is this scenario possible if all get operations use Acquire and all set operations use Release? I would say "yes".

  2. Is this scenario possible if all get and all set operations use SeqCst? I would say "no" because the model prohibits this behavior. But you're saying the answer is "yes" because the compilers have no way of compiling this correctly on Power/ARM? Is my understanding correct?

Author

ghost commented Apr 6, 2018

I've rebased onto master and replaced atomic_compare_and_swap with atomic_compare_exchange_weak.

Benchmark results (operation on usize are lock-free, while operations on u8 use locks):

test add_u8                ... bench:           9 ns/iter (+/- 0)
test add_usize             ... bench:           6 ns/iter (+/- 0)
test compare_and_set_u8    ... bench:           9 ns/iter (+/- 0)
test compare_and_set_usize ... bench:           6 ns/iter (+/- 0)
test get_u8                ... bench:           9 ns/iter (+/- 0)
test get_usize             ... bench:           1 ns/iter (+/- 0)
test set_u8                ... bench:           9 ns/iter (+/- 0)
test set_usize             ... bench:           6 ns/iter (+/- 0)
test take_u8               ... bench:           8 ns/iter (+/- 0)
test take_usize            ... bench:           6 ns/iter (+/- 0)
test update_u8             ... bench:          18 ns/iter (+/- 1)
test update_usize          ... bench:          10 ns/iter (+/- 0)

@jeehoonkang

Regarding Acquire/Release vs SeqCst, can you answer my reply above to make sure we're on the same page on this matter?

@Amanieu

Have you considered making the function (update) return an Option<T>? I find that this is very useful in practice.

Yes. I'm mostly taking inspiration from Java here, where updateAndGet doesn't return an optional value.

My thinking on update is that it ought to be just a shorthand for a simple CAS loop in the most common case. For example, in order to atomically multiply a number by 5, we might write the following loop:

let new = loop {
    let x = a.get();
    let y = x * 5;
    if a.compare_and_set(x, y) {
        break y;
    }
};

This is not a lot of code, but writing the same thing in the form of this one-liner is much nicer:

let new = a.update(|x| x * 5);

Now, if the update function (the lambda) had to return an Option, that'd look like this:

let new = a.update(|x| Some(x * 5)).unwrap();

There are two unfortunate angles to this. First, the one-liner just became a bit longer. Second, we have this .unwrap() at the end, which makes you pause and think "wait, will unwrapping ever crash the program"? Unwraps always induce a small mental burden when proofreading code, at least IMO.

Consider that the original CAS loop didn't have unwrap at all, so having to do .unwrap() is now a regression. In that sense, it seems to me the whole purpose of update (for simple one-liners) is kind of defeated. I'd also wager that non-failing update functions are more common in practice and therefore worth spending more effort optimizing for.

Finally, even in the case of failing update functions, the improvement of update over a manual CAS loop is rather small:

// Using a CAS loop.
let new = loop {
    let x = a.get();
    if x == 0 {
        break None;
    } else {
        let y = 1000 / x;
        if a.compare_and_set(x, y) {
            break Some(y);
        }
    }
};

// Using `update`.
let new = a.update(|x| {
    if x == 0 {
        None
    } else {
        Some(1000 / x)
    }
}).unwrap();

In the end, it just seems to me the payoff is greater if the update function doesn't return an Option. What do you think?

The general convention for atomic operations is to return the previous value.

That is true, and the reason most likely lies in the fact that returning the previous value is strictly more powerful - one can obtain the new value by feeding the old one into the update function. Java has two variants: updateAndGet and getAndUpdate. I wonder what the definition would be if we had Cell::update, or something like it.

Again, my feeling is that using the new value is more commonly desirable, so manually recomputing it could be unergonomic and brittle:

let new = a.fetch_update(|x| x * 10) * 10;

Maybe we should just have two variants get_and_update and update_and_get? Maybe also update which doesn't return anything (returns ())? What's your opinion? :)
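To make the question concrete, here is a rough sketch of the two variants against `AtomicUsize` (the names follow the proposal above; none of this is an existing API):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// get_and_update: CAS loop that returns the *previous* value, like fetch_add.
fn get_and_update<F: FnMut(usize) -> usize>(a: &AtomicUsize, mut f: F) -> usize {
    let mut old = a.load(Ordering::SeqCst);
    loop {
        match a.compare_exchange_weak(old, f(old), Ordering::SeqCst, Ordering::SeqCst) {
            Ok(prev) => return prev,
            Err(prev) => old = prev,
        }
    }
}

// update_and_get: same loop, but returns the *new* value, which is often what
// the caller actually wants and saves recomputing it at the call site.
fn update_and_get<F: FnMut(usize) -> usize>(a: &AtomicUsize, mut f: F) -> usize {
    let mut old = a.load(Ordering::SeqCst);
    loop {
        let new = f(old);
        match a.compare_exchange_weak(old, new, Ordering::SeqCst, Ordering::SeqCst) {
            Ok(_) => return new,
            Err(prev) => old = prev,
        }
    }
}
```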

Have you thought about adding these functions:

Yes, we should eventually add those, too. I'm a bit stuck on the naming problem: get_and_and (the equivalent of fetch_and) looks bad so maybe we should drop the and in the middle. What do you think about the following method names?

get_update
update_get
get_add
add_get
get_sub
sub_get
get_and
and_get
get_or
or_get
get_max
max_get
...

Contributor

Amanieu commented Apr 6, 2018

@stjepang fetch_update on atomics just got added to Rust (along with fetch_min & fetch_max). That's what I was using as a reference.

Note that I was suggesting only having the result of the closure be an Option, not the actual return value of `update`.

The rationale behind this is that in a lot of my code, I end up having loops that look like this:

loop {
    if some_condition {
        return false;
    }
    match atomic.compare_exchange_weak(
        oldval,
        newval,
        Ordering::Relaxed,
        Ordering::Relaxed,
    ) {
        Ok(_) => return true,
        Err(x) => oldval = x,
    }
}

I only want to update the value if some_condition is false. However, in the case of AtomicCell this is going to be slow anyway due to excessive memory barriers (SeqCst), so I guess you could just write back the same value you read if you don't want to change it.

Author

ghost commented Jul 25, 2018

Rebased onto master and:

  • Added Send and Sync impls for AtomicCell<T>.
  • Removed update.
  • Changed add and sub to get_add and get_sub.

I removed the update method because neither AtomicUsize nor Cell has it stabilized yet, and their signatures differ. It's not a critical method and we can always add update later.

@jeehoonkang Regarding memory orderings, perhaps let's leave them as strong as possible now and relax in a future PR?

@Amanieu How do you feel about adding get_and, get_nand, get_or, get_xor, and then merging this PR? Are there any other outstanding issues that need to be resolved?

/// let ptr = a.as_ptr();
/// ```
pub fn as_ptr(&self) -> *mut T {
self.value.get()


Shouldn't this return a *const T pointer? Since it only takes a shared reference to self, not a mutable one.

Also, shouldn't this be an unsafe function? If users treat this as a regular pointer while it's being shared across threads, surprising behaviour might result from non-atomic/non-locked operations.

Author

You're right. I'll remove this function as it doesn't make much sense for atomic types (Cell<T> has it, but not AtomicUsize).

Collaborator

@Thomasdezeeuw Probably it's okay not to mark it unsafe, because using *mut T and *const T already requires unsafe. Users will have to write unsafe code either way, so I think both *const T and *mut T are fine.

Collaborator

@jeehoonkang jeehoonkang left a comment

I'm still concerned with orderings. I think SeqCst obscures the specification: what do we expect from the methods of AtomicCell? I think the word "sequentially consistent" doesn't really help, because practically no one understands the term precisely. Even worse, using SeqCst may guarantee some ordering properties which users will rely on, after which we're stuck and cannot optimize it further.

So I suggest that, even though we're using SeqCst here, let's make clear that the specification of AtomicCell is unclear. (Oh, this sentence is too pedantic... Sorry.) And in a later PR presumably we will confirm that its specification is actually release/acquire.

Also, on the global locks: I agree global locks are the only sane choice we can use now. I still prefer per-object locking to global locking because I think it better conveys programmers' intent, but sadly there's no way to implement per-object locking efficiently. In an ideal world, I hope atomic_is_lock_free() becomes a const function so we can use its result at compile time, making lock-free AtomicCell<T> carry no lock while non-lock-free AtomicCell<T> has one.



/// assert_eq!(a.compare_and_set(Foo(11), Foo(12)), true);
/// assert_eq!(a.get().0, 12);
/// ```
pub fn compare_and_set(&self, mut current: T, new: T) -> bool {
Collaborator

Regarding its signature: how about returning Result<T, T>, where Ok(cur) means CAS succeeded and Err(cur) means CAS was unsuccessful?

Author

Agreed. I think we have three options here:

fn compare_and_set(&self, current: T, new: T) -> bool;
fn compare_and_swap(&self, current: T, new: T) -> T;
fn compare_exchange(&self, current: T, new: T) -> Result<T, T>;

Maybe let's get rid of compare_and_set and implement the last two since those are already established idioms?

Collaborator

As far as I'm concerned, I think even having only compare_exchange() is also fine :)

I'll just leave the decision to you.

}

// The compare-exchange operation has failed and didn't store `new`. However,
// `previous` is semantically equal to `current`, although not byte-equal.
Collaborator

They can also be byte-equal, as we're using the weak version of CAS here.

Author

True. I've updated the comment.


while lock.compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed).is_err() {
if step < 5 {
// Just try again.
} else if step < 10 {
Collaborator

Probably it's a good spot to use Backoff :)
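Backoff (in crossbeam) packages the spin-then-yield escalation that the hand-written step counter implements; a rough standard-library-only equivalent looks like this (thresholds are illustrative, mirroring the 5/10 split in the diff above):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

// Hand-rolled exponential backoff: busy-spin for the first few failed
// attempts, then start yielding to the scheduler so we don't burn a core
// while the lock holder makes progress.
fn lock_with_backoff(flag: &AtomicBool) {
    let mut step = 0u32;
    while flag
        .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_err()
    {
        if step < 5 {
            std::hint::spin_loop();
        } else {
            thread::yield_now();
        }
        step += 1;
    }
}
```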

Author

ghost commented Jul 25, 2018

@jeehoonkang Regarding SeqCst vs Acquire/Release:

I guess my somewhat hand-wavy definition would go like this: AtomicCell<usize> is equivalent to AtomicUsize where all operations use the strongest possible ordering, i.e. SeqCst. Put differently, the idea is that the user can reason about AtomicCell as "AtomicCell is just like std::sync::atomic::Atomic* with SeqCst as the default ordering for all operations". I wouldn't make any guarantees beside that.

To me, the fact that SeqCst is not understood precisely is an orthogonal issue. What matters is that it's the strongest ordering we have and is typically recommended as the safest choice (when in doubt, use SeqCst) so it makes sense to use the strongest possible ordering here.

Author

ghost commented Jul 26, 2018

Just one more question: do you prefer load/store/swap/fetch_add or set/get/replace/get_add for AtomicCell?

So AtomicCell is a crossover of Atomic{Usize,...} and Cell types, but it currently uses Cell's terminology. A similar situation in the standard library is RwLock and RefCell (basically the same thing except one is thread-safe and the other isn't), where RwLock uses read/write and RefCell uses borrow/borrow_mut.

Collaborator

@stjepang I'm concerned with documentation, actually. It says its semantics is sequentially consistent, but I intend to propose to weaken it to release/acquire in the future. So I'm asking to leave the documentation somewhat unspecified at this moment.

Author

ghost commented Jul 26, 2018

Oh, I see. Is the documentation all right now?

Collaborator

@jeehoonkang jeehoonkang left a comment

Thank you for your work on AtomicCell! I think it'll be very useful.

I'm still concerned with the comment "This operation uses the SeqCst ordering", it kinda looks like a spec, but let's resolve it in a later PR...

Author

ghost commented Jul 26, 2018

The doc comment applies only to private functions, so we haven't actually committed to it.

Author

ghost commented Jul 26, 2018

Some changes:

  • Rebased.
  • Changed compare_and_set to compare_exchange.
  • Added compare_and_swap.
  • Renamed get/set/replace/get_* to load/store/swap/fetch_*.
  • Removed take since atomic types typically don't have it.
  • Implemented the rest of fetch_ methods.

Author

ghost commented Jul 26, 2018

@Amanieu @jeehoonkang Are you OK with merging with these changes?

Collaborator

It's wonderful. Thanks!!

@ghost ghost merged commit 6756505 into crossbeam-rs:master Jul 29, 2018
@ghost ghost deleted the atomic-cell branch July 29, 2018 15:54
@ghost ghost mentioned this pull request Sep 1, 2018