
RFC: Reduce atomics macro API contracts for future extensions #41269

Closed
wants to merge 1 commit

Conversation

tkf
Member

@tkf tkf commented Jun 18, 2021

Currently, @atomicreplace a[1].x expected => desired is equivalent to

tmp = a[1]
@atomicreplace tmp.x expected => desired

However, it forbids future extensions such as lowering

@assert sizeof(K) == sizeof(V) == sizeof(UInt64)
xs = Vector{@NamedTuple{key::K, value::V}}(...)
k = @atomic xs[i].key

to a 64-bit (not 128-bit) load.

As I explained in #37847, this kind of "torn read" is a common technique when dealing with 128-bit entries (e.g., https://doi.org/10.1145/3309206 p. 6 and https://doi.org/10.1145/2442516.2442527 Fig. 3, line 37). It looks like this is discussed as "Tearable Atomics" in C++: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0690r1.html
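To make the "torn read" technique concrete, here is a minimal C++ sketch (names and layout are my own illustration, not code from the linked papers): each 128-bit entry is represented as two independently atomic 64-bit words, so reading just the key is a single 64-bit relaxed load, and reading both fields with two separate loads may observe halves from different logical writes.

```cpp
#include <atomic>
#include <cstdint>
#include <utility>

// Hypothetical 128-bit entry stored as two independently atomic 64-bit words.
struct Entry {
    std::atomic<uint64_t> key{0};
    std::atomic<uint64_t> value{0};
};

// Load only the key: one 64-bit relaxed load, instead of a 128-bit atomic
// load of the whole entry.
inline uint64_t load_key(const Entry& e) {
    return e.key.load(std::memory_order_relaxed);
}

// Load both halves with two independent 64-bit loads; the observed pair may
// mix values from different logical writes -- that is the "torn" part.
inline std::pair<uint64_t, uint64_t> torn_load(const Entry& e) {
    return {e.key.load(std::memory_order_relaxed),
            e.value.load(std::memory_order_relaxed)};
}
```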

Other expressions such as x.y.z and f(x).y also have a possible well-defined interpretation as a memory location, using the notion of a lens à la Accessors.jl.

The point of this PR is not to discuss whether the above suggestions are adequate (although I understand that dismissing them as out of scope for Julia is an unfortunate but possible outcome).

Rather, I suggest avoiding premature expansion of the API contracts and leaving these expressions unsupported, so that they remain available for possible extensions in the future.

@tkf tkf added the multithreading Base.Threads and related functionality label Jun 18, 2021
@vtjnash
Sponsor Member

vtjnash commented Jun 18, 2021

However, it forbids future extensions such as lowering

@assert sizeof(K) == sizeof(V) == sizeof(UInt64)
xs = Vector{@NamedTuple{key::K, value::V}}(...)
k = @atomic xs[i].key

It forbids it because those are not quite equivalent syntactic expressions, not because it would be invalid to use a 64-bit load. In particular, according to the C memory model, it sounds like you are thinking of the definition of volatile (which constrains the number of bytes loaded, and is not exposed in Julia), which is distinct from atomic (which constrains the happens-before ordering of the bytes loaded, and is the operation defined here).

As I also asserted in #37847, this already seems to be a legal way to describe a 64-bit atomic load:

k = (@atomic xs[i]).key

@tkf
Member Author

tkf commented Jun 18, 2021

I don't think volatile is relevant, because volatile in the C/C++ sense has nothing to do with atomics.

If you meant to point out that this is not incorporated into the C/C++ memory model yet, I agree. That's why I linked the ongoing discussion in the C++ committee. But my point is that it is a conceivable extension, and the current API does not allow adding it later in a non-breaking manner.

k = (@atomic xs[i]).key

This is a 128-bit load followed by field extraction of the first 64-bit component. It requires cmpxchg16b on x86-64 (https://godbolt.org/z/Mb3GTjzGj), even for a relaxed load.

It forbids it because those are not quite equivalent syntactical expressions

Expressions inside a macro can (and very commonly do) have non-standard semantics. I think it's reasonable to keep the opportunity to provide concise expressions for the useful atomic operations.

@vtjnash
Sponsor Member

vtjnash commented Jun 19, 2021

Huh, that seems like an LLVM problem. Even for

  %unused = load atomic i64, i64* %2 unordered, align 8

it refuses to eliminate that dead load instruction. But there is no happens-before or other ordering (it is even nosync) that requires it to exist.

@vtjnash vtjnash added backport 1.7 design Design of APIs or of the language itself labels Jun 19, 2021
@tkf
Member Author

tkf commented Jun 20, 2021

The C++ proposal I linked seems to insist that this is not optimizable:

struct alignas(sizeof(intptr_t) * 2) Node {
    intptr_t l, r;
};

Node action(const Node& old);

void do_action(std::atomic<Node> *n) {
    Node old = n->load(std::memory_order_relaxed); // Relaxed loads can’t tear.
    while (!n->compare_exchange_weak(old, action(old), std::memory_order_release,
                                     std::memory_order_acquire))
      ;
}

In this example, all lock-free operations (including load / store) must be implemented as a compare-and-exchange or load-linked / store-conditional.

--- http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0690r1.html

But re-reading your comments makes me realize that I don't understand exactly why this is the case.

@vtjnash
Sponsor Member

vtjnash commented Jun 20, 2021

The C++ proposal explains at the bottom that this is not a valid optimization (or, in fact, a valid portion of the proposal) since action(old) is unspecified. If it were pure and didn't dereference the argument, the optimization would be valid without the proposed changes to the spec. The other part of the document (the seqlock) is more interesting, as it may provide an improvement over the full lock used currently, and it might not be clear how to express it currently (particularly to TSan).

I also thought aligned AVX instructions would probably give a non-tearable read, but I can understand the reluctance to assume that is promised architecturally, and to instead use cx16.
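For reference, the seqlock pattern mentioned above can be sketched as follows. This is a minimal single-writer version of my own, not code from the proposal; the payload fields use relaxed atomics so the speculative reads that get retried are still defined behavior in the C++ memory model (making those reads well-defined at all is part of what p0690r1 discusses).

```cpp
#include <atomic>
#include <cstdint>
#include <utility>

// Minimal single-writer seqlock sketch (illustrative names). The writer bumps
// the counter to an odd value while mutating and back to even afterwards;
// readers retry whenever they observe an odd or changed counter.
class SeqLockPair {
    std::atomic<uint64_t> seq_{0};
    std::atomic<uint64_t> key_{0}, value_{0};  // payload, accessed relaxed

public:
    void write(uint64_t k, uint64_t v) {
        uint64_t s = seq_.load(std::memory_order_relaxed);
        seq_.store(s + 1, std::memory_order_relaxed);   // odd: write begins
        std::atomic_thread_fence(std::memory_order_release);
        key_.store(k, std::memory_order_relaxed);
        value_.store(v, std::memory_order_relaxed);
        seq_.store(s + 2, std::memory_order_release);   // even: write done
    }

    std::pair<uint64_t, uint64_t> read() const {
        for (;;) {
            uint64_t s1 = seq_.load(std::memory_order_acquire);
            if (s1 & 1) continue;                        // writer active; retry
            uint64_t k = key_.load(std::memory_order_relaxed);
            uint64_t v = value_.load(std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_acquire);
            if (seq_.load(std::memory_order_relaxed) == s1)
                return {k, v};                           // consistent snapshot
        }
    }
};
```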

@tkf
Member Author

tkf commented Jun 20, 2021

Thanks. So, in principle, it seems like the compiler could optimize the cmpxchg16b out of the load if I write this?

intptr_t old_l = n->load(std::memory_order_relaxed).l;
intptr_t old_r = n->load(std::memory_order_relaxed).r;
Node old(old_l, old_r);

But what about tearable stores? It seems hard to express tearable stores

@atomic :monotonic xs[i].key = key
@atomic :monotonic xs[i].value = value

by

p0 = @atomic :monotonic xs[i]
p1 = (key = key, value = p0.value)
@atomic :monotonic xs[i] = p1
p2 = @atomic :monotonic xs[i]
p3 = (key = p2.key, value = value)
@atomic :monotonic xs[i] = p3

(expecting the optimizations).
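For comparison, the two tearable stores above have a direct C++ analogue when the entry is represented as two independently atomic 64-bit words (an assumed layout for illustration, not code from the proposal): each field is written with its own relaxed 64-bit store, with no 128-bit atomic read-modify-write involved.

```cpp
#include <atomic>
#include <cstdint>

// Entry represented as two independently atomic 64-bit words (assumed layout
// for illustration), so each field can be stored separately.
struct KVSlot {
    std::atomic<uint64_t> key{0};
    std::atomic<uint64_t> value{0};
};

// The "tearable" store: two relaxed 64-bit stores. A concurrent reader may
// observe the key from one call and the value from another -- exactly the
// tearing this thread is discussing.
inline void tearable_store(KVSlot& s, uint64_t k, uint64_t v) {
    s.key.store(k, std::memory_order_relaxed);
    s.value.store(v, std::memory_order_relaxed);
}
```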

This is used in the work-stealing deque example. The C++ proposal mentions that the performance advantage "is more likely to be significant" there, which sounds plausible.

But, actually, going back to the main topic in the OP, if we have the direct way to express tearable atomics like

@atomic :tearable xs[i] = (; key, value)  # or `:nonatomic` as in C++

then arguably it's cleaner than assigning .key and .value separately anyway. Assuming the solution in C++ will be something like p0690r1, maybe we don't need to worry about the point I was raising in the OP.

Maybe a more problematic part is that C++'s nonatomic and our LLVM-based name :not_atomic would be easy to mix up, if the C++ committee goes with the plan in p0690r1.

@vtjnash
Sponsor Member

vtjnash commented Mar 11, 2022

This would be a breaking change now.
