-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sync/atomic: add OR/AND operators for unsigned types #61395
Comments
setPresent should not be calling runtime.Gosched. There's nothing another goroutine is going to do to help move things along (unlike, say, waiting for a mutex to be unlocked). If you remove the Gosched, it should be inlinable. At least, -m says this is inlinable:
I wrote this in runtime/atomic_test.go:
The result is:
So this would be an x86-only optimization that wins less than 2X. I'm not sure this really makes sense given how unusual it typically is. The main argument I can see is inlinability, but it seems to be inlinable already if written efficiently. |
I was wondering about
A 2x win may still be significant, but I'll know more when I add a better benchmarking setup. But also I found LDSET/LDSETA/LDSETAL/LDSETL for ARM, quoting:
I think the "atomically" may be referring to the entire operation. To be honest the mnemonic doesn't make it sound like this is a bitwise or. Perhaps an ARM expert could chime in. |
Thanks for the link to LDSETAL. That looks like it might be good enough, although we would have to think carefully about whether the acquire-release semantics it guarantees matches the sequentially consistent semantics we want for sync/atomic when limited to the OR operation. Perhaps it does but perhaps not. |
That's correct. ARMV8.1 added instructions for atomic And (also, Min/Max, and, for some reason, Xor). LDCLR, LDCLRA, LDCLRAL, LDCLRL is the equivalent for atomic And. |
I'll note that there was also no contention in @rsc's benchmark. Typically, CAS loops collapse far worse under contention than direct atomic operations. |
This proposal has been added to the active column of the proposals project |
This might be a stretch, but another option could be to rewrite the simplest form of the CAS loop to the corresponding atomic operation on platforms that support it, similar to how "intrinsics" for encoding/binary work. That way old code using the thing that works now benefits as well. New API would still be convenient, but I think atomic flag sets that want it are rare, and if it's intrinsified then it can live in any package. I am given to understand this would be harder than existing rewrite rules, though, because it spans more than a single basic block. It may also be hard to formulate proofs that the rewrite preserves semantics. |
@randall77 actually prototyped exactly this. I'm not necessarily opposed to having these rewrite rules, but I don't think they're a substitute for just giving people the API that they need. Often, when writing low-level code like this, the developer wants to say what they mean and have a good sense of what's actually going on, so having to express this operation through a rather opaque rewrite rule seems too subtle. |
Based on the discussion above, this proposal seems like a likely accept. |
No change in consensus, so accepted. 🎉 |
What new API additions were proposed here? |
I guess we all assumed we knew what the API was but forgot to write it down. Oops! |
Anyone currently working on this one? I've made some progress already and would like to finish the work. If that is ok please feel free to assign the issue to me. Thanks. |
Brief update: with Go1.22 entering a freeze, I've adjusted the milestone to Go1.23. Given our time constraints, it seems adequate to postpone since so far we have only internal bits merged. |
Change https://go.dev/cl/544455 mentions this issue: |
These primitives will be used by the new And/Or sync/atomic apis. For #61395 Change-Id: I64b2e599e4f91412e0342aa01f5fd53271e9a333 GitHub-Last-Rev: 9755db5 GitHub-Pull-Request: #63314 Reviewed-on: https://go-review.googlesource.com/c/go/+/531895 Reviewed-by: abner chenc <chenguoqi@loongson.cn> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
These primitives will be used by the new And/Or sync/atomic apis. For golang#61395 Change-Id: I64b2e599e4f91412e0342aa01f5fd53271e9a333 GitHub-Last-Rev: 9755db5 GitHub-Pull-Request: golang#63314 Reviewed-on: https://go-review.googlesource.com/c/go/+/531895 Reviewed-by: abner chenc <chenguoqi@loongson.cn> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>
These primitives will be used by the new And/Or sync/atomic apis. For #61395 Change-Id: Ia9b4877048002d3d7d1dffa2311d0ec5f38e4ee5 GitHub-Last-Rev: 20dea11 GitHub-Pull-Request: #63318 Reviewed-on: https://go-review.googlesource.com/c/go/+/531678 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
Change https://go.dev/cl/584715 mentions this issue: |
These primitives will be used by the new And/Or sync/atomic apis. Implemented for mips/mipsle and mips64/mips64le. For #61395 Change-Id: Icc604a2b5cdfe72646d47d3c6a0bb49a0fd0d353 GitHub-Last-Rev: 95dca2a GitHub-Pull-Request: #63297 Reviewed-on: https://go-review.googlesource.com/c/go/+/531835 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Change https://go.dev/cl/586515 mentions this issue: |
Change https://go.dev/cl/586516 mentions this issue: |
Thanks, @aclements! Unfortunatelly I lack enough experience with the compiler to implement these, so I’m happy to leave that to the experts. |
CL 544455, which added atomic And/Or APIs, raced with CL 585556, which enabled stricter linkname checking. This caused linkname-related failures on ARM and MIPS. Fix this by adding the necessary linknames. We fix one other linkname that got overlooked in CL 585556. Updates #61395. Change-Id: I454f0767ce28188e550a61bc39b7e398239bc10e Reviewed-on: https://go-review.googlesource.com/c/go/+/586516 Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Austin Clements <austin@google.com>
These primitives will be used by the new And/Or sync/atomic apis. Implemented for mips/mipsle and mips64/mips64le. For golang#61395 Change-Id: Icc604a2b5cdfe72646d47d3c6a0bb49a0fd0d353 GitHub-Last-Rev: 95dca2a GitHub-Pull-Request: golang#63297 Reviewed-on: https://go-review.googlesource.com/c/go/+/531835 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The atomic And/Or operators were added by the CL 528797, the compiler does not intrinsify them, this CL does it for arm64. Also, for the existing atomicAnd/Or operations, the updated value are not used, but at that time we need a register to temporarily hold it. Now that we have v.RegTmp, the new value is not needed anymore. This CL changes it. The other change is that the existing operations don't use their result, but now we need the old value and not the new value for the result. And this CL alias all of the And/Or operations into sync/atomic package. Peformance on an ARMv8.1 machine: old.txt new.txt sec/op sec/op vs base And32-160 8.716n ± 0% 4.771n ± 1% -45.26% (p=0.000 n=10) And32Parallel-160 30.58n ± 2% 26.45n ± 4% -13.49% (p=0.000 n=10) And64-160 8.750n ± 1% 4.754n ± 0% -45.67% (p=0.000 n=10) And64Parallel-160 29.40n ± 3% 25.55n ± 5% -13.11% (p=0.000 n=10) Or32-160 8.847n ± 1% 4.754±1% -46.26% (p=0.000 n=10) Or32Parallel-160 30.75n ± 3% 26.10n ± 4% -15.14% (p=0.000 n=10) Or64-160 8.825n ± 1% 4.766n ± 0% -46.00% (p=0.000 n=10) Or64Parallel-160 30.52n ± 5% 25.89n ± 6% -15.17% (p=0.000 n=10) For #61395 Change-Id: Ib1d1ac83f7f67dcf67f74d003fadb0f80932b826 Reviewed-on: https://go-review.googlesource.com/c/go/+/584715 Auto-Submit: Austin Clements <austin@google.com> TryBot-Bypass: Austin Clements <austin@google.com> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Fannie Zhang <Fannie.Zhang@arm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
OR/AND write protection without comparison and calculation is pointless and has long been removed in my code
|
Change https://go.dev/cl/593855 mentions this issue: |
A few of the new Or methods of the atomic types use "new" as the name for the result value, but it actually returns the old value. Fix this by renaming the result values to "old". Updates #61395. Change-Id: Ib08db9964f5dfe91929f216d50ff0c9cc891ee49 Reviewed-on: https://go-review.googlesource.com/c/go/+/593855 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Austin Clements <austin@google.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Thanks for the great work @mauri870. We'll have to wait a bit before starting to use this new API because intrinsifying is critical for us to yield a performance improvement. On my workstation I've measured a 1.5x slowdown by replacing: for {
old := atomic.LoadUint32(part)
if atomic.CompareAndSwapUint32(part, old, old|(1<<(num%32))) {
return
}
} With: atomic.OrUint32(part, 1<<(num%32)) The former inlines into the function, the latter doesn't: ...
callq 0x47f1e0 <sync/atomic.OrUint32.abi0>
...
<sync/atomic.OrUint32.abi0>:
; sync/atomic.OrUint32.abi0():
; ./sync/atomic/asm.s:106
jmp 0x47f380 <internal/runtime/atomic.Or32.abi0>
<internal/runtime/atomic.Or32.abi0>:
; ./internal/runtime/atomic/atomic_amd64.s:229
movq 0x8(%rsp), %rbx
movl 0x10(%rsp), %ecx
movl %ecx, %edx
movl (%rbx), %eax
orl %eax, %edx
lock
cmpxchgl %edx, (%rbx)
jne 0x47f389 <internal/runtime/atomic.Or32.abi0+0x9>
movl %eax, 0x18(%rsp)
retq It's great to see how only the last mile remains to be done. Hopefully someone can pick it up. |
Change https://go.dev/cl/594976 mentions this issue: |
Change https://go.dev/cl/594738 mentions this issue: |
A few of the new Or methods of the atomic types use "new" as the name for the result value, but it actually returns the old value. Fix this by renaming the result values to "old". Updates golang#61395. Change-Id: Ib08db9964f5dfe91929f216d50ff0c9cc891ee49 Reviewed-on: https://go-review.googlesource.com/c/go/+/593855 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Austin Clements <austin@google.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
For atomic AND and OR operations on memory, we currently have two views of the op. One just does the operation on the memory and returns just a memory. The other does the operation on the memory and returns the old value (before having the logical operation done to it) and memory. Update #61395 These two type differently, and there's currently some confusion in our rules about which is which. Use different names for the two different flavors so we don't get them confused. Change-Id: I07b4542db672b2cee98169ac42b67db73c482093 Reviewed-on: https://go-review.googlesource.com/c/go/+/594976 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Nicolas Hillegeer <aktau@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Reviewed-by: Keith Randall <khr@google.com>
Update #61395 Change-Id: I59a950f48efc587dfdffce00e2f4f3ab99d8df00 Reviewed-on: https://go-review.googlesource.com/c/go/+/594738 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Nicolas Hillegeer <aktau@google.com>
Intrinsic And/Or are in for arm64 and amd64. (arm64 was released with 1.23. amd64 will be released with 1.24.) |
Update, Aug 16 2023: Current proposal at #61395 (comment).
Original related proposal: #31748
Use case: we have types with methods that set a value. These methods manipulate a bitset indicating that the value was set, which is used (e.g.) for data serialization. Users of this API know to use a lock to manage concurrently reading/writing the same field, but they are allowed to concurrently write to different fields. Given that the bitsets are logically shared between different fields, we must manipulate them atomically. Currently that takes the form of a CAS loop:
But, on x86-64, there are atomic versions of AND/OR that do this in one go, as mentioned in #31748. Using this would not only make the setters faster, it would likely also allow inlining them:
setPresent
is too complex to inline.cc @aclements @ianlancetaylor @randall77
The text was updated successfully, but these errors were encountered: