sync/atomic: add OR/AND operators for unsigned types #61395

Closed
aktau opened this issue Jul 17, 2023 · 68 comments

Comments

@aktau
Contributor

aktau commented Jul 17, 2023

Update, Aug 16 2023: Current proposal at #61395 (comment).


Original related proposal: #31748

Use case: we have types with methods that set a value. These methods manipulate a bitset indicating that the value was set, which is used (e.g.) for data serialization. Users of this API know to use a lock to manage concurrently reading/writing the same field, but they are allowed to concurrently write to different fields. Given that the bitsets are logically shared between different fields, we must manipulate them atomically. Currently that takes the form of a CAS loop:

type MyStruct struct {
  x int32
  y int32
  // ...

  present uint32 // Contains "present" bits for 32 fields.
}

func (m *MyStruct) SetX(v int32) {
  m.x = v
  setPresent(&m.present, 1)
}

func (m *MyStruct) SetY(v int32) {
  m.y = v
  setPresent(&m.present, 2)
}

func setPresent(part *uint32, num uint32) {
	for {
		old := atomic.LoadUint32(part)
		swapped := atomic.CompareAndSwapUint32(part, old, old|(1<<(num%32)))
		if swapped {
			return
		}
		// Yield and then try the swap again.
		runtime.Gosched()
	}
}

// similar for clearPresent, but with AND.
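
For completeness, the clearing counterpart could look like the following. This is a minimal sketch, not the code from the original report: it assumes the same bit numbering as setPresent and leaves out the yield, retrying the swap immediately instead.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// clearPresent atomically clears bit num%32 in *part using a CAS loop.
// It mirrors setPresent, with AND NOT in place of OR.
// Assumption: no yield between retries (the loop simply tries again).
func clearPresent(part *uint32, num uint32) {
	mask := uint32(1) << (num % 32)
	for {
		old := atomic.LoadUint32(part)
		if atomic.CompareAndSwapUint32(part, old, old&^mask) {
			return
		}
	}
}

func main() {
	present := uint32(0b0110)
	clearPresent(&present, 1)
	fmt.Printf("%04b\n", present) // prints "0100"
}
```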

But, on x86-64, there are atomic versions of AND/OR that do this in one go, as mentioned in #31748. Using this would not only make the setters faster, it would likely also allow inlining them: setPresent is too complex to inline.

cc @aclements @ianlancetaylor @randall77

@rsc
Contributor

rsc commented Jul 17, 2023

But, on x86-64, there are atomic versions of AND/OR that do this in one go, as mentioned in #31748. Using this would not only make the setters faster, it would likely also allow inlining them: setPresent is too complex to inline.

setPresent should not be calling runtime.Gosched. There's nothing another goroutine is going to do to help move things along (unlike, say, waiting for a mutex to be unlocked).

If you remove the Gosched, it should be inlinable. At least, -m says this is inlinable:

func orcas(x *uint32, bit uint32) {
	for {
		old := atomic.Load(x)
		if atomic.Cas(x, old, old|bit) {
			break
		}
	}
}

I wrote this in runtime/atomic_test.go:

package runtime_test

import (
	"runtime/internal/atomic"
	"testing"
)

func BenchmarkCas(b *testing.B) {
	var x uint32
	for i := 0; i < b.N; i++ {
		for {
			old := atomic.Load(&x)
			if atomic.Cas(&x, old, old|1) {
				break
			}
		}
	}
}

func BenchmarkOr(b *testing.B) {
	var x uint32
	for i := 0; i < b.N; i++ {
		atomic.Or(&x, 1)
	}
}

The result is:

BenchmarkCas-16                  	154634635	         7.429 ns/op
BenchmarkOr-16                   	270709120	         4.462 ns/op

So this would be an x86-only optimization that wins less than 2X. I'm not sure this really makes sense given how unusual it typically is. The main argument I can see is inlinability, but it seems to be inlinable already if written efficiently.

@aktau
Contributor Author

aktau commented Jul 17, 2023

I was wondering about runtime.Gosched() as well, I couldn't figure out why it would help. Thanks for highlighting that it is indeed unnecessary.

So this would be an x86-only optimization that wins less than 2X. I'm not sure this really makes sense given how unusual it typically is. The main argument I can see is inlinability, but it seems to be inlinable already if written efficiently.

A 2x win may still be significant, but I'll know more when I add a better benchmarking setup. But also I found LDSET/LDSETA/LDSETAL/LDSETL for ARM, quoting:

Atomic bit set on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory, performs a bitwise OR with the value held in a register on it, and stores the result back to memory. The value initially loaded from memory is returned in the destination register.

I think the "atomically" may be referring to the entire operation. To be honest the mnemonic doesn't make it sound like this is a bitwise or. Perhaps an ARM expert could chime in.

@rsc
Contributor

rsc commented Jul 17, 2023

Thanks for the link to LDSETAL. That looks like it might be good enough, although we would have to think carefully about whether the acquire-release semantics it guarantees matches the sequentially consistent semantics we want for sync/atomic when limited to the OR operation. Perhaps it does but perhaps not.

@aclements
Member

But also I found LDSET/LDSETA/LDSETAL/LDSETL for ARM

That's correct. ARMv8.1 added instructions for atomic And (also, Min/Max, and, for some reason, Xor). LDCLR, LDCLRA, LDCLRAL, LDCLRL is the equivalent for atomic And.

@aclements
Member

A 2x win may still be significant

I'll note that there was also no contention in @rsc's benchmark. Typically, CAS loops collapse far worse under contention than direct atomic operations.
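
One way to probe the contended case is a parallel benchmark. The following is a rough sketch under stated assumptions: it needs Go 1.23 or later for atomic.OrUint32 and atomic.AndUint32 (the API this issue eventually produced), and the helper names are hypothetical. Each worker sets and then clears its own bit so that the shared word keeps changing and CAS retries can actually occur.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

// casOr sets the bits in mask with a load/compare-and-swap retry loop.
func casOr(x *uint32, mask uint32) {
	for {
		old := atomic.LoadUint32(x)
		if atomic.CompareAndSwapUint32(x, old, old|mask) {
			return
		}
	}
}

// casAndNot clears the bits in mask with a load/compare-and-swap retry loop.
func casAndNot(x *uint32, mask uint32) {
	for {
		old := atomic.LoadUint32(x)
		if atomic.CompareAndSwapUint32(x, old, old&^mask) {
			return
		}
	}
}

// orDirect and andNotDirect use the sync/atomic operations (Go 1.23+).
func orDirect(x *uint32, mask uint32)     { atomic.OrUint32(x, mask) }
func andNotDirect(x *uint32, mask uint32) { atomic.AndUint32(x, ^mask) }

// benchParallel hammers one shared word from many goroutines.
func benchParallel(set, clear func(x *uint32, mask uint32)) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		var x uint32
		var nextID atomic.Uint32
		b.RunParallel(func(pb *testing.PB) {
			// Give each worker its own bit so the word keeps changing.
			mask := uint32(1) << ((nextID.Add(1) - 1) % 32)
			for pb.Next() {
				set(&x, mask)
				clear(&x, mask)
			}
		})
	})
}

func main() {
	fmt.Println("CAS loop: ", benchParallel(casOr, casAndNot))
	fmt.Println("atomic op:", benchParallel(orDirect, andNotDirect))
}
```

Results will vary with architecture and Go version. Where the operations are not intrinsified they may themselves be implemented as a compare-and-swap loop, so no particular ratio should be expected.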

@rsc
Contributor

rsc commented Jul 19, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals Jul 19, 2023
@zephyrtronium
Contributor

This might be a stretch, but another option could be to rewrite the simplest form of the CAS loop to the corresponding atomic operation on platforms that support it, similar to how "intrinsics" for encoding/binary work. That way old code using the thing that works now benefits as well. New API would still be convenient, but I think atomic flag sets that want it are rare, and if it's intrinsified then it can live in any package. I am given to understand this would be harder than existing rewrite rules, though, because it spans more than a single basic block. It may also be hard to formulate proofs that the rewrite preserves semantics.

@aclements
Member

This might be a stretch, but another option could be to rewrite the simplest form of the CAS loop to the corresponding atomic operation on platforms that support it

@randall77 actually prototyped exactly this. I'm not necessarily opposed to having these rewrite rules, but I don't think they're a substitute for just giving people the API that they need. Often, when writing low-level code like this, the developer wants to say what they mean and have a good sense of what's actually going on, so having to express this operation through a rather opaque rewrite rule seems too subtle.

@rsc
Contributor

rsc commented Jul 26, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Accept in Proposals Jul 26, 2023
@rsc
Contributor

rsc commented Aug 2, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc moved this from Likely Accept to Accepted in Proposals Aug 2, 2023
@rsc rsc changed the title proposal: sync/atomic: add OR/AND operators for unsigned types sync/atomic: add OR/AND operators for unsigned types Aug 2, 2023
@rsc rsc modified the milestones: Proposal, Backlog Aug 2, 2023
@mateusz834
Member

What new API additions were proposed here?

@rsc
Contributor

rsc commented Aug 2, 2023

I guess we all assumed we knew what the API was but forgot to write it down. Oops!
I think the new API is: take anything that says Add in the current sync/atomic package and apply s/Add/And/ and also s/Add/Or/.
So there would be both new top-level functions and new methods on Int32, Int64, Uint32, Uint64, and Uintptr.

@mauri870
Member

mauri870 commented Aug 5, 2023

Anyone currently working on this one? I've made some progress already and would like to finish the work. If that is ok please feel free to assign the issue to me. Thanks.

@mauri870 mauri870 added this to the Go1.23 milestone Nov 22, 2023
@mauri870
Member

mauri870 commented Nov 22, 2023

Brief update: with Go1.22 entering a freeze, I've adjusted the milestone to Go1.23. Given our time constraints, it seems adequate to postpone since so far we have only internal bits merged.

@gopherbot
Contributor

Change https://go.dev/cl/544455 mentions this issue: sync/atomic: public And/Or apis

gopherbot pushed a commit that referenced this issue Feb 5, 2024
These primitives will be used by the new And/Or sync/atomic apis.

For #61395

Change-Id: I64b2e599e4f91412e0342aa01f5fd53271e9a333
GitHub-Last-Rev: 9755db5
GitHub-Pull-Request: #63314
Reviewed-on: https://go-review.googlesource.com/c/go/+/531895
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
gopherbot pushed a commit that referenced this issue Feb 20, 2024
These primitives will be used by the new And/Or sync/atomic apis.

For #61395

Change-Id: Ia9b4877048002d3d7d1dffa2311d0ec5f38e4ee5
GitHub-Last-Rev: 20dea11
GitHub-Pull-Request: #63318
Reviewed-on: https://go-review.googlesource.com/c/go/+/531678
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
@gopherbot
Contributor

Change https://go.dev/cl/584715 mentions this issue: cmd/compile: intrinsify atomic And/Or on arm64

gopherbot pushed a commit that referenced this issue May 11, 2024
These primitives will be used by the new And/Or sync/atomic apis.

Implemented for mips/mipsle and mips64/mips64le.

For #61395

Change-Id: Icc604a2b5cdfe72646d47d3c6a0bb49a0fd0d353
GitHub-Last-Rev: 95dca2a
GitHub-Pull-Request: #63297
Reviewed-on: https://go-review.googlesource.com/c/go/+/531835
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@aclements
Member

@mauri870 , congrats on landing all of this for 1.23!

I know that there's some follow-up work to make these operations intrinsic across common platforms (e.g., CL 584715 does most of this work for ARM64). I just wanted to ask if you're planning on working on that.

@gopherbot
Contributor

Change https://go.dev/cl/586515 mentions this issue: sync/atomic: revert "public And/Or ops and race instrumentation"

@gopherbot
Contributor

Change https://go.dev/cl/586516 mentions this issue: internal/runtime/atomic: fix missing linknames

@mauri870
Member

@mauri870 , congrats on landing all of this for 1.23!

I know that there's some follow-up work to make these operations intrinsic across common platforms (e.g., CL 584715 does most of this work for ARM64). I just wanted to ask if you're planning on working on that.

Thanks, @aclements! Unfortunately I lack enough experience with the compiler to implement these, so I’m happy to leave that to the experts.

gopherbot pushed a commit that referenced this issue May 17, 2024
CL 544455, which added atomic And/Or APIs, raced with CL 585556, which
enabled stricter linkname checking. This caused linkname-related
failures on ARM and MIPS. Fix this by adding the necessary linknames.

We fix one other linkname that got overlooked in CL 585556.

Updates #61395.

Change-Id: I454f0767ce28188e550a61bc39b7e398239bc10e
Reviewed-on: https://go-review.googlesource.com/c/go/+/586516
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Austin Clements <austin@google.com>
gopherbot pushed a commit that referenced this issue May 23, 2024
The atomic And/Or operators were added by CL 528797, but the
compiler does not intrinsify them. This CL does so for arm64.

Also, for the existing atomicAnd/Or operations, the updated
value is not used, but at the time we needed a register to
temporarily hold it. Now that we have v.RegTmp, the new value
is not needed anymore. This CL changes it.

The other change is that the existing operations don't use their
result, but now we need the old value and not the new value for
the result.

This CL also aliases all of the And/Or operations into the
sync/atomic package.

Performance on an ARMv8.1 machine:
                      old.txt        new.txt
                      sec/op         sec/op         vs base
And32-160            8.716n ± 0%    4.771n ± 1%  -45.26% (p=0.000 n=10)
And32Parallel-160    30.58n ± 2%    26.45n ± 4%  -13.49% (p=0.000 n=10)
And64-160            8.750n ± 1%    4.754n ± 0%  -45.67% (p=0.000 n=10)
And64Parallel-160    29.40n ± 3%    25.55n ± 5%  -13.11% (p=0.000 n=10)
Or32-160             8.847n ± 1%    4.754n ± 1%  -46.26% (p=0.000 n=10)
Or32Parallel-160     30.75n ± 3%    26.10n ± 4%  -15.14% (p=0.000 n=10)
Or64-160             8.825n ± 1%    4.766n ± 0%  -46.00% (p=0.000 n=10)
Or64Parallel-160     30.52n ± 5%    25.89n ± 6%  -15.17% (p=0.000 n=10)

For #61395

Change-Id: Ib1d1ac83f7f67dcf67f74d003fadb0f80932b826
Reviewed-on: https://go-review.googlesource.com/c/go/+/584715
Auto-Submit: Austin Clements <austin@google.com>
TryBot-Bypass: Austin Clements <austin@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Fannie Zhang <Fannie.Zhang@arm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@ruyi789

ruyi789 commented Jun 9, 2024

OR/AND write protection without comparison and calculation is pointless and has long been removed in my code

func OrSet8u(ptr *uint8, set uint8, unset uint8) (old uint8) {
x:
	old = *ptr
	if !Cas8u(ptr, old, (old&^unset)|set) {
		goto x
	}
	return
}
func OrSet32u(ptr *uint32, set uint32, unset uint32) (old uint32) {
x:
	old = *ptr
	if !Cas32u(ptr, old, (old&^unset)|set) {
		goto x
	}
	return
}
func OrSet32u_Old(ptr *uint32, set uint32, unset uint32, old uint32) bool {
	return Cas32u(ptr, old, (old&^unset)|set)
}

@gopherbot
Contributor

Change https://go.dev/cl/593855 mentions this issue: sync/atomic: correct result names for Or methods

gopherbot pushed a commit that referenced this issue Jun 21, 2024
A few of the new Or methods of the atomic types use "new" as the name
for the result value, but it actually returns the old value. Fix this
by renaming the result values to "old".

Updates #61395.

Change-Id: Ib08db9964f5dfe91929f216d50ff0c9cc891ee49
Reviewed-on: https://go-review.googlesource.com/c/go/+/593855
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
@aktau
Contributor Author

aktau commented Jun 25, 2024

Thanks for the great work @mauri870. We'll have to wait a bit before starting to use this new API because intrinsifying is critical for us to yield a performance improvement. On my workstation I've measured a 1.5x slowdown by replacing:

	for {
		old := atomic.LoadUint32(part)
		if atomic.CompareAndSwapUint32(part, old, old|(1<<(num%32))) {
			return
		}
	}

With:

atomic.OrUint32(part, 1<<(num%32))

The former inlines into the function, the latter doesn't:

...
               	callq	0x47f1e0 <sync/atomic.OrUint32.abi0>
...

<sync/atomic.OrUint32.abi0>:
; sync/atomic.OrUint32.abi0():
; ./sync/atomic/asm.s:106
                jmp	0x47f380 <internal/runtime/atomic.Or32.abi0>

<internal/runtime/atomic.Or32.abi0>:
; ./internal/runtime/atomic/atomic_amd64.s:229
                movq	0x8(%rsp), %rbx
                movl	0x10(%rsp), %ecx
                movl	%ecx, %edx
                movl	(%rbx), %eax
                orl	%eax, %edx
                lock
                cmpxchgl	%edx, (%rbx)
                jne	0x47f389 <internal/runtime/atomic.Or32.abi0+0x9>
                movl	%eax, 0x18(%rsp)
                retq

It's great to see how only the last mile remains to be done. Hopefully someone can pick it up.
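
For reference, the use case from the top of this issue might look as follows once the CAS loops are replaced. This is a sketch assuming Go 1.23 or later; the bit constants and the ClearX and HasX methods are hypothetical additions for illustration and do not appear in the original report.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Bit positions follow the original example: 1 for x, 2 for y.
const (
	bitX uint32 = 1 << 1
	bitY uint32 = 1 << 2
)

type MyStruct struct {
	x, y    int32
	present uint32 // "present" bits for up to 32 fields
}

func (m *MyStruct) SetX(v int32) {
	m.x = v
	atomic.OrUint32(&m.present, bitX)
}

func (m *MyStruct) SetY(v int32) {
	m.y = v
	atomic.OrUint32(&m.present, bitY)
}

// ClearX is a hypothetical counterpart that drops the "present" bit.
func (m *MyStruct) ClearX() {
	m.x = 0
	atomic.AndUint32(&m.present, ^bitX)
}

// HasX is a hypothetical accessor for the "present" bit of x.
func (m *MyStruct) HasX() bool {
	return atomic.LoadUint32(&m.present)&bitX != 0
}

func main() {
	var m MyStruct
	m.SetX(7)
	m.SetY(9)
	fmt.Printf("%03b\n", atomic.LoadUint32(&m.present)) // prints "110"
	m.ClearX()
	fmt.Printf("%03b\n", atomic.LoadUint32(&m.present)) // prints "100"
}
```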

@gopherbot
Contributor

Change https://go.dev/cl/594976 mentions this issue: cmd/compile: fix typing of atomic logical operations

@gopherbot
Contributor

Change https://go.dev/cl/594738 mentions this issue: cmd/compile: make sync/atomic AND/OR operations intrinsic on amd64

gopherbot pushed a commit that referenced this issue Jul 23, 2024
For atomic AND and OR operations on memory, we currently have two
views of the op. One just does the operation on the memory and returns
just a memory. The other does the operation on the memory and returns
the old value (before having the logical operation done to it) and
memory.

Update #61395

These two type differently, and there's currently some confusion in
our rules about which is which. Use different names for the two
different flavors so we don't get them confused.

Change-Id: I07b4542db672b2cee98169ac42b67db73c482093
Reviewed-on: https://go-review.googlesource.com/c/go/+/594976
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Nicolas Hillegeer <aktau@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
gopherbot pushed a commit that referenced this issue Jul 23, 2024
Update #61395

Change-Id: I59a950f48efc587dfdffce00e2f4f3ab99d8df00
Reviewed-on: https://go-review.googlesource.com/c/go/+/594738
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Nicolas Hillegeer <aktau@google.com>
@randall77
Contributor

Intrinsic And/Or are in for arm64 and amd64. (arm64 was released with 1.23. amd64 will be released with 1.24.)
If other maintainers want to intrinsify for other architectures, go for it. I'm happy to review.
