RFC: Fix #31392, unaliasing of broadcast arguments against destinations with repeated indices #31407

mbauman · 2019-03-19T17:06:20Z

When introducing the new broadcasting implementation (which came alongside greater
semantic guarantees with regards to aliasing source and destination), I found that
we needed a performance optimization for the common case where the destination was
=== to an argument. This is quite common due to the .op= syntax and indeed is
safe in most cases as the iteration order between the destination and the source
is matched. This performance optimization is not safe, however, in the case that
the destination has multiple locations that refer to the same memory.

This just ensures our unaliasing pass is indeed doing what we want it to. You can still broadcast into these sorts of "self-aliasing" arrays — which will indeed repeatedly assign into the same location — but regardless of how you reference those destinations on the RHS (be it implicitly via .op=, with or without @views, explicitly on the RHS, etc), this will now ensures the RHS arguments are unaliased from the destination before performing the operation. I believe the following example from #31392 is a strong argument in favor of this fix:

Previously:

julia> x = [3]
       v = view(x, [1,1])
       x[[1,1]] .+= v # Both RHS arguments are copies; `v` is unaliased, the implicit `x[[1,1]]` makes a copy
       x
1-element Array{Int64,1}:
 6

julia> x = [3]
       v = view(x, [1,1])
       @views x[[1,1]] .+= v # The implicit `@view x[[1,1]]` is _not_ unaliased
       x
1-element Array{Int64,1}:
 9

julia> x = [3]
       I = [1,1]
       v = view(x, I)
       x[I] .+= v # The implicit `x[I]` is a copy, `v` is _not_ unaliased as it is `=== @view x[I]`
       x
1-element Array{Int64,1}:
 9

julia> x = [3]
       I = [1,1]
       v = view(x, I)
       @views x[I] .+= v # Neither RHS argument is a copy
       x
1-element Array{Int64,1}:
 12

Clearly, this is unsustainable. With this PR the answer is 6 in all cases. I believe this is the correct answer because it matches how things behave with other views:

julia> x = [3,3]
       v = view(x, [1,1])
       x .+= v # v is unaliased so this is `x[1] = x[1] + 3; x[2] = x[2] + 3`
       x
2-element Array{Int64,1}:
 6
 6 # And not 9!

Marking this as both a bugfix and a minor change for triage as apparently some packages may have been depending upon this in certain circumstances.

JeffBezanson · 2019-03-19T17:16:07Z

Nice. I assume if we merge this, we probably don't need/want #31391, since this PR more accurately detects when a copy is needed.

mbauman · 2019-03-19T17:19:04Z

Yes, that's exactly right (edit: with the caveat that other broadcast implementations in packages need to be similarly careful about aliasing behaviors, but they need to be careful about aliasing in any case).

mbauman · 2019-03-28T18:52:11Z

Triage is in favor here given the following observations:

This behavior matches our other unaliasing behaviors
Hopefully the compiler can work harder in the future to avoid making unnecessary copies
This decision is one that lends itself towards a slightly friendlier semantic towards parallelism. There are many other concerns there, but it's good to keep our eyes on that goal.

mbauman · 2019-03-28T18:53:01Z

Let's see if I missed any obvious cases where a copy is now made:

@nanosoldier runbenchmarks(ALL, vs = ":master")

nanosoldier · 2019-03-29T01:34:33Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

JeffBezanson · 2019-05-21T15:37:28Z

Is this ready to merge as-is? If so, we might want to consider squeezing it into 1.2.

mbauman · 2019-05-23T18:34:34Z

I'm not terribly thrilled with this because it is making lots of unnecessary preventative copies for custom abstract array types. Perhaps it should only make the preventative copy for the known array types that can self-alias (e.g., just SubArray and a few wrappers thereof?).

…th repeated indices When introducing the new broadcasting implementation (which came alongside greater semantic guarantees with regards to aliasing source and destination), I found that we needed a performance optimization for the common case where the destination was `===` to an argument. This is quite common due to the `.op=` syntax and indeed is safe in most cases as the iteration order between the destination and the source is matched. This performance optimization is not safe, however, in the case that the destination has multiple locations that refer to the same memory.

but still provide the hook for arrays to improve

mbauman · 2020-04-27T22:24:55Z

Ok, we've been letting perfect be the enemy of the good here for far too long. Let's be less conservative — and default to not unaliasing self broadcasts — but still allow SubArrays and other custom array types to catch and fix this situation.

I think this is a good compromise for now.

mbauman · 2020-04-27T22:29:26Z

base/broadcast.jl

 """
 potentially_self_aliased(::DenseArray) = false
 potentially_self_aliased(A::StridedArray) = any(==(0), strides(A))
-potentially_self_aliased(A::SubArray) = any(!allunique, A.indices)
+potentially_self_aliased(A::SubArray) = any(map(!allunique, A.indices))


Just a note for posterity: any(map(!allunique, A.indices)) does better compile-time evaluation with heterogeneous tuples and constant-folds to return 0 more frequently.

vtjnash · 2020-10-29T18:25:46Z

GTG then?

jeremiedb · 2021-03-08T01:23:58Z

Sorry to bump, but were there reasons preventing merging a fix for this?
AFAIK, this PR may fix a silent flawed gradients retuned by zygote on a common operation (embedding).

StefanKarpinski · 2021-05-07T21:17:02Z

bumpity

StefanKarpinski · 2021-05-07T21:17:34Z

Marking for triage so discuss if there's actually anything preventing this from being merged.

mbauman · 2021-05-13T13:47:13Z

base/broadcast.jl

 broadcast_unalias(::Nothing, src) = src

+"""
+    potentially_self_aliased(A)


~~A year~~ Two years later, I'm not a big fan of this name. I'm guessing I used potentially as an attempt to say that trues are conclusive but falses are not... but I don't think it actually reads that way.

mightalias documents itself as a "conservative" test for sharing the same memory, but I don't even know what that means on its face. Is it more likely to have false positive or a false negative?

In both cases, I used a different word than my usual go-to "maybe_foo" as an attempt to do some work in one direction or the other, but I don't think it was successful because I'm having a hard time piecing together what they mean right now. In short, here's what we have between these two systems:

function returns wrong if...

mightalias true the arrays reference distinct subsets of the same memory region in a complicated manner

mightalias false a custom array has not tracked its internal field(s) with dataids and one of those aliases

potentially_self_aliased true an array goofed its implementation (the default definitions should not have false positives)

potentially_self_aliased false this is the default answer; a self-aliasing array didn't define a method on this obscure function

Numpy calls this may_have_internal_overlap with tri-state output (known to not overlap, known to overlap, and unknown).

bkamins · 2021-05-13T14:04:31Z

@mbauman - I think given this change we should add potentially_self_aliased to the Julia manual section on adding broadcasting to custom types (as user defined types might want to decide if they true or false should be returned by this function).

N5N3 · 2021-11-21T09:13:42Z

Something might unrelated:
#43153 founds that x .= f.(x) can't be vectorlized if ndims(x) > 1. (Looks like LLVM failed to prove that LHS and RHS share the same access pattern).
A easy fix is to add ivdep after @simd, but we need to ensure that dest array is not self aliased.
Looks like this PR's potentially_self_aliased could be reused there.

mbauman · 2023-08-09T21:46:28Z

base/broadcast.jl

+Returns true if multiple locations in `A` reference the same memory
+"""
+potentially_self_aliased(::DenseArray) = false
+potentially_self_aliased(A::StridedArray) = any(==(0), strides(A))


We can do better than this, but doing better is NP-hard. Ref:

https://github.com/numpy/numpy/blob/60c8b6a9f467f7809d4078aec6e4fe7cbc8191b8/numpy/core/src/common/mem_overlap.c#L841-L923

mbauman added bugfix This change fixes an existing bug broadcast Applying a function over a collection needs news A NEWS entry is required for this change triage This should be discussed on a triage call minor change Marginal behavior change acceptable for a minor release labels Mar 19, 2019

Keno removed triage This should be discussed on a triage call labels Mar 28, 2019

JeffBezanson force-pushed the mb/self-aliasing-broadcasts branch from 1ade0fb to d64f74c Compare May 18, 2019 17:53

JeffBezanson mentioned this pull request May 18, 2019

put back broadcast behavior with repeated indices, changed by #31300 #31391

Merged

tkf mentioned this pull request Mar 18, 2020

How should in-place broadcasting with repeated indices behave? #31392

Open

mbauman added 2 commits April 27, 2020 17:11

Be more optimistic about the world

55e10b2

but still provide the hook for arrays to improve

mbauman force-pushed the mb/self-aliasing-broadcasts branch from d64f74c to 55e10b2 Compare April 27, 2020 22:23

Add allunique(::Number) optimization

347cabf

mbauman commented Apr 27, 2020

View reviewed changes

mbauman added minor change Marginal behavior change acceptable for a minor release needs news A NEWS entry is required for this change and removed minor change Marginal behavior change acceptable for a minor release needs news A NEWS entry is required for this change labels Apr 29, 2020

chriselrod mentioned this pull request May 7, 2020

Failing gracefully JuliaSIMD/LoopVectorization.jl#103

Open

mcabbott mentioned this pull request Nov 23, 2020

Incorrect derivative of getindex() with repeating indices on CuArrays FluxML/Zygote.jl#821

Open

michel2323 mentioned this pull request Jan 5, 2021

Zygote for flow_constraints exanauts/ExaPF.jl#83

Merged

michel2323 mentioned this pull request Jan 13, 2021

Race condition in views with same indices on GPUs exanauts/ExaPF.jl#89

Closed

jeremiedb mentioned this pull request Mar 24, 2021

add Embedding layer FluxML/Flux.jl#1516

Merged

4 tasks

StefanKarpinski added the triage This should be discussed on a triage call label May 7, 2021

Merge branch 'master' into mb/self-aliasing-broadcasts

e4f54ca

mbauman commented May 13, 2021

View reviewed changes

jgreener64 mentioned this pull request Dec 8, 2021

getindex for CuArray to fix repeated indices error FluxML/Zygote.jl#1131

Closed

mbauman commented Aug 9, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Fix #31392, unaliasing of broadcast arguments against destinations with repeated indices #31407

RFC: Fix #31392, unaliasing of broadcast arguments against destinations with repeated indices #31407

mbauman commented Mar 19, 2019 •

edited

Loading

JeffBezanson commented Mar 19, 2019

mbauman commented Mar 19, 2019 •

edited

Loading

mbauman commented Mar 28, 2019

mbauman commented Mar 28, 2019

nanosoldier commented Mar 29, 2019

JeffBezanson commented May 21, 2019

mbauman commented May 23, 2019

mbauman commented Apr 27, 2020

mbauman Apr 27, 2020

vtjnash commented Oct 29, 2020

jeremiedb commented Mar 8, 2021

StefanKarpinski commented May 7, 2021

StefanKarpinski commented May 7, 2021

mbauman May 13, 2021 •

edited

Loading

mbauman Aug 9, 2023

bkamins commented May 13, 2021

N5N3 commented Nov 21, 2021 •

edited

Loading

mbauman Aug 9, 2023

function	returns	wrong if...
`mightalias`	`true`	the arrays reference distinct subsets of the same memory region in a complicated manner
`mightalias`	`false`	a custom array has not tracked its internal field(s) with `dataids` and one of those aliases
`potentially_self_aliased`	`true`	an array goofed its implementation (the default definitions should not have false positives)
`potentially_self_aliased`	`false`	this is the default answer; a self-aliasing array didn't define a method on this obscure function

RFC: Fix #31392, unaliasing of broadcast arguments against destinations with repeated indices #31407

Are you sure you want to change the base?

RFC: Fix #31392, unaliasing of broadcast arguments against destinations with repeated indices #31407

Conversation

mbauman commented Mar 19, 2019 • edited Loading

Previously:

JeffBezanson commented Mar 19, 2019

mbauman commented Mar 19, 2019 • edited Loading

mbauman commented Mar 28, 2019

mbauman commented Mar 28, 2019

nanosoldier commented Mar 29, 2019

JeffBezanson commented May 21, 2019

mbauman commented May 23, 2019

mbauman commented Apr 27, 2020

mbauman Apr 27, 2020

Choose a reason for hiding this comment

vtjnash commented Oct 29, 2020

jeremiedb commented Mar 8, 2021

StefanKarpinski commented May 7, 2021

StefanKarpinski commented May 7, 2021

mbauman May 13, 2021 • edited Loading

Choose a reason for hiding this comment

mbauman Aug 9, 2023

Choose a reason for hiding this comment

bkamins commented May 13, 2021

N5N3 commented Nov 21, 2021 • edited Loading

mbauman Aug 9, 2023

Choose a reason for hiding this comment

mbauman commented Mar 19, 2019 •

edited

Loading

mbauman commented Mar 19, 2019 •

edited

Loading

mbauman May 13, 2021 •

edited

Loading

N5N3 commented Nov 21, 2021 •

edited

Loading