Memory allocation inconsistency in broadcasting #41565

Closed
aaraujo71 opened this issue Jul 13, 2021 · 3 comments · Fixed by #42497
Labels
performance Must go faster

Comments

@aaraujo71

I found the following inconsistency related to the use of the broadcasting dot macro.

When running, for example,

struct Test
    x :: Array{Float64, 1}
    y :: Array{Float64, 1}
end

function test()
    var = Test([1, 2], [1, 2])
    @time @. var.x = var.y - var.x/var.y^2
    @time @. var.x = var.y - var.x/(var.y*var.y)
    return nothing
end

test()

I get

  0.000001 seconds (2 allocations: 16 bytes)
  0.000000 seconds

This behavior occurs only in versions 1.6.1 and 1.7.0-beta3. In version 1.5.3, neither calculation allocates.

@KristofferC KristofferC added the performance Must go faster label Jul 13, 2021
@KristofferC
Member

Something a bit more minimal:

function test(y)
    y .= 0 .- y ./ (y.^2) # extra allocation
    return y
end

function test(y, n)
    y .= 0 .- y ./ (y.^n)
    return y
end

const y = rand(1000);

@time test(y);
# 0.000002 seconds (2 allocations: 16 bytes)
@time test(y, 2);
# 0.000001 seconds

So the literal power (which lowers to Base.broadcasted(Base.literal_pow, Main.:^, x, Val(2))) perhaps makes the expression complicated enough that it is no longer fully inlined.
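As a rough illustration of the lowering mentioned above, the sketch below builds the lazy broadcast object by hand and checks that it matches the dotted literal power. The variable names v and bc are made up for this example, and the exact printed form of the lowered code may differ between Julia versions.

```julia
# Hypothetical input vector, used only for this illustration.
v = [1.0, 2.0, 3.0]

# Show how the parser lowers a dotted power with a literal exponent.
# On the versions discussed here this mentions Base.literal_pow and Val(2).
println(Meta.lower(Main, :(v .^ 2)))

# Build the same lazy broadcast by hand and materialize it.
bc = Base.Broadcast.broadcasted(Base.literal_pow, ^, v, Val(2))
by_hand = Base.Broadcast.materialize(bc)

# The scalar entry point: for Float64 and Val(2) this is just x*x.
scalar = Base.literal_pow(^, 3.0, Val(2))
```

With a runtime exponent, as in test(y, n), the call goes through plain ^ instead of literal_pow, which is consistent with the second timing showing no allocation.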

@aaraujo71
Author

I tried your code in version 1.5.3 and I get

  0.000001 seconds
  0.000004 seconds

No allocations :)

@ChenNingCong

ChenNingCong commented Jul 13, 2021

After some tests, I found that it's caused by preprocess_args not being inlined. The function is defined at:

julia/base/broadcast.jl

Lines 987 to 989 in 7d0f769

@inline preprocess_args(dest, args::Tuple) = (preprocess(dest, args[1]), preprocess_args(dest, tail(args))...)
preprocess_args(dest, args::Tuple{Any}) = (preprocess(dest, args[1]),)
preprocess_args(dest, args::Tuple{}) = ()

Only the first method is marked @inline. Redefining all three methods with @inline fixes the bug:

struct Test
    x :: Array{Float64, 1}
    y :: Array{Float64, 1}
end

function test1(var)
    @. var.x = var.y - var.x/var.y^2
    return nothing
end

# compile
var = Test([1, 2], [1, 2])
println("Before fix-compiling ",@allocated test1(var))
@assert var.x == [0,1.5]

# before fix
var = Test([1, 2], [1, 2])
println("Before fix-allocated ",@allocated test1(var))
@assert var.x == [0,1.5]

import Base.Broadcast.preprocess_args
import Base.Broadcast.preprocess
@inline preprocess_args(dest, args::Tuple) = (Base.Broadcast.preprocess(dest, args[1]), Base.Broadcast.preprocess_args(dest, Base.tail(args))...)
@inline preprocess_args(dest, args::Tuple{Any}) = (Base.Broadcast.preprocess(dest, args[1]),)
@inline preprocess_args(dest, args::Tuple{}) = ()

# compile
var = Test([1, 2], [1, 2])
println("After fix-compiling ",@allocated test1(var))
@assert var.x == [0,1.5]

# after fix
var = Test([1, 2], [1, 2])
println("After fix-allocated ",@allocated test1(var))
@assert var.x == [0,1.5]

Running the code:

$ julia alloc.jl
Before fix-compiling 23433466
Before fix-allocated 16
After fix-compiling 10964697
After fix-allocated 0

The fix is extremely easy: just add @inline to the two preprocess_args methods that lack it (the Tuple{Any} and Tuple{} methods).
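For anyone who wants to check whether their own Julia version is affected without patching Base, here is a minimal sketch that compares the literal-power form against the explicit product after a warm-up call. The function names with_literal_pow! and with_product! are invented for this example, and the reported byte counts depend on the Julia version, so no particular numbers are claimed.

```julia
# Same expression as the minimal reproducer, with a literal exponent.
function with_literal_pow!(y)
    y .= 0 .- y ./ (y .^ 2)
    return y
end

# Equivalent expression that avoids literal_pow by writing the product out.
function with_product!(y)
    y .= 0 .- y ./ (y .* y)
    return y
end

# Shift away from zero so the division stays well behaved.
y1 = rand(1000) .+ 1
y2 = copy(y1)

# Warm up on copies so compilation is not counted in the measurement.
with_literal_pow!(copy(y1))
with_product!(copy(y1))

a_pow = @allocated with_literal_pow!(y1)
a_mul = @allocated with_product!(y2)

println("literal pow: ", a_pow, " bytes, product: ", a_mul, " bytes")
```

If the first number is nonzero and the second is zero, writing the product explicitly is a workaround until the fix is available.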

johnomotani added a commit to mabarnes/moment_kinetics that referenced this issue Nov 29, 2021
These should be equivalent, but in julia-1.6 x.^2 allocates because of a
bug (JuliaLang/julia#41565).
LilithHafner pushed a commit to LilithHafner/julia that referenced this issue Feb 22, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this issue Mar 8, 2022