Performance problems when using broadcasting and logical operators #47493

Closed
roflmaostc opened this issue Nov 8, 2022 · 1 comment · Fixed by #43322
Labels
broadcast Applying a function over a collection duplicate Indicates similar issues or pull requests

Comments

@roflmaostc
Contributor

roflmaostc commented Nov 8, 2022

Broadcasting with boolean operators seems to be very slow in some circumstances. Here is one example with Julia 1.8.2:

julia> x = randn((100,));

julia> y = randn((1,100));

julia> @time (x.^2 .< 1) .&& (1 .+ y.^2  .<1);
  0.004369 seconds (144.73 k allocations: 3.005 MiB)

julia> @btime ($x.^2 .< 1) .&& (1 .+ $y.^2  .<1);
  4.009 ms (144713 allocations: 3.00 MiB)

# these variants are fast
julia> @btime ($x.^2 .< 1) .&& ($y.^2  .<1);
  2.523 μs (3 allocations: 5.56 KiB)

julia> @btime ($x.^2 .< 1) .|| ($y.^2 .+ 1 .<1);
  2.738 μs (3 allocations: 5.56 KiB)
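
To check the allocation behaviour on another Julia version without BenchmarkTools, the two expressions can be wrapped in functions and measured with `@allocated`. This is only a sketch: the function names `with_offset` and `without_offset` are made up for illustration, and the reported byte counts will depend on the Julia version in use.

```julia
# Hypothetical helper names; only Base is used (requires Julia 1.7+ for `.&&`).
with_offset(x, y)    = (x.^2 .< 1) .&& (1 .+ y.^2 .< 1)  # variant reported as slow on 1.8.2
without_offset(x, y) = (x.^2 .< 1) .&& (y.^2 .< 1)       # variant reported as fast

x = randn(100)
y = randn(1, 100)

# Call each function once first so that compilation is not counted.
with_offset(x, y)
without_offset(x, y)

println("with offset:    ", @allocated(with_offset(x, y)), " bytes")
println("without offset: ", @allocated(without_offset(x, y)), " bytes")
```

Note that `1 .+ y.^2 .< 1` can never be true, so the first result contains only `false`; the example is purely about performance.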
@roflmaostc roflmaostc changed the title Performance regression allocations when using broadcasting and logical operators Performance regression when using broadcasting and logical operators Nov 8, 2022
@roflmaostc roflmaostc changed the title Performance regression when using broadcasting and logical operators Performance problems when using broadcasting and logical operators Nov 8, 2022
@N5N3
Member

N5N3 commented Nov 8, 2022

Our .&& and .|| are flatten-based, so I believe this is a duplicate of #27988.
With #43322, I get

julia> @btime ($x.^2 .< 1) .&& (1 .+ $y.^2  .<1);
  10.400 μs (3 allocations: 5.56 KiB)

julia> @btime ($x.^2 .< 1) .& (1 .+ $y.^2  .<1);
  4.271 μs (3 allocations: 5.56 KiB)

julia> @btime ($x.^2 .< 1) .&& ($y.^2  .<1);
  10.000 μs (3 allocations: 5.56 KiB)

julia> @btime ($x.^2 .< 1) .& ($y.^2  .<1);
  4.229 μs (3 allocations: 5.56 KiB)

So perhaps .& is always a better choice here.
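
As a sketch of that substitution, assuming both operands are plain `Bool` broadcasts with no side effects: `.&` and `.&&` then produce the same result, so swapping one for the other should not change the output. The variable names below are illustrative only.

```julia
x = randn(100)
y = randn(1, 100)

# Same mask computed with the broadcast short-circuit operator and with bitwise AND.
mask_andand = (x.^2 .< 1) .&& (y.^2 .< 1)
mask_and    = (x.^2 .< 1) .&  (y.^2 .< 1)

# For side-effect-free Bool operands the two masks agree element by element.
println(mask_andand == mask_and)
```

The swap is not safe in general: `.&&` is the broadcast form of the short-circuiting `&&`, whereas `.&` always evaluates both sides, which matters if the right-hand side can throw or has side effects.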

@N5N3 N5N3 added broadcast Applying a function over a collection duplicate Indicates similar issues or pull requests labels Nov 8, 2022
github-merge-queue bot pushed a commit that referenced this issue Jul 15, 2023
…d and inlined) (#43322)

A follow-up attempt to fix #27988. (close #47493 close #50554)
Examples:
```julia
julia> using LazyArrays
julia> bc = @~ @. 1*(1 + 1) + 1*1;
julia> bc2 = @~ 1 .* 1 .- 1 .* 1 .^2 .+ 1 .* 1 .+ 1 .^ 3;
```
On master:
<details><summary> click for details </summary>
<p>

```julia
julia> @code_typed Broadcast.flatten(bc).f(1,1,1,1,1)
CodeInfo(
1 ─ %1  = Core.getfield(args, 1)::Int64
│   %2  = Core.getfield(args, 2)::Int64
│   %3  = Core.getfield(args, 3)::Int64
│   %4  = Core.getfield(args, 4)::Int64
│   %5  = Core.getfield(args, 5)::Int64
│   %6  = invoke Base.Broadcast.var"#13#14"{Base.Broadcast.var"#16#18"{Base.Broadcast.var"#15#17", Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}}, Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}, Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}, typeof(+)}}(Base.Broadcast.var"#16#18"{Base.Broadcast.var"#15#17", Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}}, Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}, Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}, typeof(+)}(Base.Broadcast.var"#15#17"(), Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}}(Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}(Base.Broadcast.var"#15#17"())), Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}(Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}(Base.Broadcast.var"#25#26"())), Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}(Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}(Base.Broadcast.var"#21#22"())), +))(%1::Int64, %2::Int64, %3::Vararg{Int64}, %4, %5)::Tuple{Int64, Int64, Vararg{Int64}}
│   %7  = Core._apply_iterate(Base.iterate, Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}(Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}(Base.Broadcast.var"#21#22"())), %6)::Tuple{Int64, Int64}
│   %8  = Core._apply_iterate(Base.iterate, Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}(Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}(Base.Broadcast.var"#25#26"())), %6)::Tuple{Vararg{Int64}}
│   %9  = Core._apply_iterate(Base.iterate, Base.Broadcast.var"#16#18"{Base.Broadcast.var"#9#11", Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}}, Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}, Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}, typeof(*)}(Base.Broadcast.var"#9#11"(), Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}}(Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}(Base.Broadcast.var"#15#17"())), Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}(Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}(Base.Broadcast.var"#25#26"())), Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}(Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}(Base.Broadcast.var"#21#22"())), *), %8)::Tuple{Int64}
│   %10 = Core.getfield(%7, 1)::Int64
│   %11 = Core.getfield(%7, 2)::Int64
│   %12 = Base.mul_int(%10, %11)::Int64
│   %13 = Core.getfield(%9, 1)::Int64
│   %14 = Base.add_int(%12, %13)::Int64
└──       return %14
) => Int64

julia> @code_typed Broadcast.flatten(bc2).f(1,1,1,^,1,Val(2),1,1,^,1,Val(3))
CodeInfo(
1 ─ %1  = Core.getfield(args, 1)::Int64
│   %2  = Core.getfield(args, 2)::Int64
│   %3  = Core.getfield(args, 3)::Int64
│   %4  = Core.getfield(args, 5)::Int64
│   %5  = Core.getfield(args, 7)::Int64
│   %6  = Core.getfield(args, 8)::Int64
│   %7  = Core.getfield(args, 10)::Int64
│   %8  = invoke Base.Broadcast.var"#13#14"{Base.Broadcast.var"#16#18"{Base.Broadcast.var"#15#17", Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}}}, Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}}, Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}}, typeof(Base.literal_pow)}}(Base.Broadcast.var"#16#18"{Base.Broadcast.var"#15#17", Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}}}, Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}}, Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}}, typeof(Base.literal_pow)}(Base.Broadcast.var"#15#17"(), Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}}}(Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}}(Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}(Base.Broadcast.var"#15#17"()))), Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}}(Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}(Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}(Base.Broadcast.var"#25#26"()))), Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}}(Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}(Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}(Base.Broadcast.var"#21#22"()))), Base.literal_pow))(%3::Int64, ^::Function, %4::Vararg{Any}, $(QuoteNode(Val{2}())), %5, %6, ^, %7, $(QuoteNode(Val{3}())))::Tuple{Int64, Any, Vararg{Any}}
│   %9  = Core._apply_iterate(Base.iterate, Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}(Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}(Base.Broadcast.var"#21#22"())), %8)::Tuple{Int64, Any}
│   %10 = Core._apply_iterate(Base.iterate, Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}(Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}(Base.Broadcast.var"#25#26"())), %8)::Tuple
│   %11 = Core._apply_iterate(Base.iterate, Base.Broadcast.var"#15#17"(), %10)::Tuple
│   %12 = Core.getfield(%9, 1)::Int64
│   %13 = Core.getfield(%9, 2)::Any
│   %14 = (*)(%12, %13)::Any
│   %15 = Core.tuple(%14)::Tuple{Any}
│   %16 = Core._apply_iterate(Base.iterate, Core.tuple, %15, %11)::Tuple{Any, Vararg{Any}}
│   %17 = Base.mul_int(%1, %2)::Int64
│   %18 = Core.tuple(%17)::Tuple{Int64}
│   %19 = Core._apply_iterate(Base.iterate, Core.tuple, %18, %16)::Tuple{Int64, Any, Vararg{Any}}
│   %20 = Core._apply_iterate(Base.iterate, Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}(Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}(Base.Broadcast.var"#21#22"())), %19)::Tuple{Int64, Any}
│   %21 = Core._apply_iterate(Base.iterate, Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}(Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}(Base.Broadcast.var"#25#26"())), %19)::Tuple
│   %22 = Core._apply_iterate(Base.iterate, Base.Broadcast.var"#16#18"{Base.Broadcast.var"#15#17", Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}}, Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}, Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}, typeof(*)}(Base.Broadcast.var"#15#17"(), Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}}(Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}(Base.Broadcast.var"#15#17"())), Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}(Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}(Base.Broadcast.var"#25#26"())), Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}(Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}(Base.Broadcast.var"#21#22"())), *), %21)::Tuple{Any, Vararg{Any}}
│   %23 = Core.getfield(%20, 1)::Int64
│   %24 = Core.getfield(%20, 2)::Any
│   %25 = (-)(%23, %24)::Any
│   %26 = Core.tuple(%25)::Tuple{Any}
│   %27 = Core._apply_iterate(Base.iterate, Core.tuple, %26, %22)::Tuple{Any, Any, Vararg{Any}}
│   %28 = Core._apply_iterate(Base.iterate, Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}(Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}(Base.Broadcast.var"#21#22"())), %27)::Tuple{Any, Any}
│   %29 = Core._apply_iterate(Base.iterate, Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}(Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}(Base.Broadcast.var"#25#26"())), %27)::Tuple
│   %30 = Core._apply_iterate(Base.iterate, Base.Broadcast.var"#16#18"{Base.Broadcast.var"#9#11", Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}}}, Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}}, Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}}, typeof(Base.literal_pow)}(Base.Broadcast.var"#9#11"(), Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}}}(Base.Broadcast.var"#13#14"{Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}}(Base.Broadcast.var"#13#14"{Base.Broadcast.var"#15#17"}(Base.Broadcast.var"#15#17"()))), Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}}(Base.Broadcast.var"#23#24"{Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}}(Base.Broadcast.var"#23#24"{Base.Broadcast.var"#25#26"}(Base.Broadcast.var"#25#26"()))), Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}}(Base.Broadcast.var"#19#20"{Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}}(Base.Broadcast.var"#19#20"{Base.Broadcast.var"#21#22"}(Base.Broadcast.var"#21#22"()))), Base.literal_pow), %29)::Tuple{Any}
│   %31 = Core.getfield(%28, 1)::Any
│   %32 = Core.getfield(%28, 2)::Any
│   %33 = (+)(%31, %32)::Any
│   %34 = Core.getfield(%30, 1)::Any
│   %35 = (+)(%33, %34)::Any
└──       return %35
) => Any
```
</p>

</details>

On this PR:
```julia
julia> @code_typed Broadcast.flatten(bc).f(1,1,1,1,1)
CodeInfo(
1 ─ %1 = Core.getfield(args, 1)::Int64
│   %2 = Core.getfield(args, 2)::Int64
│   %3 = Core.getfield(args, 3)::Int64
│   %4 = Core.getfield(args, 4)::Int64
│   %5 = Core.getfield(args, 5)::Int64
│   %6 = Base.add_int(%2, %3)::Int64
│   %7 = Base.mul_int(%1, %6)::Int64
│   %8 = Base.mul_int(%4, %5)::Int64
│   %9 = Base.add_int(%7, %8)::Int64
└──      return %9
) => Int64

julia> @code_typed Broadcast.flatten(bc2).f(1,1,1,^,1,Val(2),1,1,^,1,Val(3))
CodeInfo(
1 ─ %1  = Core.getfield(args, 1)::Int64
│   %2  = Core.getfield(args, 2)::Int64
│   %3  = Core.getfield(args, 3)::Int64
│   %4  = Core.getfield(args, 5)::Int64
│   %5  = Core.getfield(args, 7)::Int64
│   %6  = Core.getfield(args, 8)::Int64
│   %7  = Core.getfield(args, 10)::Int64
│   %8  = Base.mul_int(%1, %2)::Int64
│   %9  = Base.mul_int(%4, %4)::Int64
│   %10 = Base.mul_int(%3, %9)::Int64
│   %11 = Base.sub_int(%8, %10)::Int64
│   %12 = Base.mul_int(%5, %6)::Int64
│   %13 = Base.add_int(%11, %12)::Int64
│   %14 = Base.mul_int(%7, %7)::Int64
│   %15 = Base.mul_int(%14, %7)::Int64
│   %16 = Base.add_int(%13, %15)::Int64
└──       return %16
) => Int64
```
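
For readers without LazyArrays, a rough equivalent of `bc` can be built by hand with `Broadcast.broadcasted`. This is a sketch that assumes `Broadcast.flatten` is available; it is an unexported internal function, so its behaviour may differ between Julia versions.

```julia
using Base.Broadcast: broadcasted, flatten, materialize

# Lazy, nested form of 1 .* (1 .+ 1) .+ 1 .* 1, built without LazyArrays.
bc = broadcasted(+,
                 broadcasted(*, 1, broadcasted(+, 1, 1)),
                 broadcasted(*, 1, 1))

# flatten collapses the nested tree into one function over the five leaf arguments.
fbc = flatten(bc)

println(length(fbc.args))       # number of leaf arguments
println(fbc.f(2, 3, 4, 5, 6))   # 2*(3 + 4) + 5*6
println(materialize(bc))        # evaluates the original expression
```

Running `@code_typed fbc.f(1, 1, 1, 1, 1)` on this hand-built version is one way to compare the generated code across Julia versions.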