Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broadcasting with missings is slow #30455

Closed
cstjean opened this issue Dec 19, 2018 · 12 comments
Closed

Broadcasting with missings is slow #30455

cstjean opened this issue Dec 19, 2018 · 12 comments
Labels
broadcast Applying a function over a collection missing data Base.missing and related functionality performance Must go faster

Comments

@cstjean
Copy link
Contributor

cstjean commented Dec 19, 2018

This might just be #28126, but it looks like a separate problem. On 1.0.2:

julia> using BenchmarkTools

julia> v = fill(2.0, 50000);

julia> v2 = vcat(v, missing);

julia> v3 = vcat(missing, v);

julia> @btime $v .* 3.0;
  33.105 μs (2 allocations: 390.70 KiB)

julia> @btime $v2 .* 3.0;
  1.610 ms (99498 allocations: 2.33 MiB)

julia> @btime $v3 .* 3.0;
  3.867 ms (7 allocations: 439.77 KiB)

Apart from the 40X performance difference, two things stand out:

  • 99498 allocations for v2. Why?
  • Why is v3 slower than v2, in spite of allocating much less?
@ararslan ararslan added performance Must go faster broadcast Applying a function over a collection missing data Base.missing and related functionality labels Dec 19, 2018
@nalimilan
Copy link
Member

Probably a duplicate of #28382. v3 is slower because the type instability appears with the first element, which slows down processing of all later elements.

@cstjean
Copy link
Contributor Author

cstjean commented Dec 20, 2018

v3 is slower because the type instability appears with the first element, which slows down processing of all later elements.

After encountering v3[2] and widening the output's eltype to Union{Missing, Float64}, couldn't there be a function barrier so that processing the remaining elements is fast?

@nalimilan
Copy link
Member

There will always be some performance impact, but yes, Matt provided a possible fix for this at #28382.

@cstjean
Copy link
Contributor Author

cstjean commented Dec 20, 2018

Isn't that patch just to improve the output type, after the computations have been done? I don't think it will improve speed of the broadcasting itself.

If the compiler already knows the output's eltype, then broadcast! is really fast:

julia> out = Vector{Union{Missing, Float64}}(undef, length(v3));

julia> @btime broadcast!(*, $out, $v3, 3.0);
  68.585 μs (0 allocations: 0 bytes)

In broadcast(*, v3, 3.0), after processing v3[1] and v3[2], the widened type is equal to the inferred type (Union{Missing, Float64}), so further widening is impossible. The rest of the computation "should" take 68 μs as above. Right now, it calls copyto_nonleaf! recursively, which does introduce a function barrier, but it still performs redundant type checks.

@nalimilan
Copy link
Member

Right, that's a different issue. Actually, the reason why v2 allocates a lot is that it starts with a Vector{Foat64}, and when it encounters the last element (missing), it copies everything to a Vector{Union{Missing,Float64}}. Since that's the worst case, that's not necessarily a major issue: in practice the first missing should appear much earlier. Though it would be nice to avoid copying, see #26681.

What's more problematic is that v2 is slow, to the point that copying the whole vector (as happens for v3) is still faster than working with Union{Missing,Float64} from the beginning. I've tracked this down to the use of typeof(val) <: T instead of val isa T in copyto_nonleaf!. See #30480.

@cstjean
Copy link
Contributor Author

cstjean commented Dec 21, 2018

Fantastic! Thank you so much, this will make a huge difference for us.

Vector{Foat64}, and when it encounters the last element (missing), it copies everything to a Vector{Union{Missing,Float64}}. Since that's the worst case, that's not necessarily a major issue: in practice the first missing should appear much earlier.

We occasionally have large vectors with very few missing values, so it's a concern. It seems that the loop that copies the elements to the new vector is not type-stable. Pulling it into its own function takes out all the spurious allocations:

julia> @eval Base.Broadcast function copyupto!(newdest, dest, iter, count)
           for II in Iterators.take(iter, count)
               newdest[II] = dest[II]
           end
       end
copyupto! (generic function with 1 method)

julia> @eval Base.Broadcast @inline function copyto_nonleaf!(dest, bc::Broadcasted, iter, state, count)
           T = eltype(dest)
           while true
               y = iterate(iter, state)
               y === nothing && break
               I, state = y
               @inbounds val = bc[I]
               S = typeof(val)
               if S <: T
                   @inbounds dest[I] = val
               else
                   # This element type doesn't fit in dest. Allocate a new dest with wider eltype,
                   # copy over old values, and continue
                   newdest = Base.similar(dest, promote_typejoin(T, S))
                   copyupto!(newdest, dest, iter, count)
                   newdest[I] = val
                   return copyto_nonleaf!(newdest, bc, iter, state, count+1)
               end
               count += 1
           end
           return dest
       end
copyto_nonleaf! (generic function with 1 method)

julia> @btime $v2 .* 3.0;
  484.567 μs (9 allocations: 830.47 KiB)

@cstjean
Copy link
Contributor Author

cstjean commented Dec 21, 2018

It's type-unstable because S = typeof(val) is correctly inferred as ::Union{Type{Missing}, Type{Float64}}. I suppose that the compiler could theoretically infer that if S <: T is false, then S cannot be Type{Missing} in the else branch, but that's a tall order. The function barrier looks like a better solution.

@nalimilan
Copy link
Member

Good point! Can you make another PR once #30480 is merged? #30076 might be relevant: maybe we could merge all similar copying function in a single one (maybe not)?

You could even try to make this a bit faster by passing S = typeof(val) to copyupto! and doing v = dest[II]; @assert !isa(v, S). Though maybe it wouldn't make any difference.

@cstjean
Copy link
Contributor Author

cstjean commented Dec 21, 2018

dest already has a concrete eltype in copyupto (either Missing or Float64), so we wouldn't gain anything from asserting that it's not <:S. Or do you mean something else?

@nalimilan
Copy link
Member

nalimilan commented Dec 21, 2018

Ah, good point, carry on. Then the only improvement we can make would be to avoid copying completely (#26681).

@affans
Copy link
Contributor

affans commented Dec 25, 2018

@nalimilan Unrelated, but if I wanted to understand the source of broadcasted code, how do I track it down? What does v2 .* 3.0 actually turn into in terms of the functions called?

@nalimilan
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
broadcast Applying a function over a collection missing data Base.missing and related functionality performance Must go faster
Projects
None yet
Development

No branches or pull requests

4 participants