Wrong weighted sum and mean results for complex numbers #518

Closed
aplavin opened this issue Sep 7, 2019 · 66 comments

@aplavin (Contributor) commented Sep 7, 2019

Weighted mean gives wrong (conjugated) result for complex arrays: mean([1 + 0.5im], weights([1])) == 1.0 - 0.5im.

@nalimilan (Member)

Indeed. That's because we use dot([1 + 0.5im], [1]) to compute the weighted sum (for performance reasons). I guess we could simply do dot([1], [1 + 0.5im]) since only the first argument is conjugated, and we know weights are real.

Would you make a pull request to apply that change (in weights.jl) and add a test?
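The conjugation can be seen directly with LinearAlgebra's dot, and swapping the argument order fixes it when the weights are real. A minimal sketch of the idea (not the actual weights.jl change):

```julia
using LinearAlgebra

v = [1 + 0.5im]
w = [1.0]  # real weights

# dot conjugates its first argument: dot(x, y) == sum(conj(x[i]) * y[i]),
# so dot(v, w) conjugates the complex values being summed:
dot(v, w)  # gives 1.0 - 0.5im, the conjugated (wrong) weighted sum

# With real weights, conjugating the first argument is a no-op,
# so dot(w, v) yields the correct result:
dot(w, v)  # gives 1.0 + 0.5im
```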

@nalimilan changed the title from "Wrong weighted mean results" to "Wrong weighted sum and mean results for complex numbers" on Sep 9, 2019
@oschulz (Contributor) commented Nov 6, 2019

Just ran into this in another context: #534 isn't quite the same, because it's about values that are vectors of real numbers, but the cause is the (recursive) behavior of dot(...) as well.

I think using dot in sum(A, w) is wrong, semantically. Could we change the definition of sum(A, w) to something like sum(A .* values(w)) or mapreduce(*, +, A, values(w))? I can do a PR, if there are no objections.

@nalimilan (Member)

The reason we use dot, AFAICT, is that it's much more efficient thanks to BLAS. It's not a problem to abuse it as long as we're sure it's correct in all the cases where we use it. What we can do is have a method using it for arrays of numbers, and a fallback method using a slower but generally correct approach.

@oschulz (Contributor) commented Nov 6, 2019

Sounds good. Regarding where to draw the line - BLAS would only accelerate Real, right (resp. do the wrong thing for Complex in this use case)?

So we'd do

Base.sum(v::AbstractArray{<:Real}, w::AbstractWeights) = dot(v, values(w))

Base.sum(v::AbstractArray, w::AbstractWeights) = ...

right? @nalimilan, what do you think would be best for the generic implementation:

# Option 1:
sum(A .* values(w))

# Option 2:
sum(Base.Broadcast.Broadcasted(*, (A, values(w))))

# Option 3:
mapreduce(*, +, A, values(w))
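For concreteness, all three options avoid dot's conjugation and agree for complex values. A quick sketch, using plain real vectors as a stand-in for values(w) (instantiate is added here so the lazy-broadcast reduction works on current Julia):

```julia
using LinearAlgebra

A = [1 + 0.5im, 2 - 1.0im]
w = [0.25, 0.75]  # stand-in for values(w)

opt1 = sum(A .* w)                     # materializes a temporary array
opt2 = sum(Base.Broadcast.instantiate(
    Base.Broadcast.Broadcasted(*, (A, w))))  # lazy broadcast reduction
opt3 = mapreduce(*, +, A, w)

# All three give 1.75 - 0.625im, while dot conjugates A:
dot(A, w)  # gives 1.75 + 0.625im
```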

@oschulz (Contributor) commented Nov 6, 2019

I've been thinking - maybe we should drop use of dot altogether, though, even if it means a performance loss, since we'd gain precision. AFAIK, Julia's sum() uses pairwise summation, to keep sums stable when summing over values with large differences in magnitude. But I don't think BLAS does fancy summing:

julia> A = Float32.(vcat(1.0, fill(1E-10, 10^6)))
julia> w = Weights(fill(Float32(1.0), length(A)))

julia> sum(A)
1.0000999f0

julia> sum(A, w), dot(A, values(w))
(1.0000969f0, 1.0000969f0)

julia> mapreduce(*, +, A, values(w))
1.0000999f0

Though I just saw that we're using a simple sum in StatsBase._moment2 instead of Welford's algorithm or similar - is that intentional? Statistics._varm uses centralize_sumabs2 for stability.

@nalimilan (Member)

Options 2 and 3 are the best ones. Not sure how to choose between them; I guess benchmarking is needed.

mapreduce(*, +, A, values(w)) doesn't use pairwise summation either, so I think using BLAS is better. Can you check whether sum(Base.Broadcast.Broadcasted(*, (A, values(w)))) uses pairwise summation?

IIRC we use Welford's algorithm when possible in var, but not always. Please file a separate issue if that's not the case.

@oschulz (Contributor) commented Nov 6, 2019

mapreduce(*, +, A, values(w)) doesn't use pairwise summation either [...]

Yes, it does (AFAIK the default Base.sum is actually defined via mapreduce, which operates pairwise). Here's a more extreme example with a significant difference in result:

julia> A = vcat([Float32(1E0)], fill(Float32(1E-8), 10^8));

julia> w = Weights(fill(Float32(1.0), length(A)));

julia> sum(A)
1.9999989f0

julia> mapreduce(*, +, A, w)
1.9999989f0

julia> sum(A, w)
1.9384409f0

julia> dot(A, values(w))
1.9384409f0

IIRC we use Welford's algorithm when possible in var, but not always. Please file a separate issue if that's not the case.

Oh, sure - I just stumbled over it while looking into sum and wasn't sure if it was intentional.

@nalimilan (Member)

Actually mapreduce(*, +, A, w) allocates a copy to store the result, so that's not an option.
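The allocation is easy to confirm: for multiple array arguments, Base lowers mapreduce(f, op, A, B) to reduce(op, map(f, A, B)), materializing the intermediate array. A quick check (measured after a warm-up call so compilation doesn't count):

```julia
A = rand(10^4)
B = rand(10^4)

g(A, B) = mapreduce(*, +, A, B)
g(A, B)  # warm up, so compilation doesn't count towards allocations

# The intermediate map(*, A, B) array shows up as a nonzero allocation:
@allocated(g(A, B))  # > 0 bytes
```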

@oschulz (Contributor) commented Nov 6, 2019

Actually mapreduce(*, +, A, w) allocates a copy to store the result, so that's not an option.

Oops, indeed - I naively thought it would be allocation-free. In that case, mapreduce is out, of course.

@oschulz (Contributor) commented Nov 6, 2019

sum(Base.Broadcast.Broadcasted(*, (A, values(w)))) seems to be allocation-free, but it's fairly slow.

@oschulz (Contributor) commented Nov 6, 2019

Ok, so maybe we do

Base.sum(v::AbstractArray{<:Real}, w::AbstractWeights) = dot(v, values(w))

Base.sum(v::AbstractArray, w::AbstractWeights) = sum(Base.Broadcast.Broadcasted(*, (v, values(w))))

for now, to get correct behavior in the general case without performance loss in the specific case (real numbers), and worry about precision improvement in a separate issue? Would that be fine with you, @nalimilan?

@nalimilan (Member)

As I noted in my first comment, I think this will work:

Base.sum(v::AbstractArray{<:Complex}, w::AbstractWeights) = dot(w, v)

For the general case, it looks like broadcasting is slower than the dot fallback:

julia> x = rand(10_000);

julia> z = Vector{Union{Float64, Missing}}(x);

julia> @btime dot(x, z);
  13.443 μs (1 allocation: 16 bytes)

julia> f(A, w) = sum(Base.Broadcast.Broadcasted(*, (A, w)))
f (generic function with 1 method)

julia> @btime f(x, z);
  22.258 μs (11 allocations: 240 bytes)

Since we don't actually need the broadcasting features, I guess we should simply use a loop similar to dot, but without calling dot recursively:

function dot(x::AbstractArray, y::AbstractArray)
    lx = length(x)
    if lx != length(y)
        throw(DimensionMismatch("first array has length $(lx) which does not match the length of the second, $(length(y))."))
    end
    if lx == 0
        return dot(zero(eltype(x)), zero(eltype(y)))
    end
    s = zero(dot(first(x), first(y)))
    for (Ix, Iy) in zip(eachindex(x), eachindex(y))
        @inbounds s += dot(x[Ix], y[Iy])
    end
    s
end

@oschulz (Contributor) commented Nov 7, 2019

I guess we should simply use a loop similar to dot, but without calling dot recursively

You mean something like this?

function dotnr_loop(x::AbstractArray, y::AbstractArray)
    lx = length(x)
    if lx != length(y)
        throw(DimensionMismatch("first array has length $(lx) which does not match the length of the second, $(length(y))."))
    end
    if lx == 0
        return zero(eltype(x)) * zero(eltype(y))
    end
    s = zero(first(x) * first(y))
    for (Ix, Iy) in zip(eachindex(x), eachindex(y))
        @inbounds s += x[Ix] * y[Iy]
    end
    s
end

For the general case, it looks like broadcasting is slower than the dot fallback

Sure, where dot can be used, we should use dot. But for the generic use case, it's not so clear to me that a hard-coded loop is superior to a broadcast-based implementation.

With my use case from #534 (computing the weighted center of a cluster of 3D points), dotnr_loop is indeed somewhat faster than the broadcast-based variants (sum, or equivalently reduce(+, ...), over a lazy broadcast), but not dramatically:

julia> using StatsBase, Statistics, LinearAlgebra, StaticArrays, BenchmarkTools

julia> x = rand(SVector{3,Float64}, 10^3);
julia> y = rand(Float64, length(x));

julia> @benchmark dotnr_loop($x, $y)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.189 μs (0.00% GC)
  median time:      1.190 μs (0.00% GC)
  mean time:        1.203 μs (0.00% GC)
  maximum time:     2.948 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10

julia> @benchmark sum(Base.Broadcast.Broadcasted(*, ($x, values($y))))
BenchmarkTools.Trial: 
  memory estimate:  48 bytes
  allocs estimate:  2
  --------------
  minimum time:     1.510 μs (0.00% GC)
  median time:      1.512 μs (0.00% GC)
  mean time:        1.518 μs (0.00% GC)
  maximum time:     3.178 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10

julia> @benchmark reduce(+, Base.Broadcast.Broadcasted(*, ($x, values($y))))
BenchmarkTools.Trial: 
  memory estimate:  48 bytes
  allocs estimate:  2
  --------------
  minimum time:     1.511 μs (0.00% GC)
  median time:      1.519 μs (0.00% GC)
  mean time:        1.524 μs (0.00% GC)
  maximum time:     2.611 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10

On the other hand, reduce + broadcast is simpler, more generic and will also run on a GPU (currently needs explicit init element):

dotnr_bcast(x::AbstractArray, y::AbstractArray) = reduce(
    +,
    Base.Broadcast.Broadcasted(*, (x, values(y))),
    init = zero(eltype(x)) * zero(eltype(y))
)

So when the data becomes larger

julia> using CuArrays
julia> x = rand(SVector{3,Float64}, 10^6);
julia> y = rand(Float64, length(x));
julia> cx = CuArray(x);
julia> cy = CuArray(y);
julia> CuArrays.allowscalar(false)


julia> @btime dotnr_loop($x, $y)
  1.813 ms (0 allocations: 0 bytes)
3-element SArray{Tuple{3},Float64,1,3} with indices SOneTo(3):
 249588.56051677305
 250059.7208080707 
 249841.12107313547

julia> @btime dotnr_bcast($x, $y)
  2.492 ms (2 allocations: 48 bytes)
3-element SArray{Tuple{3},Float64,1,3} with indices SOneTo(3):
 249588.56051677305
 250059.7208080707 
 249841.12107313547

julia> @btime dotnr_bcast($cx, $cy)
  63.029 μs (91 allocations: 5.14 KiB)
3-element SArray{Tuple{3},Float64,1,3} with indices SOneTo(3):
 249588.56051677145
 250059.72080806308
 249841.12107313433

then dotnr_loop with the explicit loop is still a bit faster than dotnr_bcast - but on the other hand, dotnr_bcast can run on a GPU and blow dotnr_loop out of the water.

So I wonder if we shouldn't go with the broadcast-based implementation, even if it's a bit slower on a CPU (at least for this use case it is). For eltype(x) <: Number we'd keep using dot of course, for full BLAS performance.

@oschulz (Contributor) commented Nov 8, 2019

Long term, we should find an alternative to dot for the "simple numbers" case as well, though, and use a more stable summing algorithm (KBN or pairwise).

@nalimilan (Member)

You mean something like this?

Yes.

On the other hand, reduce + broadcast is simpler, more generic and will also run on a GPU (currently needs explicit init element):
...
then dotnr_loop with the explicit loop is still a bit faster than dotnr_bcast - but on the other hand, dotnr_bcast can run on a GPU and blow dotnr_loop out of the water.

Tough call. I guess we should try to figure out why broadcast is slower than a loop on CPU. A similar alternative is using a generator, but in my tests it's also slower than a loop. Can you investigate this? This is of interest for Julia in general, not only for weighted sums, especially if a custom syntax is added for lazy broadcasting (JuliaLang/julia#31088).

Long term, we should find an alternative to dot for the "simple numbers" case as well, though, and use a more stable summing algorithm (KBN or pairwise).

Actually sum on Broadcasted objects could easily use the same pairwise algorithm as for arrays. AFAICT that's what JuliaLang/julia#31020 implements. It would be interesting to check what's the performance on that branch, if you feel like benchmarking it.

@oschulz (Contributor) commented Nov 8, 2019

Can you investigate this? This is of interest for Julia in general, not only for weighted sum, especially if a custom syntax is added for lazy broadcasting

Well, I can try - but this may go fairly deep; we might need some core-dev experts there. :-) I assume some of the core (and not-so-core) developers are frequently thinking about how to improve broadcasting & friends anyhow. :-)

Actually sum on Broadcasted objects could easily use the same pairwise algorithm as for arrays.

I thought it did ... but apparently not (yet):

julia> A = vcat([Float32(1E0)], fill(Float32(1E-8), 10^8));

julia> w = Weights(fill(Float32(1.0), length(A)));

julia> reduce(+, broadcast(*, A, values(w)))
1.9999989f0

julia> reduce(+, Base.Broadcast.Broadcasted(*, (A, values(w))))
1.0f0

Oops. But as you say, maybe that's coming soon?

@nalimilan (Member)

Oops. But as you say, maybe that's coming soon?

At least it's implemented in JuliaLang/julia#31020, so it's quite close.

@nalimilan (Member)

OK, I've tested that branch, and summing Broadcasted is still slower. It seems that it's due to the fact that IndexStyle(::Broadcasted) = IndexCartesian(). Indeed, with a hack to force them to use IndexLinear, the speed is the same as sum, and reasonably close to dot:

julia> using BenchmarkTools

julia> x = rand(100_000);

julia> y = rand(100_000);

# This isn't correct in general, but OK for this simple case
julia> Base.LinearIndices(bc::Base.Broadcast.Broadcasted{<:Any}) = axes(bc)[1]

julia> @btime sum(x);
  16.212 μs (1 allocation: 16 bytes)

julia> @btime Base._mapreduce(identity, Base.add_sum, IndexLinear(), Base.Broadcast.Broadcasted(identity, (x,)));
  16.942 μs (3 allocations: 48 bytes)

julia> @btime Base._mapreduce(identity, Base.add_sum, IndexLinear(), Base.Broadcast.Broadcasted(*, (x, y)));
  32.016 μs (3 allocations: 64 bytes)

julia> using LinearAlgebra

julia> @btime dot(x, y);
  17.722 μs (1 allocation: 16 bytes)

Unfortunately this hack cannot be applied as-is, but it seems worth investigating whether we can ensure that Broadcasted(*, (x, y)) can use IndexLinear when both inputs are also IndexLinear. Then, combined with #31020, we could stop using dot.

@oschulz (Contributor) commented Nov 8, 2019

The advantage of using broadcast + reduce, compared to a custom implementation, would be that we'll take advantage of future improvements automatically. :-)

@oschulz (Contributor) commented Nov 8, 2019

Indeed, with a hack to force them to use IndexLinear, the speed is the same as sum, and reasonably close to dot

Nice!!

@oschulz (Contributor) commented Nov 8, 2019

IndexStyle(::Broadcasted) = IndexCartesian()

I looked for that definition, but I can't find it - is it something implicit?

I tried to run your example, but I get

julia> @btime Base._mapreduce(identity, Base.add_sum, IndexLinear(), Base.Broadcast.Broadcasted(identity, (x,)));
ERROR: MethodError: no method matching _mapreduce(::typeof(identity), ::typeof(Base.add_sum), ::IndexLinear, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(identity),Tuple{Array{Float64,1}}})

@oschulz (Contributor) commented Nov 8, 2019

Ah, let me guess, I need JuliaLang/julia#31020, of course?

@oschulz (Contributor) commented Nov 8, 2019

Hm, we can't just override Base.LinearIndices(bc::Base.Broadcast.Broadcasted{<:Any}) though, can we? ;-) And using an internal function like _mapreduce should be avoided if possible, I guess.

But maybe we could use

Base.Broadcast.Broadcasted(*, (A, w))

instead of

Base.Broadcast.Broadcasted(*, (A, values(w)))

and then add a method to IndexStyle for the specific signature of Broadcasted with Weights in it (since StatsBase owns Weights)?

Or maybe it would be possible to do this in a generic way, so that IndexStyle(::Broadcasted{...}) where {...} = IndexLinear() in more cases (e.g. when all args are AbstractVectors)?

@nalimilan (Member)

Actually after reading more carefully https://github.com/JuliaLang/julia/pull/31020/files#r256266019 I realized it already works, one just needs to call instantiate:

julia> @btime sum(Base.Broadcast.instantiate(Base.Broadcast.Broadcasted(*, (x, y))));
  31.200 μs (4 allocations: 96 bytes)

So we just need to wait for the PR to be merged in Julia (hopefully in time for 1.4), and we will be able to stop using dot. I guess if you want you can prepare a PR with a VERSION check, that will be needed anyway to avoid performance regression on older Julia versions.
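A sketch of what the version-gated method might look like. wsum_sketch is a hypothetical name (not the actual StatsBase API), and the VERSION cutoff is an assumption pending the merge of JuliaLang/julia#31020:

```julia
using LinearAlgebra

# Real-valued inputs take the BLAS fast path via dot:
wsum_sketch(v::AbstractArray{<:Real}, w::AbstractVector{<:Real}) = dot(v, w)

# Everything else uses a lazy broadcast reduction, which never conjugates:
function wsum_sketch(v::AbstractArray, w::AbstractVector{<:Real})
    bc = Base.Broadcast.Broadcasted(*, (v, w))
    # instantiate enables the pairwise reduction over Broadcasted;
    # the exact VERSION cutoff here is an assumption:
    VERSION >= v"1.4.0-" ? sum(Base.Broadcast.instantiate(bc)) : sum(bc)
end
```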

@oschulz (Contributor) commented Nov 9, 2019

Sounds great! Regarding the PR, sure - I guess I'll wait until #31020 is merged though, so that CI tests can run.

@oschulz (Contributor) commented Dec 20, 2021

I've been reminded of this issue recently, in the context of JuliaArrays/ArraysOfArrays.jl#20. Maybe an unoptimized fallback plus a super-lightweight package AbstractArraysOfArrays.jl or so that provides only interfaces like flatview, with ArraysOfArrays providing actual implementations of such arrays (and likely Base too, in the future, with JuliaLang/Pkg.jl#1285)?

If a dependency on a lightweight nested-arrays-interface package would be acceptable, I'd volunteer to take this on, including the standard non-optimized case and this issue here.

@devmotion (Member)

For Distributions, in principle a definition for AbstractVector{<:AbstractVector{<:Real}} would be sufficient. This would be even simpler and not require any additional dependencies.

Of course, one should add optimizations for EachCol and EachRow when the PR to Base is merged, but that is not a hard requirement - it would also work without such optimizations, just probably with less performance.
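A minimal sketch of such a generic definition for vectors of vectors (the names wsum_nested and wmean_nested are hypothetical, not StatsBase internals):

```julia
# Generic, unoptimized weighted sum and mean for vectors of vectors.
function wsum_nested(v::AbstractVector{<:AbstractVector{<:Real}},
                     w::AbstractVector{<:Real})
    length(v) == length(w) ||
        throw(DimensionMismatch("v and w must have the same length"))
    sum(vi * wi for (vi, wi) in zip(v, w))  # elementwise vector accumulation
end

wmean_nested(v, w) = wsum_nested(v, w) / sum(w)
```

For example, wmean_nested([[0.0, 1.0, 0.0], [0.3, 0.2, 0.5], [0.4, 0.2, 0.4]], [0.2, 0.3, 0.5]) ≈ [0.29, 0.36, 0.35].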

@oschulz (Contributor) commented Dec 20, 2021

I think we really would need an interface that gives access to the backing flat matrix (if there is one), something more specific than parent. And even after JuliaLang/julia#32310, it would be nice to have this as a separate interface package, I think, to allow generic code to take advantage of flat-backed nested arrays without depending on Julia v1.8.

@oschulz (Contributor) commented Dec 20, 2021

How about I make a prototype (https://github.com/JuliaArrays/AbstractArraysOfArrays.jl) and we see what it could yield? AbstractArraysOfArrays would of course add support for SliceArray after JuliaLang/julia#32310.

@devmotion (Member)

I think we really would need an interface that gives access to the backing flat matrix (if there is one), something more specific than parent

I'm not sure an interface is needed - it's definitely not needed for supporting vectors of vectors (whereas the generic definition is). I view it merely as an optimization for custom vector types, and hence, like other optimizations, it might be fine to implement it together with the custom types in separate packages. There also aren't too many such custom types; e.g. ColVecs/RowVecs are used mostly in the Julia GP organization, and I expect ColVecs/RowVecs at least to be replaced with EachCol/EachRow.

without depending on Julia v1.8.

There's already a PR to Compat that makes it available on older Julia versions.

@devmotion (Member)

So, put shortly: I think one should rather prioritize adding a generic implementation instead of working on an interface and (possibly) adding dependencies. Adding dependencies also seems problematic (impossible?) if methods with weights are moved to Statistics (IIRC this was the plan - even though, in general, I'm not convinced that moving things from StatsBase to a stdlib is a good idea, in particular for anything that is not absolutely stable, since development will become much slower and more challenging).

@oschulz (Contributor) commented Dec 20, 2021

I'm not sure if an interface is needed - it's definitely not needed for supporting vectors of vectors (whereas the generic definition is).

Indeed. But I think an interface is required to enable optimized (and GPU-friendly) methods. I don't believe support for efficient nested arrays should be limited to the upcoming SliceArray - I like it a lot, and I'll use it a lot, but it looks like it'll be more centered on being a "view" - it won't support operations like vcat and append! that some applications (e.g. concatenating samples) need. On the other hand, it allows slicing in a much more flexible fashion than ArraysOfArrays does. I think even long-term we'll need more than one implementation of nested arrays backed by contiguous storage.

There's already a PR to Compat that makes it available on older Julia versions.

Oh, indeed - nice!

@oschulz (Contributor) commented Dec 20, 2021

So put shortly: I think one should rather prioritize and add a generic implementation

Sure, that can be done first, without depending on anything. I'd just like to know if there will be room to support other efficient nested array implementations besides Base.SliceArray in principle, provided that there would be only one central, well defined and extremely lightweight dependency.

Adding dependencies also seems problematic (impossible?) if methods with weights are moved to Statistics

Ah, that could indeed be a problem.

@oschulz (Contributor) commented Dec 20, 2021

If JuliaLang/julia#32310 would add an AbstractSlices, that would solve everything neatly, but so far no one has responded to that suggestion.

@devmotion (Member) commented Dec 20, 2021

I think even long-term we'll need more than one implementation of nested arrays that are backed by contiguous storage.

I agree, I assume there will always be different implementations that probably focus on different use cases and allow slightly different optimizations. I wonder, however, if StatsBase or Statistics should be concerned with it at all - or if it should be the responsibility of the packages with custom vectors to provide optimized implementations. These could be optimized for their specific vector design or eg make use of some external interface package (that StatsBase doesn't have to know about) or eg make use of an implementation for EachCol/EachRow in StatsBase.

@oschulz (Contributor) commented Dec 20, 2021

I agree - I assume there will always be different implementations that probably focus on different use cases and allow slightly different optimizations. I wonder, however, if StatsBase or Statistics should be concerned with it at all - or if it should be the responsibility of the packages with custom vectors to provide optimized implementations.

I think that would make those packages immense and result in a lot of code duplication. It's not just StatsBase; it would also be functions for individual distributions (e.g. an efficient and GPU-friendly likelihood(::MvNormal, ...)) that I don't think should be implemented in array packages (and would make them very unattractive due to huge load times, too).

What I'm going for here is a few simple functions like flatview(myarray_of_arrays) that give access to underlying flat arrays where they exist. So that code in StatsBase and Distributions can take advantage of efficiently stored nested arrays without heavy dependencies and without code specific to nested-array-implementations.
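To illustrate the idea, here is a toy version: flatview here is a local stand-in mirroring the ArraysOfArrays.jl function of the same name, and ColViews is an invented toy type, not part of any package:

```julia
# A toy nested array backed by a flat matrix (columns are the inner vectors).
struct ColViews{T} <: AbstractVector{Vector{T}}
    M::Matrix{T}
end
Base.size(A::ColViews) = (size(A.M, 2),)
Base.getindex(A::ColViews, i::Int) = A.M[:, i]

flatview(A::ColViews) = A.M                              # flat backing exists
flatview(A::AbstractVector{<:AbstractVector}) = nothing  # no flat backing in general

# Generic code can hit a linear-algebra fast path when a flat backing exists:
function wsum_flat(A, w)
    M = flatview(A)
    M === nothing ? sum(a * wi for (a, wi) in zip(A, w)) : M * w
end
```

Both paths give the same result, but the flat-backed one is a single matrix-vector product (and GPU-friendly).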

The upcoming Compat for EachCol and EachRow will also not cover cases like multiple samples from matrix-variate distributions and so on.

If methods with weights are moved to Statistics one day (possibly), that could be accompanied by adding such an API to Base - but for now, that seems to be in the (possibly far) future.

@devmotion (Member)

I think that would make those packages immense and result in a lot of code duplication.

Code could be shared even if it is not part of StatsBase, eg by an external interface package with default definitions or by using code for EachRow etc in StatsBase.

What I'm going for here is a few simple functions like flatview(myarray_of_arrays) that give access to underlying flat arrays where they exist. 

Usually, you also have to know the orientation of the vectors (eg rows or columns), often just retrieving the underlying array is not enough. Due to these differences I still wonder if/what an interface could provide that works for and helps with generic implementations and is not covered by the AbstractVector interface.

The upcoming Compat for EachCol and EachRow will also not cover cases like multiple samples from matrix-variate distributions and so on.

We use Distributions.eachvariate that falls back to Distributions.EachVariate (which could be replaced with Slices).

@devmotion (Member)

If JuliaLang/julia#32310 would add an AbstractSlices, that would solve everything neatly, but so far no one has responded to that suggestion.

This might be the cleanest solution - if there exists a common API for such slices that is helpful for working with slices and is not just the one for AbstractVector.

@oschulz (Contributor) commented Dec 21, 2021

there exists a common API for such slices

JuliaLang/julia#32310 may not provide everything we need there, judging from the current state, and what Compat will provide will necessarily be limited as well. I'll try to draft something in AbstractArraysOfArrays that can fully integrate with JuliaLang/julia#32310 and allow for other implementations and backward compatibility as well.

Usually, you also have to know the orientation of the vectors (eg rows or columns)

Absolutely, I agree that will be crucial.

@nalimilan (Member)

What's the current status of weighted sum, mean etc. for vectors of vectors? Even just an unoptimized fallback definition would be very useful (or rather needed) for JuliaStats/Distributions.jl#1424. Special AbstractVector{<:AbstractVector} implementations such as in in ArraysOfArrays or KernelFunctions (for ColVecs and RowVecs) could still make use of an underlying matrix structure for performance improvements, and the same could be done for EachRow and EachCol in StatsBase if JuliaLang/julia#32310 is merged.

I admit I don't really use reductions over vectors, but don't they kind of work already? BTW, the present issue (about complex numbers) has been fixed recently by #737, right?

Regarding optimized implementations for specific types, I'd rather avoid adding yet another dependency to StatsBase, so any solution to leverage JuliaLang/julia#32310 sounds appealing!

@oschulz (Contributor) commented Dec 21, 2021

so any solution to leverage JuliaLang/julia#32310 sounds appealing!

For sure - the problem is that, the way it's currently written, it doesn't really provide a generic API for efficient nested arrays, flattening them, reconstructing nested arrays from similar flattened arrays, and so on ...

@devmotion (Member)

I admit I don't really use reductions over vectors, but don't they kind of work already?

Unfortunately, they don't work:

julia> using StatsBase

julia> mean([[0.0, 1.0, 0.0], [0.3, 0.2, 0.5], [0.4, 0.2, 0.4]], weights([0.2, 0.3, 0.5]))
ERROR: DimensionMismatch("x and y are of different lengths!")
Stacktrace:
  [1] dot
    @ ~/.asdf/installs/julia/1.7.0/share/julia/stdlib/v1.7/LinearAlgebra/src/generic.jl:900 [inlined]
  [2] dot(x::Vector{Vector{Float64}}, y::Weights{Float64, Float64, Vector{Float64}})
    @ LinearAlgebra ~/.asdf/installs/julia/1.7.0/share/julia/stdlib/v1.7/LinearAlgebra/src/generic.jl:915
  [3] wsum
    @ ~/.julia/packages/StatsBase/CuCbN/src/weights.jl:383 [inlined]
  [4] wsum
    @ ~/.julia/packages/StatsBase/CuCbN/src/weights.jl:385 [inlined]
  [5] #sum#13
    @ ~/.julia/packages/StatsBase/CuCbN/src/weights.jl:607 [inlined]
  [6] sum
    @ ~/.julia/packages/StatsBase/CuCbN/src/weights.jl:607 [inlined]
  [7] _mean
    @ ~/.julia/packages/StatsBase/CuCbN/src/weights.jl:656 [inlined]
  [8] #mean#16
    @ ~/.julia/packages/StatsBase/CuCbN/src/weights.jl:654 [inlined]
  [9] mean(A::Vector{Vector{Float64}}, w::Weights{Float64, Float64, Vector{Float64}})
    @ StatsBase ~/.julia/packages/StatsBase/CuCbN/src/weights.jl:654
 [10] top-level scope
    @ REPL[4]:1

julia> StatsBase.var([[0.0, 1.0, 0.0], [0.3, 0.2, 0.5], [0.4, 0.2, 0.4]], weights([0.2, 0.3, 0.5]))
ERROR: MethodError: no method matching var(::Vector{Vector{Float64}}, ::Weights{Float64, Float64, Vector{Float64}})
Closest candidates are:
  var(::AbstractArray{T} where T<:Real, ::AbstractWeights; mean, corrected) at ~/.julia/packages/StatsBase/CuCbN/src/moments.jl:41
  var(::AbstractArray{T} where T<:Real, ::AbstractWeights, ::Int64; mean, corrected) at ~/.julia/packages/StatsBase/CuCbN/src/moments.jl:92
  var(::AbstractArray; corrected, mean, dims) at ~/.asdf/installs/julia/1.7.0/share/julia/stdlib/v1.7/Statistics/src/Statistics.jl:368
  ...
Stacktrace:
 [1] top-level scope
   @ REPL[6]:1

julia> StatsBase.cov([[0.0, 1.0, 0.0], [0.3, 0.2, 0.5], [0.4, 0.2, 0.4]], weights([0.2, 0.3, 0.5]))
3-element Vector{Float64}:
  0.028333333333333335
 -0.05333333333333333
  0.024999999999999994

@oschulz (Contributor) commented Dec 21, 2021

I admit I don't really use reductions over vectors, but don't they kind of work already? BTW, the present issue (about complex numbers) has been fixed recently by #737, right?

I just tested it - it's not fully fixed yet, I think:

julia> A = [rand(3,4) for i in 1:6]
julia> w = FrequencyWeights([rand() for i in 1:6])
julia> mean(A, w)
ERROR: DimensionMismatch("x and y are of different lengths!")

@oschulz (Contributor) commented Dec 21, 2021

@devmotion was faster, hadn't updated the page yet :-)

@nalimilan (Member)

Please try mean on latest master:

julia> mean([[0.0, 1.0, 0.0], [0.3, 0.2, 0.5], [0.4, 0.2, 0.4]], weights([0.2, 0.3, 0.5]))
3-element Vector{Float64}:
 0.29000000000000004
 0.36
 0.35

var and cov are another matter though, but better file a separate issue.

For sure - the problem is that, the way it's currently written, it doesn't really provide a generic API for efficient nested arrays, flattening them, reconstructing nested arrays from similar flattened arrays, and so on ...

OK. I must say I don't really see what the implementation would look like. Ideally a high-level API would work efficiently for all array types -- otherwise there's no end to adding special methods for each array type.

@devmotion (Member)

Please try mean on latest master:

Oh nice, I didn't realize that there were unreleased changes in the master branch. Can we make a new release?

@devmotion (Member) commented Dec 21, 2021

Wait,

wsum(v::AbstractArray, w::AbstractVector, dims::Colon=:) = transpose(w) * vec(v)

seems wrong - it ignores dims completely? However, it seems this problem also existed before #737.

Edit: Never mind, it only allows Colon 🙈

@oschulz (Contributor) commented Dec 21, 2021

OK. I must say I don't really see what the implementation would look like.

I'm currently drafting something.

otherwise there's no end to adding special methods for each array type

Yes indeed! I think nested arrays that are backed by flat arrays are an important class of array types, but one should only need to write a single specialized method to support any of them in a generic fashion (possibly two, depending on inner- or outer-dims-first storage). We already have these specialized methods in StatsBase, Distributions and all over the ML ecosystem - the ones that take a flat array instead of nested arrays. We just need to redirect to them.

@oschulz (Contributor) commented May 8, 2022

Of course, one should add optimizations for EachCol and EachRow when the PR to base is merged
plus a super-lightweight package AbstractArraysOfArrays.jl
I'm currently drafting something. [...] depending on inner- or outer-dims-first storage [...]

@devmotion I think I found a clean and lightweight approach:

AbstractArraysOfArrays (we can name it differently) can provide three main functions:

  • A_flat = flattened(AoA)
  • A_flat, f_reconstruct = flattened_and_back(AoA)
  • flattened_is_efficient(AoA) # should find a better name

To handle the inner- or outer-dims-first or mixed-dims backing-storage issue, the flatten functions would

  • Always return a flat array that has the inner dims of the array of arrays as its first dims - not necessarily in memory, but as far as getindex is concerned.

    This seems to me to be the "natural" Julianic order - it's the memory model of eachcol, so an array of arrays whose entries are contiguous in memory, just like a standard Array{Array{...}}. It's also the mem/dim-model that Distributions (for multivariate dists), Flux and other places in the ecosystem use. It's of course not the right order for every use case, which is why cov & friends default to contiguous-outer-dims.

  • To make this efficient and preserve information about the original mem/dim-model, the flatten functions will return a PermutedDimsArray if the array of arrays is backed by (e.g.) a dense array that is not in inner-dims-first order.

This way, we can easily support eachrow, eachcol, @bramtayl's JuliennedArrays.jl (allows for arbitrary dim order), ArraysOfArrays.jl (currently always uses inner-dims-first) and the future Base.Slices (JuliaLang/julia#32310, arbitrary dim order). Flattening will always be an O(1) view if possible and we won't need a separate API to query dimension-storage-order.
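To make the proposed contract concrete, here is a rough sketch for the simplest case, a plain Vector of equal-length vectors (function names are the hypothetical ones from the proposal above; a real implementation would return O(1) views where possible instead of copying):

```julia
# Hypothetical sketch of the proposed `flattened` for a vector of
# equal-length vectors: returns a matrix with the inner dims first
# (eachcol-style layout).
function flattened(aoa::AbstractVector{<:AbstractVector{T}}) where {T}
    A = Matrix{T}(undef, length(first(aoa)), length(aoa))
    for (j, v) in enumerate(aoa)
        A[:, j] = v
    end
    return A
end

# `flattened_and_back` additionally returns a reconstruction function
# that turns a similarly-shaped flat array back into a nested one.
function flattened_and_back(aoa::AbstractVector{<:AbstractVector})
    f_reconstruct(B::AbstractMatrix) = [B[:, j] for j in axes(B, 2)]
    return flattened(aoa), f_reconstruct
end
```

With this, flattened([[1, 2], [3, 4]]) == [1 3; 2 4], and the returned f_reconstruct inverts the flattening.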

Packages like StatsBase, Distributions & friends would only need to depend on the super-lightweight AbstractArraysOfArrays (or other name) to be able to check if an array of arrays can be flattened efficiently and then use linear algebra operations to process a collection of variates in one go (just an example). Would be a good basis for custom broadcasts that prevent a lot of memory allocation and enable GPU-compatibility. I think it would also help with moving (I believe this is the intention?) from a matrix to a vector-of-vectors sample model for multivariate distributions. It would also, I think, provide a good basis to make batched parameter transformations efficient (e.g. in Bijectors). And this would work with the existing nested array packages as well as JuliaLang/julia#32310 (when it lands).

@ToucheSir
Copy link

A couple questions coming from https://github.com/TuringLang/Bijectors.jl/discussions/178:

  1. Is A_flat always an AbstractMatrix or can it be higher dimensional? e.g. a WHCN feature map for a conv layer.
  2. How might one reconstruct an AoA representation without doing the whole flatten -> f_reconstruct dance? For most ML use cases, we are already receiving a "flat" ND array as input
  3. Does/how does this address dispatching on AoA types? e.g. how does a function in NNlib know that it's working with one of the AoA types you've listed above without having to depend on each corresponding package?

@oschulz
Copy link
Contributor

oschulz commented May 8, 2022

Is A_flat always an AbstractMatrix or can it be higher dimensional?

@ToucheSir It would absolutely be higher-dimensional for vectors-of-similar-matrices or matrices-of-similar-tensors, and so on. The flattened version of an N-dimensional array of M-dimensional arrays would be an N+M-dimensional array.
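For instance, a length-2 vector of 3×4 matrices would flatten to a 3×4×2 array. Using only Base to illustrate the shape (cat copies here; the proposed API would return a view when the backing storage allows it):

```julia
vs = [rand(3, 4) for _ in 1:2]   # 1-dim array of 2-dim arrays
flat = cat(vs...; dims = 3)      # flattened: 2 + 1 = 3 dimensions
size(flat)                       # (3, 4, 2)
```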

How might one reconstruct an AoA representation without doing the whole flatten -> f_reconstruct dance?

ArraysOfArrays currently provides a nestedview function to make it easy to turn a flat array into nested arrays in a zero-copy fashion. But with such a function one needs to decide which array-of-arrays type should be chosen by default, so I wouldn't include it in an AbstractArraysOfArrays package currently. Once JuliaLang/julia#32310 lands there would be an obvious choice on what the default array-of-arrays type should be (though it would not be available down to Julia LTS, which might be a problem for many packages).

Does/how does this address dispatching on AoA types? e.g. how does a function in NNlib know that it's working with one of the AoA types you've listed above without having to depend on each corresponding package?

I think it needs to be done in Holy-traits fashion: I don't think we'll get a common supertype for all array-of-arrays implementations. I asked about adding such a supertype in JuliaLang/julia#32310 a while ago already, but there has been no favourable response. And it would be good to support eachcol and eachrow down to Julia v1.6, of course.

With a function like flattened_is_efficient (admittedly not the best name) that returns singletons EfficientFlatten/NoEfficientFlatten or so (EfficientFlatten could even include dimension order information), code could easily dispatch and select between an optimized flatten-based implementation and a fallback that works for any array of arrays.
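Such a trait-based dispatch might look like the following sketch (all names hypothetical, with a toy opt-in for Vector{<:Vector} just to make it self-contained):

```julia
# Hypothetical trait singletons, as described above.
struct EfficientFlatten end
struct NoEfficientFlatten end

# Default: no efficient flattening is known; concrete types opt in.
flattened_is_efficient(::Any) = NoEfficientFlatten()
flattened_is_efficient(::Vector{<:Vector}) = EfficientFlatten()

# Toy inner-dims-first flattening for the opted-in case:
flattened(A::Vector{<:Vector}) = reduce(hcat, A)

# Callers dispatch on the trait, not on the concrete array type:
my_sum(A) = _my_sum(flattened_is_efficient(A), A)
_my_sum(::EfficientFlatten, A)   = vec(sum(flattened(A); dims = 2))  # flat, BLAS-friendly path
_my_sum(::NoEfficientFlatten, A) = sum(A)                            # generic fallback
```

Both paths agree, e.g. my_sum([[1.0, 2.0], [3.0, 4.0]]) == [4.0, 6.0], but only the trait decides which implementation runs.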

@blackeneth
Copy link

Is this also fixed by PR #775 ?

@oschulz
Copy link
Contributor

oschulz commented May 16, 2023

Is this also fixed by #778?

Looks like it!

@oschulz
Copy link
Contributor

oschulz commented May 16, 2023

#534 seems fixed as well.
