Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mean of an array with missing values does not work if the dims argument is provided #7

Open
oxinabox opened this issue Mar 19, 2019 · 6 comments · May be fixed by #172
Open

Mean of an array with missing values does not work if the dims argument is provided #7

oxinabox opened this issue Mar 19, 2019 · 6 comments · May be fixed by #172

Comments

@oxinabox
Copy link

julia> using Statistics

julia> x = [(i==3 && j==3) ? missing : i*j for i in 1:3, j in 1:4]
3×4 Array{Union{Missing, Int64},2}:
 1  2  3          4
 2  4  6          8
 3  6   missing  12

julia> mean(x)
missing

julia> mean(x; dims=1)
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Float64
    # SNIP 
julia> mean(x; dims=2)
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Float64
    # SNIP

Expected is:

julia> mean(x; dims=1)
1×4 Array{Union{Missing,Float64},2}:
 2.0  4.0  missing  8.0

julia> mean(x; dims=2)
3×1 Array{Union{Missing,Float64},2}:
 2.5
 5.0
 missing

(credit to @nickrobinson251 who found this)

@oxinabox oxinabox changed the title missing mean with dims Mean of an array with missing values does not work if the dims argument is provided Mar 19, 2019
@ararslan
Copy link
Member

It appears to be because of this: https://github.com/JuliaLang/julia/blob/master/stdlib/Statistics/src/Statistics.jl#L134

julia> x = [(i == j == 3) ? missing : i * j for i in 1:3, j in 1:4]
3×4 Array{Union{Missing, Int64},2}:
 1  2  3          4
 2  4  6          8
 3  6   missing  12

julia> Base.reducedim_init(t->t/2, +, x, 1)
1×4 Array{Float64,2}:
 0.0  0.0  0.0  0.0

That gets passed to mean!, which tries to store the result in that array, but it can't because of the missing.

@ararslan
Copy link
Member

Going further, we get that result from reducedim_init because of this line: https://github.com/JuliaLang/julia/blob/master/base/reducedim.jl#L117, since zero(Union{Missing,Int}) === 0.

@ararslan
Copy link
Member

sum handles this just fine, so I think we should change mean to go though more of the sum machinery.

@fredrikekre fredrikekre transferred this issue from JuliaLang/julia Nov 29, 2019
@Moelf
Copy link

Moelf commented Jul 17, 2020

julia> using Statistics

julia> a = [1 2;
            missing 3]
2×2 Array{Union{Missing, Int64},2}:
 1         2
  missing  3

julia> sum(skipmissing(a), dims=1)
ERROR: MethodError: no method matching sum(::Base.SkipMissing{Array{Union{Missing, Int64},2}}; dims=1)
Closest candidates are:
  sum(::Any) at reduce.jl:503 got unsupported keyword argument "dims"
  sum(::Any, ::AbstractArray; dims) at reducedim.jl:653
  sum(::Any, ::Any) at reduce.jl:486 got unsupported keyword argument "dims"
  ...
Stacktrace:
 [1] top-level scope at REPL[35]:1

sum also doesn't work now?

@nalimilan
Copy link
Member

@Moelf sum(skipmissing(...), dims=...) has never worked. JuliaLang/julia#28027 implements it but it's not been merged (yet?).

This failure is also interesting:

julia> mean([missing 1; 3 4]; dims=1)
ERROR: InexactError: Int64(2.5)

It happens because we use the type of first(x)/1 to choose the eltype of the result, which in this case is Missing.

sum handles this just fine, so I think we should change mean to go though more of the sum machinery.

Actually, no, sum only handles this correctly in cases where Base.add_sum returns the same type as its inputs. But it fails when the type differs (which is precisely what mean does for integer inputs):

julia> sum([missing true; false false], dims=1)
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Int64

The only way to fix this is to redesign the reducedim code, as _reducedim_init is just a hack when the type isn't concrete . AFAICT the only 100% correct solution is to do as with mapand broadcast: allocate an array using the type of the first element, and widen its eltype if necessary.

@jo-fleck
Copy link

Any news/updates/new ideas on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants