Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add handling of an empty iterator for mean and var #29033

Merged
merged 5 commits into from
Oct 11, 2018
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions stdlib/Statistics/src/Statistics.jl
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,8 @@ julia> mean([√1, √2, √3])
function mean(f::Base.Callable, itr)
y = iterate(itr)
if y === nothing
throw(ArgumentError("mean of empty collection undefined: $(repr(itr))"))
return Base.mapreduce_empty_iter(f, Base.add_sum, itr,
Base.IteratorEltype(itr)) / 0
end
count = 1
value, state = y
Expand Down Expand Up @@ -148,7 +149,8 @@ var(iterable; corrected::Bool=true, mean=nothing) = _var(iterable, corrected, me
function _var(iterable, corrected::Bool, mean)
y = iterate(iterable)
if y === nothing
throw(ArgumentError("variance of empty collection undefined: $(repr(iterable))"))
T = eltype(iterable)
return typeof((abs2(zero(T)) + abs2(zero(T)))/2)(NaN)
Copy link
Member

@nalimilan nalimilan Sep 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this throws an error when calling zero if eltype returns Any. Maybe better call throw a more explicit error similar to the existing one?

Could use oftype.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave it as is, because this is how var for AbstractArray is implemented and we want to be non-breaking (so it should have 100% the same behavior - the same outputs and the same errors - so using array or iterator is transparent).

For sure in Julia 2.0 approach we could consider cleaning it up in both places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just mean that instead of letting zero throw the error, it would be clearer to check T === Any and throw a more explicit error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that T does not have to be Any. It can be an arbitrary type that is or is not a subtype of Number. And upfront we do not know if this type:

  • defines zero
  • has abs2 defined
  • implements a constructor from NaN

Any of those may or may not be true so the only way to be consistent is to repeat the same code that is used for a version of var defined for arrays (otherwise I will be able to create a situation where array and iterator var differs).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. We could still print a nice error in the most common case, but then we'd lose consistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the point for now. As discussed in Julia 2.0 we can probably clean it up in both methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally, would use oftype as @nalimilan suggested for clarity

end
count = 1
value, state = y
Expand Down
33 changes: 28 additions & 5 deletions stdlib/Statistics/test/runtests.jl
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ end
end

@testset "mean" begin
@test_throws ArgumentError mean(())
@test_throws MethodError mean(())
@test mean((1,2,3)) === 2.
@test mean([0]) === 0.
@test mean([1.]) === 1.
Expand All @@ -86,6 +86,19 @@ end
@test ismissing(mean([missing, NaN]))
@test isequal(mean([missing 1.0; 2.0 3.0], dims=1), [missing 2.0])
@test mean(skipmissing([1, missing, 2])) === 1.5
@test isequal(mean(Complex{Float64}[]), NaN+NaN*im)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also make sense to test that an empty Int input gives a Float64 output, and that an error is thrown when the element type isn't known (e.g. using a generator).

BTW, you could probably use ===.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is in muddy waters :) You cannot use === as we have different NaN in Julia and here it would fail. Notice that:

julia> NaN === -NaN
false

and this is exactly the case here, that is why I had to use isequal.

I will add the tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must be missing something: you really expect NaN::Float64 here, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The situation is more complex (and that is why I have used this as a test):

julia> x = mean(Complex{Float64}[])
NaN + NaN*im

julia> bitstring(x.re)
"1111111111111000000000000000000000000000000000000000000000000000"

julia> bitstring(x.im)
"1111111111111000000000000000000000000000000000000000000000000000"

julia> bitstring(NaN) # see that the bitstring is different
"0111111111111000000000000000000000000000000000000000000000000000"

also you have:

julia> x = mean(Int[])
NaN

julia> x === NaN
false

For the same reason that we have different NaNs in Julia (and as you see they actually happen in practice).

@test isequal(mean(skipmissing(Complex{Float64}[])), NaN+NaN*im)
@test mean(Complex{Float64}[]) isa Complex{Float64}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not also test the value? Same in other places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value is tested above in line 89 (unless you meant something else).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this one is, I originally spotted mean(Int[]). It would probably be clearer to have them side by side.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All cleaned up now (hopefully 😄, as you have a keener eye for details 👍)

@test mean(skipmissing(Complex{Float64}[])) isa Complex{Float64}
@test isequal(mean(abs, Complex{Float64}[]), NaN)
@test isequal(mean(abs, skipmissing(Complex{Float64}[])), NaN)
@test mean(abs, Complex{Float64}[]) isa Float64
@test mean(abs, skipmissing(Complex{Float64}[])) isa Float64
@test_throws MethodError mean([])
@test_throws MethodError mean(skipmissing([]))
@test_throws MethodError mean((1 for i in 2:1))
@test mean(Int[]) isa Float64
@test mean(skipmissing(Int[])) isa Float64

# Check that small types are accumulated using wider type
for T in (Int8, UInt8)
Expand All @@ -109,10 +122,10 @@ end
@testset "var & std" begin
# edge case: empty vector
# iterable; this has to throw for type stability
@test_throws ArgumentError var(())
@test_throws ArgumentError var((); corrected=false)
@test_throws ArgumentError var((); mean=2)
@test_throws ArgumentError var((); mean=2, corrected=false)
@test_throws MethodError var(())
@test_throws MethodError var((); corrected=false)
@test_throws MethodError var((); mean=2)
@test_throws MethodError var((); mean=2, corrected=false)
# reduction
@test isnan(var(Int[]))
@test isnan(var(Int[]; corrected=false))
Expand Down Expand Up @@ -245,6 +258,16 @@ end
@test ismissing(f([missing, NaN], missing))
@test f(skipmissing([1, missing, 2]), 0) === f([1, 2], 0)
end

@test isequal(var(Complex{Float64}[]), NaN)
@test isequal(var(skipmissing(Complex{Float64}[])), NaN)
@test var(Complex{Float64}[]) isa Float64
@test var(skipmissing(Complex{Float64}[])) isa Float64
@test_throws MethodError var([])
@test_throws MethodError var(skipmissing([]))
@test_throws ArgumentError var((1 for i in 2:1))
@test var(Int[]) isa Float64
@test var(skipmissing(Int[])) isa Float64
end

function safe_cov(x, y, zm::Bool, cr::Bool)
Expand Down