ignoring elements with 0 weight #492

Open
tpapp opened this issue May 6, 2019 · 3 comments

@tpapp
Contributor

tpapp commented May 6, 2019

Thanks to #316, quantile now ignores elements with a 0 weight. I wonder if the same would make sense for other functions, particularly mean and std. Currently:

julia> using StatsBase

julia> w = Weights([0, 1, 1]);

julia> x = [Inf, 1, 1];

julia> mean(x, w)
NaN

julia> std(x, w, corrected = false)
NaN

julia> quantile(x, w, [0.5])
1-element Array{Float64,1}:
 1.0

My motivation for this is working with data where I am calculating statistics of some x ./ y, weighted by y. When y == 0, the ratio is nonsensical, but at the same time weighting should make it irrelevant. Instead, now the NaN propagates.

This is nothing I cannot work around; it would just make my workflow more convenient.
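One possible workaround of the kind alluded to above is to drop zero-weight observations before any arithmetic happens. The sketch below uses plain vectors for the weights and a hypothetical helper name, `mean_dropzeros`, which is not part of StatsBase.

```julia
# Hypothetical helper (not a StatsBase function): weighted mean that removes
# zero-weight observations up front, so their values never enter the sum.
function mean_dropzeros(x::AbstractVector, w::AbstractVector)
    keep = .!iszero.(w)            # mask of observations with nonzero weight
    xk, wk = x[keep], w[keep]      # copies restricted to those observations
    return sum(xk .* wk) / sum(wk)
end

x = [Inf, 1.0, 1.0]
w = [0.0, 1.0, 1.0]
mean_dropzeros(x, w)   # 1.0
```

The cost is two temporary copies per call. With StatsBase loaded, the same mask could presumably be applied as `mean(x[keep], Weights(w[keep]))`.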

@nalimilan
Member

That kind of makes sense, but I wonder how it would affect performance. For example, the weighted mean/sum uses BLAS over dimensions, which I guess doesn't offer that flexibility -- and even in pure Julia code we wouldn't be able to use standard floating-point arithmetic unless I'm mistaken.

@tpapp
Contributor Author

tpapp commented May 8, 2019

I agree: we would need to branch on weight == 0.

I would be happy to sacrifice speed for this feature, but I understand that not everyone has the same preferences.
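To make the trade-off concrete, here is a minimal sketch of what branching on a zero weight could look like in a pure-Julia accumulation loop. The function name `wmean_skipzero` is hypothetical, and this is not how StatsBase computes weighted means.

```julia
# Hypothetical sketch: weighted mean that skips zero-weight elements inside
# the accumulation loop. The branch is what rules out a single BLAS call.
function wmean_skipzero(x::AbstractVector{<:Real}, w::AbstractVector{<:Real})
    length(x) == length(w) ||
        throw(DimensionMismatch("x and w must have equal length"))
    num = 0.0   # running sum of w[i] * x[i] over nonzero weights
    den = 0.0   # running sum of the nonzero weights
    for (xi, wi) in zip(x, w)
        iszero(wi) && continue   # element does not participate at all
        num += wi * xi
        den += wi
    end
    return num / den
end

x = [Inf, 1.0, 1.0]
w = [0.0, 1.0, 1.0]
wmean_skipzero(x, w)   # 1.0, whereas sum(x .* w) / sum(w) is NaN
```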

Perhaps we could have an option selecting this behavior (zero-weight elements don't participate in arithmetic), or have it enabled by default and disable it on demand. If a PR for that could be considered, I would be happy to make one.

Another interface option I can imagine for this is a wrapper on the weights, e.g.

mean(x, SkipZeros(Weights(w)))

and similarly for std etc.
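A rough sketch of how such a wrapper could select the behavior through dispatch follows. `SkipZeros` is the name proposed above and does not exist in StatsBase. To stay self-contained, the sketch wraps a plain weight vector and defines a stand-in function `wmean` instead of extending `mean`; both choices are assumptions made for illustration.

```julia
# Hypothetical wrapper type: marks a weight vector whose zero entries should
# exclude the corresponding observations from all arithmetic.
struct SkipZeros{W<:AbstractVector{<:Real}}
    weights::W
end

# Stand-in for the ordinary weighted mean: zero-weight elements still enter
# the product, so Inf * 0 yields NaN.
wmean(x::AbstractVector, w::AbstractVector) = sum(x .* w) / sum(w)

# Dispatching on the wrapper selects the skipping behavior.
function wmean(x::AbstractVector, w::SkipZeros)
    keep = findall(!iszero, w.weights)   # indices with nonzero weight
    return wmean(view(x, keep), view(w.weights, keep))
end

x = [Inf, 1.0, 1.0]
w = [0.0, 1.0, 1.0]
wmean(x, w)              # NaN
wmean(x, SkipZeros(w))   # 1.0
```

Because the behavior is tied to the weight type, the unwrapped method keeps its current fast path and callers opt in explicitly.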

@nalimilan
Member

That code is going to be moved to Julia, see JuliaLang/julia#31395. I'm not sure an option would be acceptable, since it will have to be added directly to the Base sum function. But that's actually something useful in general to skip NaN, so one could add support for weights to NaNMath.mean.

Another solution (easier in the short term) is to wrap x in a lazy array which replaces Inf with 0. See also discussion at JuliaLang/julia#4552.
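The lazy-array idea could look roughly like the sketch below. The type name `FiniteOrZero` is hypothetical, and testing with `isfinite` (which also catches `NaN` and `-Inf`, not only `Inf`) is an assumption that goes slightly beyond what the comment describes.

```julia
# Hypothetical lazy wrapper: presents non-finite entries of the wrapped vector
# as 0.0 on access, without copying or modifying the underlying data.
struct FiniteOrZero{A<:AbstractVector{Float64}} <: AbstractVector{Float64}
    parent::A
end

# Minimal AbstractArray interface: size and scalar indexing.
Base.size(v::FiniteOrZero) = size(v.parent)
Base.IndexStyle(::Type{<:FiniteOrZero}) = IndexLinear()
function Base.getindex(v::FiniteOrZero, i::Int)
    xi = v.parent[i]
    return isfinite(xi) ? xi : 0.0
end

x = [Inf, 1.0, 1.0]
w = [0.0, 1.0, 1.0]
xz = FiniteOrZero(x)

# Weighted mean over the wrapped values; the generator avoids a temporary array.
sum(xi * wi for (xi, wi) in zip(xz, w)) / sum(w)   # 1.0
```

Note that this replaces non-finite values regardless of their weight, so an `Inf` paired with a nonzero weight would be silently turned into 0 as well.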
