-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add unit weights #515
Add unit weights #515
Conversation
Codecov Report
@@ Coverage Diff @@
## master #515 +/- ##
==========================================
- Coverage 90.1% 88.69% -1.41%
==========================================
Files 21 21
Lines 2021 2053 +32
==========================================
Hits 1821 1821
- Misses 200 232 +32
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #515 +/- ##
==========================================
+ Coverage 90.14% 90.27% +0.13%
==========================================
Files 21 21
Lines 2029 2057 +28
==========================================
+ Hits 1829 1857 +28
Misses 200 200
Continue to review full report at Codecov.
|
src/weights.jl
Outdated
one(T) | ||
end | ||
|
||
@propagate_inbounds function Base.getindex(wv::UnitWeights{S, T}, i::AbstractArray{<:Int}) where {S, T} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I realize this definition isn't the most natural: we'd better return a new weights vector of the same type. But that's a separate issue since other types do the same thing.
Can you also add tests? |
Tests added! |
Thanks. A few methods don't seem to be covered by tests: see dark red in the margin at https://codecov.io/gh/JuliaStats/StatsBase.jl/compare/416af5561d3efd11c78c82c269a9d7c7d26c681d...009b9d5b893bd9235dfd96895acfe1a3c1df81ea/diff. |
I added more tests. As it turns out, we need |
What if you also remove the |
You're right. I'd messed up my tests. I've amended the last commit to eliminate the reintroduction of |
@eval begin | ||
function Base.sum(v::$v, w::UnitWeights) | ||
if length(v) != length(w) | ||
throw(DimensionMismatch("Inconsistent array dimension.")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does UnitWeight
have length?
And if it has to have length then why do we need to actually check it?
Is this a useful feature vs it just always being considered the right size?
Like LinearAlgebra.I
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting point. Here's an example. In computing the variance, we obtain the normalization factor through varcorrection
. It depends on the number of observations, so we need the length of the weight vector. We should also ensure that the user has not passed the wrong length lest we use the wrong normalization factor. There are ways around it (for instance, a method for variance
that bypasses the weights), but we want to help developers avoid defining separate methods. More generally, I think that boundary checks at different points help users catch mistakes before they snowball.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it doesn't hurt to check lengths, and you can avoid it using @inbounds
. I'd say that in general it's quite easy to get the appropriate length since it's the length of another input argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calculating the length isn't free for general iterators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but in most cases we take arrays, right? Also you can always create an object of length typemax
if you don't care about the checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly isn't always. Isn't dropmissing
after all is one of the cases of iterators that are not free to take then length of?
These checks just don't feel great to me.
What error are we catching?
And how does that error relate to weighting?
If one is checking one got the amount of data that was expected, then isn't that better done with an explicit check on the data gathering code? So a good error can be thrown?
I feel like this just adds work for the user of weights, and the best case is they get a confusing error message.
I'm all for defensive programming, but that means checking the function you are defining can work. Not just checking arbitrary things in the calling code.
Also you can always create an object of length typemax if you don't care about the checks.
That doesn't work, does it? This check is checking the lengths are equal, not checking a >
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't support weights with dropmissing
currently, the main reason being that in general we would need to skip weights corresponding to missing entries (instead of naively taking weights in their order of appearance).
That doesn't work, does it? This check is checking the lengths are equal, not checking a
>
.
Right, it doesn't work here (I was thinking about custom functions which want to take a UnitWeights
as default weights for simplicity of implementation). I guess we could allow something like UnitWeights(nothing)
(or maybe UnitWeights
) to create an object with a undefined length. But I still think it's nice to take a length by default, for checking, but also because that's required to be an actual AbstractVector
(otherwise we need to error when calling size
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lbittarello! Do you want to prepare another PR to apply the values
cleanup we talked about?
Happy to! It'll take me a couple of weeks though, as I'll be out of town. |
Gentle bump about |
I know I am a bit late but why not import using FillArrays
UnitWeights{T} = AbstractWeights{Int, T, Ones{T}}
uweights(T, len) = Ones{T}(len)
|
The issue with that approach is that |
Closes #135, #358 and #516