Zero weights are not equivalent to omitting a value with wtd.quantile() #81
Comments
It makes sense to me that those with zero or negative weight should be removed from the dataset before doing anything. Let me know if you see a problem with that strategy, otherwise I'll make the change for the next release.
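A minimal sketch of that pre-filtering strategy, for illustration only. The helper name `drop_nonpositive_weights` is hypothetical and not part of Hmisc, and treating missing weights as "drop" is an extra assumption not discussed in this thread.

```r
# Hypothetical helper (not part of Hmisc): drop observations whose weight
# is zero or negative before computing any weighted statistic.
# Assumption: observations with a missing (NA) weight are dropped as well.
drop_nonpositive_weights <- function(x, weights) {
  stopifnot(length(x) == length(weights))
  keep <- !is.na(weights) & weights > 0
  list(x = x[keep], weights = weights[keep])
}

# Illustrative data: three observations carry zero weight.
x <- c(7, 1, 2, 4, 10, 15)
w <- c(1, 0, 0, 0, 1, 1)

filtered <- drop_nonpositive_weights(x, w)
filtered$x        # 7 10 15
filtered$weights  # 1 1 1
```

The filtered vectors could then be passed on, for example as `wtd.quantile(filtered$x, filtered$weights)`, so that zero-weighted values cannot influence the result.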
I don't see any problem with removing them beforehand. But I'm surprised the algorithm isn't equivalent to removing them explicitly, since it's equivalent to repeating each value a number of times corresponding to its weight (for non-zero integer weights at least).
BTW, this sounds surprising to me:
> wtd.quantile(c(7, 1, 2, 4, 10, 15), c(1, 1/3, 1/3, 1/3, 1, 1))
   0%   25%   50%   75%  100%
 4.00  6.25  8.50 11.25 15.00
> wtd.quantile(c(7, 1, 2, 4, 10, 15), c(1, 1/3, 1/3, 1/3, 1, 1), normwt=T)
   0%   25%   50%   75%  100%
 2.00  7.00  8.50 13.75 15.00
Shouldn't the 0th quantile always be equal to the minimum?
Good question. Sorry, I don't have time to diagnose this. Hope you can.
FWIW we're working on an implementation for Julia at JuliaStats/StatsBase.jl#316, so you could take inspiration from there (though I'm still not sure what the best approach is; @matthieugomez knows better).
For the upcoming release on CRAN I am excluding zero-weighted observations up front in
Thanks! I confirm that the examples above work fine now.
I'm not sure this is actually a bug, but I would naively have expected that frequency weights passed to wtd.quantile were equivalent to repeating a value the corresponding number of times. As shown below, this assumption works for non-zero weights, but not for zero weights:
This also happens with other values of the type argument.
Is there a particular reason for this? Would you have a reference about the implemented algorithm?
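To make the expectation concrete, here is a small base-R sketch of the "repeat each value according to its weight" interpretation. It is not the example originally attached to this report: the data and variable names are assumed for illustration, and it deliberately uses only base `rep()` and `quantile()` rather than `wtd.quantile()`.

```r
# Illustrative data (assumed, not taken from the original report).
x <- c(7, 1, 2, 4, 10, 15)
w <- c(2, 0, 1, 0, 3, 1)   # integer frequency weights, some of them zero

# Expand each value according to its frequency weight; a weight of 0
# contributes no copies, so that value is effectively omitted.
expanded <- rep(x, times = w)

# Same expansion after explicitly dropping the zero-weight observations.
keep <- w > 0
expanded_filtered <- rep(x[keep], times = w[keep])

# Both routes produce the same sample, hence the same quantiles.
identical(expanded, expanded_filtered)
all.equal(quantile(expanded), quantile(expanded_filtered))
```

Under this reading, a weighted quantile routine that treats weights as frequencies would be expected to give the same answer whether zero-weight values are passed in or left out.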