Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zero weights are not equivalent to omitting a value with wtd.quantile() #81

Closed
nalimilan opened this issue Nov 7, 2017 · 7 comments
Closed

Comments

@nalimilan
Copy link

I'm not sure this is actually a bug, but I would naively have expected that frequency weights passed to wtd.quantile were equivalent to repeating a value the corresponding number of times. As shown below, this assumption works for non-zero weights, but not for zero weights:

> quantile(c(2, 3, 3, 4))
  0%  25%  50%  75% 100% 
2.00 2.75 3.00 3.25 4.00 
> wtd.quantile(c(2,3,3,4), c(1,1,1,1), seq(0, 1, by=.25))
  0%  25%  50%  75% 100% 
2.00 2.75 3.00 3.25 4.00 
> wtd.quantile(c(2,3,4), c(1,2,1), seq(0, 1, by=.25))
  0%  25%  50%  75% 100% 
2.00 2.75 3.00 3.25 4.00 
# Gives a different result
> wtd.quantile(c(1,2,3,4,5), c(0,1,2,1,0), seq(0, 1, by=.25))
   0%   25%   50%   75%  100% 
2.000 2.750 3.000 3.375 4.500 

This also happens with other values of the type argument.

Is there a particular reason for this? Would you have a reference about the implemented algorithm?

@harrelfe
Copy link
Owner

harrelfe commented Nov 7, 2017

It makes sense to me that those with zero or negative weight should be removed from the dataset before doing anything. Let me know if you see a problem with that strategy, otherwise I'll make the change for the next release.

@nalimilan
Copy link
Author

I don't see any problem with removing them beforehand. But I'm surprised the algorithm isn't equivalent to removing them explicitly, since it's equivalent to repeating each value a number of times corresponding to its weight (for non-zero integer weights at least).

@nalimilan
Copy link
Author

BTW, this sounds surprising to me:

> wtd.quantile(c(7, 1, 2, 4, 10, 15), c(1, 1/3, 1/3, 1/3, 1, 1))
   0%   25%   50%   75%  100% 
 4.00  6.25  8.50 11.25 15.00 
> wtd.quantile(c(7, 1, 2, 4, 10, 15), c(1, 1/3, 1/3, 1/3, 1, 1), normwt=T)
   0%   25%   50%   75%  100% 
 2.00  7.00  8.50 13.75 15.00 

Shouldn't the 0th quantile always be equal to the minimum?

@harrelfe
Copy link
Owner

Good question. Sorry I don't have time to diagnosis this. Hope you can.

@nalimilan
Copy link
Author

FWIW we're working on an implementation for Julia at JuliaStats/StatsBase.jl#316, so you could take inspiration from there (though I'm still not sure what's the best approach but @matthieugomez knows better).

@harrelfe
Copy link
Owner

For the upcoming release on CRAN I am excluding zero weighted observations up front in wtd.quantile.

@nalimilan
Copy link
Author

Thanks! I confirm that the examples above work fine now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants