Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect computation of quantiles of vector of infinities #19542

Closed
simonster opened this issue Dec 9, 2016 · 18 comments
Closed

Incorrect computation of quantiles of vector of infinities #19542

simonster opened this issue Dec 9, 2016 · 18 comments
Labels
good first issue Indicates a good issue for first-time contributors to Julia help wanted Indicates that a maintainer wants help on an issue or pull request

Comments

@simonster
Copy link
Member

julia> quantile([Inf, Inf], 0.5)
NaN

I think this should be Inf, which is what median returns.

@simonster simonster added good first issue Indicates a good issue for first-time contributors to Julia help wanted Indicates that a maintainer wants help on an issue or pull request labels Dec 9, 2016
@alexhallam
Copy link
Contributor

julia> quantile([Inf], 0.5)
Inf

Is this sufficient?

@johnmyleswhite
Copy link
Member

I believe that is the behavior that Simon has in mind. Do you have a modification of the code that produces that result?

@alexhallam
Copy link
Contributor

It is already doing that.

@johnmyleswhite
Copy link
Member

johnmyleswhite commented Dec 11, 2016

I see. Then, no, I think Simon wants his case and similar cases to be handled and not only the single-element of Inf case.

@alexhallam
Copy link
Contributor

Also,

julia> quantile([0,Inf], 0.5)
Inf

I am not sure what it means to find quantile([Inf, Inf], 0.5). Maybe NaN is not a bad output.

@simonster
Copy link
Member Author

I think quantile([Inf, Inf], 0.5) should be the same as middle(Inf, Inf)and median([Inf, Inf]) which are both Inf.

More generally, something like this should also return Inf and not NaN:

quantile([-0.817856,0.584222,4.40526,-3.11349,1.31774,-2.48744,Inf,Inf,2.40881,-1.25807,1.37004], 0.95)

For both of these cases, I believe the operation is mathematically well-defined, and R, NumPy, and MATLAB all return Inf.

@johnmyleswhite
Copy link
Member

Just to be sure, what you mean is that these operations are well-defined given the standard ordering on the extended real line, right?

@alexhallam
Copy link
Contributor

Wow, I did not mean to digress to measure theory. I have been eager to find a intro issue in Julia to dig my teeth into. I just wanted to make sure that this was desired functionality before I started working on it. If it is expected functionality in R, NumPy, and Matlab then I would not mind attempting it in Julia.

@johnmyleswhite
Copy link
Member

I don't think you need to deal with measure theory, just with the idea that -Inf and Inf are valid values and that they sort like -Inf < x < Inf for all finite x. Imagine a distribution that generates -Inf, 0 and Inf all with probability 1/3. Simon's point IIUC is that all of the quantiles of that distribution are well-defined as elements of the extended real line.

But it seems clear we should make sure there's broad consensus on what the correct behavior of both median and quantile would be before you work on this.

@andreasnoack
Copy link
Member

andreasnoack commented Dec 11, 2016

I don't think there is a good reason to produce the NaN here. The NaN only occurs because we rewrite the interpolation (1-h)*a + h*b as a + h(b - a) so when a and b are Inf it becomes NaN. In the original form, it would have been Inf. The rewrite ensures that the quantiles are monotonous in the probability so we probably want to keep it. I believe that something like a + ifelse(a == b, zero(a), h*(b - a)) should be just fine.

@andreasnoack
Copy link
Member

Adding a check for a==b to https://github.com/JuliaLang/julia/blob/master/base/statistics.jl#L681 is probably the way to go. @alexhallam are you up for preparing a PR?

@alexhallam
Copy link
Contributor

I would love to! Thanks!

@simonbyrne
Copy link
Contributor

simonbyrne commented Dec 15, 2016

While we're at it we should probably make these consistent:

julia> quantile([Inf,1.0],0.5)
Inf

julia> quantile([-Inf,1.0],0.5)
NaN

Need to think of a good way to do this though.

@simonbyrne
Copy link
Contributor

Though that appears to be my fault (#16572 and JuliaStats/StatsBase.jl#164 (comment))

@tkelman
Copy link
Contributor

tkelman commented Dec 19, 2016

is this closed by #19574 or is there any more to do here?

@andreasnoack
Copy link
Member

The issue @simonbyrne mentions is not fixed. I think we might have to check a and b for isfinite and only use the rewritten interpolation when both are true.

@alexhallam
Copy link
Contributor

I agree. That is a good way to solve the problem. I can work on that.

@ararslan
Copy link
Member

Seems to have been fixed by #19659.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Indicates a good issue for first-time contributors to Julia help wanted Indicates that a maintainer wants help on an issue or pull request
Projects
None yet
Development

No branches or pull requests

7 participants