Fix overflows in `quantile` #145

nalimilan · 2023-07-01T14:56:32Z

The `a + γ*(b-a)` formula introduced by JuliaLang/julia#16572 has the advantage that it increases with `γ` even when `a` and `b` are very close, but it has the drawback that it is not robust to overflow, since `b-a` can overflow even when the result itself is representable. This is likely to happen in practice with small integer and floating point types.

Conversely, the `(1-γ)*a + γ*b` formula, which is currently used only for non-finite quantities, is robust to overflow but may not always increase with `γ` when `a` and `b` are very close or (more frequently) equal, since precision loss can give a slightly smaller value for a larger `γ`. This can be problematic as it breaks an expected invariant.

So keep using the `a + γ*(b-a)` formula when `a ≈ b`, in which case it's almost like returning either `a` or `b` but less arbitrary, and use `(1-γ)*a + γ*b` otherwise.

Fixes #144.
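
To make the trade-off concrete, here is a minimal sketch of the two formulas and of the combined rule described above. The helper names `interp_diff`, `interp_weighted` and `hybrid` are hypothetical and chosen for illustration only; this is not the code merged in `src/Statistics.jl`.

```julia
# Hypothetical helper names for the two formulas discussed above.
interp_diff(a, b, γ) = a + γ * (b - a)
interp_weighted(a, b, γ) = (1 - γ) * a + γ * b

# Sketch of the combined rule: difference form only when a ≈ b, weighted form otherwise.
hybrid(a, b, γ) = (isfinite(a) && isfinite(b) && a ≈ b) ?
    interp_diff(a, b, γ) : interp_weighted(a, b, γ)

# Floating point: b - a overflows to Inf although the midpoint is representable.
lo, hi = -floatmax(Float64), floatmax(Float64)
interp_diff(lo, hi, 0.5)       # Inf
interp_weighted(lo, hi, 0.5)   # 0.0

# Small integers: Int8(100) - Int8(-100) wraps around to -56.
interp_diff(Int8(-100), Int8(100), 0.5)   # -128.0
hybrid(Int8(-100), Int8(100), 0.5)        # 0.0

# Equal endpoints: the difference form returns `a` exactly for every γ,
# whereas the weighted form may round to a neighbouring value.
γs = range(0, 1, length=100)
all(interp_diff(0.1, 0.1, γ) == 0.1 for γ in γs)   # true
```

The `a ≈ b` check uses the default `isapprox` tolerance, so the difference form is only evaluated when `b - a` is tiny relative to the operands and cannot overflow.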


codecov-commenter · 2023-07-01T14:58:45Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01% 🎉

Comparison is base (bb7063d) 96.98% compared to head (000d4c1) 96.99%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #145      +/-   ##
==========================================
+ Coverage   96.98%   96.99%   +0.01%     
==========================================
  Files           1        1              
  Lines         431      433       +2     
==========================================
+ Hits          418      420       +2     
  Misses         13       13

Files Changed	Coverage Δ
src/Statistics.jl	`96.99% <100.00%> (+0.01%)`	⬆️


bkamins

It also required testing whether the function remains non-decreasing when we increase `b` and switch the formula, but I tested it and it holds.

nalimilan · 2023-07-01T22:06:37Z

> It also required testing whether the function remains non-decreasing when we increase `b` and switch the formula, but I tested it and it holds.

It's already covered by the test added a long time ago by JuliaLang/julia#16572. That's how I realized the problem. ;-)

EDIT: You mean γ, not b?

bkamins · 2023-07-02T09:32:34Z

In general, I mean that it should be monotonic in `a`, `b` and `γ`, and I checked all three. The tests did not fully cover all cases. The reason is that, e.g., you need to test the case where `float(a) ≈ float(b)` but `!(float(a) ≈ nextfloat(float(b)))` (and similarly `!(prevfloat(float(a)) ≈ float(b))`) to check monotonicity for various `γ`. I think this is not covered, but I checked it.

nalimilan · 2023-07-28T20:55:38Z

OK. So you mean two tests like this are needed?

    @test issorted(quantile([1.0, 1.0+eps(), 1.0+2eps(), 1.0+3eps()], range(0, 1, length=100)))
    @test issorted(quantile([1.0, 1.0+2eps(), 1.0+4eps(), 1.0+6eps()], range(0, 1, length=100)))

bkamins · 2023-07-28T23:57:07Z

Yes, something like this (it is not strictly needed 😄, but I ran such tests and they were OK).
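
The boundary case mentioned earlier in the thread (`a ≈ b` holds but `a ≈ nextfloat(b)` does not) could be probed with something like the sketch below. It assumes a hypothetical `hybrid` helper that mirrors the rule described in this PR and a hypothetical `last_approx` helper that locates the switch point; neither is part of the package. No expected results are stated here; the thread only reports that such checks passed.

```julia
# Sketch of the interpolation rule described in this PR (not the merged source).
hybrid(a, b, γ) = (isfinite(a) && isfinite(b) && a ≈ b) ?
    a + γ * (b - a) : (1 - γ) * a + γ * b

# Largest Float64 `b ≥ a` that still satisfies `a ≈ b` under the default
# `isapprox` tolerance. Assumes a > 0.
function last_approx(a::Float64)
    b = a / (1 - sqrt(eps(Float64)))   # initial guess close to the switch point
    while !(a ≈ b)                     # step down until the guess is inside
        b = prevfloat(b)
    end
    while a ≈ nextfloat(b)             # step up to the last value inside
        b = nextfloat(b)
    end
    return b
end

a = 1.0
b = last_approx(a)
γs = range(0, 1, length=101)

# Non-decreasing in b across the switch between the two formulas?
mono_b = all(hybrid(a, b, γ) <= hybrid(a, nextfloat(b), γ) for γ in γs)

# Non-decreasing in γ on each side of the switch?
mono_γ = issorted(hybrid.(a, b, γs)) && issorted(hybrid.(a, nextfloat(b), γs))

println((mono_b, mono_γ))
```

The same idea applies to the other side of the interval by stepping `a` with `prevfloat` instead of stepping `b` with `nextfloat`.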

Before #145 `Date` and `DateTime` were supported with `quantile` as long as the cut point falls between two equal values. Restore this behavior as some code may rely on this given that it is the most common situation with large datasets.

nalimilan requested review from andreasnoack and bkamins July 1, 2023 14:56

Fix 32-bit failure

ca0d01f

bkamins approved these changes Jul 1, 2023

Add tests

000d4c1

nalimilan merged commit 35ca0a0 into master Jul 29, 2023
11 checks passed

nalimilan deleted the nl/quantile branch July 29, 2023 21:32

nalimilan mentioned this pull request Sep 30, 2023

The quantile function can return incorrect results for integer arrays (Int8, Int16, Int32) #119

Closed

nalimilan mentioned this pull request Oct 2, 2023

Fix quantile with Date and DateTime #153

Merged
