Thresholds are not sorted due to disproportionate scores #5

Open
janfrancu opened this issue Jan 6, 2021 · 1 comment
@janfrancu
Collaborator

Computing thresholds on short vectors (by default 300 values or fewer) may produce unsorted results due to floating-point arithmetic errors.

In particular, given these labels/scores (data.txt), which contain values of very different magnitudes, the call to quantile in

function thresholds(
    scores::RealVector,
    n::Int = length(scores);
    reduced::Bool = true,
    zerorecall::Bool = true,
)
    N = reduced ? min(length(scores), n) : n
    thres = quantile(scores, range(0, 1, length = N))
    if zerorecall
        return vcat(thres, nextfloat(thres[end]))
    else
        return thres
    end
end

without any additional arguments produces the thresholds

 0.3426349461078644                                                                                                   
 0.3456488847732544                                                                                                   
 0.3648797571659088                                                                                                   
 0.3659798502922058                                                                                                   
 0.3748885691165924                                                                                                   
 0.3797685503959656                                                                                                   
 0.38884931802749634                                                                                                  
 0.39463597536087036                                                                                                  
 0.39494413137435913                                                                                                  
 0.40267691016197205                                                                                                  
 0.4065932333469391                                                                                                   
 0.41210252046585083                                                                                                  
 0.41331562399864197                                                                                                  
 0.4157656729221344                                                                                                   
 0.43393915891647333                                                                                                  
 0.4365665018558502                                                                                                   
 0.43950754404067993                                                                                                  
 0.4522964358329773                                                                                                   
 0.454873651266098                                                                                                    
 0.46290522813796997                                                                                                  
 0.490060031414032                                                                                                    
 0.5430693030357361                                                                                                   
 0.5849942564964294                                                                                                   
 0.6120826601982117                                                                                                   
 0.6970598101615906                                                                                                   
 0.8463529348373413                                                                                                   
 0.9729131460189819                                                                                                   
 1.3229070901870714   
-----------------------------------------                                                                                                
 1.083465853229067e8                                                                                  
 1.08346584e8                                                                                                         
 1.10690304e8                                                                                                         
 1.10690304e8                                                                                                         
 1.16699368e8                                                                                                         
 1.16699368e8                                                                                                         
 1.2348864e8                                                                                                          
 1.2348864e8                                                                                                          
 1.54131584e8                                                                                                         
 1.54131584e8                                                                                                         
 1.67473424e8                                                                                                         
 1.67473424e8                                                                                                         
 5.70319872e9                                                                                                         
 5.70319872e9                                                                                                         
 1.5408278528e10                                                                                                      
 1.5408278528e10                                                                                                      
 3.36217088e10
 3.3621708800000824e10
 1.49447049216e11
 1.49447049216e11
 2.8364769329152e16
 2.8364769329152e16

which are sorted except for one element that differs from the next by machine epsilon. A similar issue has already been reported in the StatsBase package (JuliaStats/StatsBase.jl#164) and was solved by reordering the arithmetic operations for values of similar magnitude. With values of different magnitudes the problem seems to be even more of an edge case, so it may be better to solve it on this end rather than rely on changes to base Julia.
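
One possible way to handle this on the package side, sketched here only as an illustration, is to force the quantile output to be non-decreasing by taking a running maximum. The names enforce_sorted and thresholds_sorted are hypothetical and not part of the package, and the RealVector alias is replaced by AbstractVector{<:Real} so that the snippet runs on its own:

```julia
using Statistics  # provides `quantile`

# Hypothetical helper (not part of the package): replace each element by the
# running maximum, so the result is non-decreasing even if rounding produced
# a value slightly smaller than its predecessor.
enforce_sorted(thres::AbstractVector{<:Real}) = accumulate(max, thres)

# Sketch of `thresholds` with the monotonicity fix applied after `quantile`.
function thresholds_sorted(
    scores::AbstractVector{<:Real},
    n::Int = length(scores);
    reduced::Bool = true,
    zerorecall::Bool = true,
)
    N = reduced ? min(length(scores), n) : n
    thres = enforce_sorted(quantile(scores, range(0, 1, length = N)))
    return zerorecall ? vcat(thres, nextfloat(thres[end])) : thres
end
```

A running maximum changes only the offending elements, while calling sort on the thresholds would also work and may be easier to read. Either choice is an assumption about what the maintainers would prefer.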

@nalimilan

I tried reproducing the problem and couldn't, on either Julia 1.6 or 1.9. What am I doing differently?

scores = [2.836477e16, 2.836477e16, 1.4944705e11, 1.4944705e11, 3.3621709e10, 3.3621709e10, 1.5408279e10, 1.5408279e10, 5.7031987e9, 5.7031987e9, 1.6747342e8, 1.6747342e8, 1.5413158e8, 1.5413158e8, 1.2348864e8, 1.2348864e8, 1.1669937e8, 1.1669937e8, 1.106903e8, 1.106903e8, 1.08346584e8, 1.08346584e8, 1.3229071, 0.97291315, 0.84635293, 0.6970598, 0.61208266, 0.58499426, 0.5430693, 0.49006003, 0.46290523, 0.45487365, 0.45229644, 0.43950754, 0.4365665, 0.43393916, 0.41576567, 0.41331562, 0.41210252, 0.40659323, 0.4026769, 0.39494413, 0.39463598, 0.38884932, 0.37976855, 0.37488857, 0.36597985, 0.36487976, 0.34564888, 0.34263495]

thresholds(scores)
quantile(scores, range(0, 1, length=length(scores)))
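
A small diagnostic along the following lines might help when comparing reproduction attempts. It is only a sketch: unsorted_positions is a hypothetical helper, the four sample values are a shortened subset of the scores above, and converting them to Float32 is an assumption about the original data type that the thread does not confirm.

```julia
using Statistics  # provides `quantile`

# Hypothetical helper: indices i for which v[i] > v[i + 1].
unsorted_positions(v::AbstractVector{<:Real}) =
    [i for i in 1:length(v)-1 if v[i] > v[i+1]]

# Assumption: the original scores may have been stored as Float32.
scores32 = Float32[0.34263495, 1.3229071, 1.08346584e8, 2.836477e16]
thres = quantile(scores32, range(0, 1, length = length(scores32)))

# An empty result means the thresholds are sorted for this input.
println(unsorted_positions(thres))
```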
