Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix numerical instability with high values of the degrees of freedom of the t-distribution #166

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 10 additions & 2 deletions src/distrs/fdist.jl
Original file line number Diff line number Diff line change
Expand Up @@ -21,9 +21,17 @@ end
for f in ("invcdf", "invccdf", "invlogcdf", "invlogccdf")
ff = Symbol("fdist"*f)
bf = Symbol("beta"*f)
cf = Symbol("chisq"*f)
@eval function $ff(ν1::T, ν2::T, y::T) where {T<:Real}
x = $bf(ν1/2, ν2/2, y)
return x/(1 - x)*ν2/ν1
# the evaluation of the CDF with the t-distribution combined with the
# beta distribution is unstable for high ν2,
# therefore, we switch from the beta to the chi-squared distribution at ν > 1.0e7
if ν2 > 1.0e7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this cutoff has to be type-dependent, ie whether a different value is needed for Float32 or Float64. Can you check the current output for these number types as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is comparison between using the threshold and not using it. Apparently, there is no big difference between calculating with Float32 and Float64. We could set the threshold a bit higher but smaller than 10^8.

I am a bit wondering why the quantile function is not smooth.

ν_f64 = 10.0 .^ LinRange(6, 11, 1000)
ν_f32 = Float32.(ν_f64)

d_f64 = TDist.(ν_f64)
d_f32 = TDist.(ν_f32)


qs = [0.75, 0.8, 0.9, 0.95, 0.99]


before_f64 = Vector{Float64}[]
before_f32 = Vector{Float32}[]

for q in qs
    push!(before_f64, quantile.(d_f64, q))
    push!(before_f32, quantile.(d_f32, q))
end

################# "fix" the quantile function
after_f64 = Vector{Float64}[]
after_f32 = Vector{Float32}[]

for q in qs
    push!(after_f64, quantile.(d_f64, q))
    push!(after_f32, quantile.(d_f32, q))
end


begin
    fig = Figure(size = (500, 1300))
    for i in eachindex(qs)

        ax = Axis(fig[i,1]; ylabel = "$(qs[i]) quantile",
            xlabel = "degrees of freedom ν",
            xscale = log10,
            xlabelvisible = i == length(qs) ? true : false)
        lines!(ν_f64, before_f64[i]; label = "before f64")
        lines!(ν_f32, before_f32[i]; label = "before f32")
        lines!(ν_f64, after_f64[i]; label = "after f64")
        lines!(ν_f32, after_f32[i]; label = "after f32")
        if i == length(qs)
            axislegend(ax)
        end
    end

    display(fig)
end

quantiles

return $cf(ν1, y)/ν1
else
x = $bf(ν1/2, ν2/2, y)
return x/(1 - x)*ν2/ν1
end
end
@eval $ff(ν1::Real, ν2::Real, y::Real) = $ff(promote(ν1, ν2, y)...)
end
4 changes: 3 additions & 1 deletion src/distrs/tdist.jl
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,9 @@ tdistpdf(ν::Real, x::Real) = exp(tdistlogpdf(ν, x))

tdistlogpdf(ν::Real, x::Real) = tdistlogpdf(promote(ν, x)...)
function tdistlogpdf(ν::T, x::T) where {T<:Real}
isinf(ν) && return normlogpdf(x)
# the logpdf equation of the t-distribution is numerically unstable for high ν,
# therefore we switch at ν > 1.0e7 to the normal distribution
ν > 1.0e7 && return normlogpdf(x)
Comment on lines +7 to +9
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, what's the current output (without the change) for e.g. Float32 and Float64?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the same analysis for the log pdf. It seems that there is no best value for the threshold because the numerical instability can start from 10 ^ 7.2 for the log pdf of 0.5, but for higher x values it is better to have a higher threshold value.

ν_f64 = 10.0 .^ LinRange(6.5, 8.5, 1000)
ν_f32 = Float32.(ν_f64)

d_f64 = TDist.(ν_f64)
d_f32 = TDist.(ν_f32)


xs = [0.1, 0.5, 10.0, 20.0]


before_f64 = Vector{Float64}[]
before_f32 = Vector{Float32}[]

for x in xs
    push!(before_f64, logpdf.(d_f64, x))
    push!(before_f32, logpdf.(d_f32, x))
end

################# "fix" the quantile function
after_f64 = Vector{Float64}[]
after_f32 = Vector{Float32}[]

for x in xs
    push!(after_f64, logpdf.(d_f64, x))
    push!(after_f32, logpdf.(d_f32, x))
end


begin
    fig = Figure(size = (500, 800))
    for i in eachindex(xs)

        ax = Axis(fig[i,1]; ylabel = "logpdf of $(xs[i])",
            xlabel = "degrees of freedom ν",
            xscale = log10,
            xlabelvisible = i == length(xs) ? true : false)
        lines!(ν_f64, before_f64[i]; label = "before f64")
        lines!(ν_f32, before_f32[i]; label = "before f32")
        lines!(ν_f64, after_f64[i]; label = "after f64")
        lines!(ν_f32, after_f32[i]; label = "after f32")
        if i == length(xs)
            axislegend(ax)
        end
    end

    display(fig)
end

logpdf

νp12 = (ν + 1) / 2
return loggamma(νp12) - (logπ + log(ν)) / 2 - loggamma(ν / 2) - νp12 * log1p(x^2 / ν)
end
Expand Down
Loading