Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct R² formulas and add Efron's R² #400

Merged
merged 7 commits into from
Oct 3, 2018
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 23 additions & 17 deletions src/statmodels.jl
Original file line number Diff line number Diff line change
Expand Up @@ -197,7 +197,12 @@ Coefficient of determination (R-squared).
For a linear model, the R² is defined as ``ESS/TSS``, with ``ESS`` the explained sum of squares
and ``TSS`` the total sum of squares.
"""
r2(obj::StatisticalModel) = mss(obj) / deviance(obj)
function r2(obj::StatisticalModel)
Base.depwarn("The default r² method for linear models is deprecated. " *
"Packages should define their own methods.", :r2)

mss(obj) / deviance(obj)
end

"""
r2(obj::StatisticalModel, variant::Symbol)
Expand All @@ -207,25 +212,27 @@ Pseudo-coefficient of determination (pseudo R-squared).

For nonlinear models, one of several pseudo R² definitions must be chosen via `variant`.
Supported variants are:
- `:MacFadden` (a.k.a. likelihood ratio index), defined as ``1 - \\log L/\\log L0``.
- `:CoxSnell`, defined as ``1 - (L0/L)^{2/n}``
- `:Nagelkerke`, defined as ``(1 - (L0/L)^{2/n})/(1 - L0^{2/n})``, with ``n`` the number
of observations (as returned by [`nobs`](@ref)).
- `:MacFadden` (a.k.a. likelihood ratio index), defined as ``1 - \\log (L)/\\log (L_0)``;
- `:CoxSnell`, defined as ``1 - (L_0/L)^{2/n}``;
- `:Nagelkerke`, defined as ``(1 - (L_0/L)^{2/n})/(1 - L_0^{2/n})``.

In the above formulas, ``L`` is the likelihood of the model, ``L0`` that of the null model
(the model including only the intercept). These two quantities are taken to be minus half
`deviance` of the corresponding models.
In the above formulas, ``L`` is the likelihood of the model,
``L_0`` is the likelihood of the null model (the model with only an intercept),
``n`` is the number of observations, ``y_i`` are the responses,
``\\hat{y}_i`` are fitted values and ``\\bar{y}`` is the average response.
ararslan marked this conversation as resolved.
Show resolved Hide resolved

Cox and Snell's R² should match the classical R² for linear models.
"""
function r2(obj::StatisticalModel, variant::Symbol)
ll = -deviance(obj)/2
ll0 = -nulldeviance(obj)/2

function r2(obj::StatisticalModel, variant::Symbol)
ll = loglikelihood(obj)
ll0 = nullloglikelihood(obj)
if variant == :McFadden
1 - ll/ll0
elseif variant == :CoxSnell
1 - exp(2/nobs(obj) * (ll0 - ll))
1 - exp(2 * (ll0 - ll) / nobs(obj))
elseif variant == :Nagelkerke
(1 - exp(2/nobs(obj) * (ll0 - ll)))/(1 - exp(2/nobs(obj) * ll0))
(1 - exp(2 * (ll0 - ll) / obs(obj))) / (1 - exp(2 * ll0 / nobs(obj)))
else
error("variant must be one of :McFadden, :CoxSnell or :Nagelkerke")
end
Expand All @@ -252,17 +259,16 @@ adjr2(obj::StatisticalModel) = error("adjr2 is not defined for $(typeof(obj)).")
Adjusted pseudo-coefficient of determination (adjusted pseudo R-squared).

For nonlinear models, one of the several pseudo R² definitions must be chosen via `variant`.
The only currently supported variant is `:MacFadden`, defined as ``1 - (\\log L - k)/\\log L0``.
The only currently supported variant is `:MacFadden`, defined as ``1 - (\\log (L) - k)/\\log (L0)``.
In this formula, ``L`` is the likelihood of the model, ``L0`` that of the null model
(the model including only the intercept). These two quantities are taken to be minus half
`deviance` of the corresponding models. ``k`` is the number of consumed degrees of freedom
of the model (as returned by [`dof`](@ref)).
"""
ararslan marked this conversation as resolved.
Show resolved Hide resolved
function adjr2(obj::StatisticalModel, variant::Symbol)
ll = -deviance(obj)/2
ll0 = -nulldeviance(obj)/2
ll = loglikelihood(obj)
ll0 = nullloglikelihood(obj)
k = dof(obj)

if variant == :McFadden
1 - (ll - k)/ll0
else
Expand Down