Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft of fweights, pweights, aweights for glms #194

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions src/GLM.jl
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ module GLM
import Base: (\), cholfact, convert, cor, show, size
import StatsBase: coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual, loglikelihood, nullloglikelihood, nobs, stderr, vcov, residuals, predict, fit, model_response, r2, r², adjr2, adjr², PValue
import StatsFuns: xlogy
import StatsBase: FrequencyWeights, ProbabilityWeights, AnalyticWeights
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to bump the lower bound on StatsBase in the REQUIRE file to the version where these types were introduced.

export coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual, loglikelihood, nobs, stderr, vcov, residuals, predict, fit, fit!, model_response, r2, r², adjr2, adjr²

export # types
Expand Down
14 changes: 12 additions & 2 deletions src/glmfit.jl
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
# TODO: expand weights to include fweights, pweights, aweights
"""
GlmResp

Expand All @@ -17,13 +18,19 @@ immutable GlmResp{V<:FPVector,D<:UnivariateDistribution,L<:Link} <: ModResp
offset::V
"`wts:` prior case weights. Can be of length 0."
wts::V

fweights::FrequencyWeights
pweights::ProbabilityWeights
aweights::AnalyticWeights

"`wrkwt`: working case weights for the Iteratively Reweighted Least Squares (IRLS) algorithm"
wrkwt::V
"`wrkresid`: working residuals for IRLS"
wrkresid::V
end

function GlmResp{V<:FPVector, D, L}(y::V, d::D, l::L, η::V, μ::V, off::V, wts::V)
function GlmResp{V<:FPVector, D, L}(y::V, d::D, l::L, η::V, μ::V, off::V, wts::V,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the constructor doesn't need a wts anymore, just fweights, pweights and aweights

fweights::FrequencyWeights, pweights::ProbabilityWeights, aweights::AnalyticWeights)
if d == Binomial()
for yy in y
0. <= yy <= 1. || throw(ArgumentError("$yy in y is not in [0,1]"))
Expand All @@ -38,7 +45,10 @@ function GlmResp{V<:FPVector, D, L}(y::V, d::D, l::L, η::V, μ::V, off::V, wts:
"lengths of η, μ, y and wts ($nη, $nμ, $(length(wts)), $n) are not equal"))
lo = length(off)
lo == 0 || lo == n || error("offset must have length $n or length 0")
res = GlmResp{V,D,L}(y, d, similar(y), η, μ, off, wts, similar(y), similar(y))
res = GlmResp{V,D,L}(y, d, similar(y), η, μ, off,
normalizeWeights(fweights, pweights, aweights),
similar(y), similar(y),
fweights, pweights, aweights)
updateμ!(res, η)
res
end
Expand Down
19 changes: 19 additions & 0 deletions src/glmtools.jl
Original file line number Diff line number Diff line change
Expand Up @@ -98,3 +98,22 @@ loglik_obs(::Binomial, y, μ, wt, ϕ) = logpdf(Binomial(Int(wt), μ), Int(y*wt))
loglik_obs(::Gamma, y, μ, wt, ϕ) = wt*logpdf(Gamma(1/ϕ, μ*ϕ), y)
loglik_obs(::Normal, y, μ, wt, ϕ) = wt*logpdf(Normal(μ, sqrt(ϕ)), y)
loglik_obs(::Poisson, y, μ, wt, ϕ) = wt*logpdf(Poisson(μ), y)

# TODO: combine fweights, pweights, aweights
function normalizeWeights(fweights::FrequencyWeights, pweights::ProbabilityWeights,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Camel case isn't used in Base Julia and the JuliaStats packages tend to match the Base style as closely as possible. For function names, typically it's all lowercase with no separation if the name isn't too long, otherwise snake case. For this function, I'd probably go with normalize_weights.

aweights::AnalyticWeights)
nf = length(fweights)
np = length(pweights)
na = length(aweights)
if (nf != 0 && np == 0 && na == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ifs in Julia don't need outer parentheses, and indeed that style is generally discouraged.

return fweights |> collect
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another minor stylistic point, but it's generally better to use standard calling conventions rather than piping. In this case you could actually just assign the result of the conditional to a variable (e.g. result = if ...) then return collect(result)

elseif (nf == 0 && np != 0 && na == 0)
return pweights |> collect
elseif (nf == 0 && np == 0 && na != 0)
return aweights |> collect
elseif (nf != 0 && np != 0)
return (fweights .* pweights) |> collect
else
return zeros(Float64, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably handle this more efficiently by returning an empty vector. The existing code uses that trick.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the empty vector gets overridden at some point by ones(n). I haven't checked exactly where or why it was done. I suspect it was so that a quantity could be normalized by sum(weights).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Looks like a good use case for a UnitWeights or Ones type which would implement the AbstractArray interface and always return 1 (JuliaStats/StatsBase.jl#135). But that's not a requirement for a first implementation.

end
end
7 changes: 5 additions & 2 deletions src/linpred.jl
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,8 @@ Base.cholfact!{T}(p::SparsePredChol{T}) = p.chol

invchol(x::DensePred) = inv(cholfact!(x))
invchol(x::SparsePredChol) = cholfact!(x) \ eye(size(x.X, 2))
# TODO: support fweights, pweights, and aweights
# This vcov represents σ^2 X'X
vcov(x::LinPredModel) = scale!(invchol(x.pp), dispersion(x, true))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be updated to invchol(x.pp * sqrt(wts))

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check dispersion(x, true) will use nobs, which will use the right behavior for fweights vs pweights


function cor(x::LinPredModel)
Expand Down Expand Up @@ -150,10 +152,11 @@ For linear and generalized linear models, returns the number of rows, or,
when prior weights are specified, the sum of weights.
"""
function nobs(obj::LinPredModel)
if isempty(obj.rr.wts)
if (length(obj.rr.fweights) == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is isempty not defined for the weight objects in StatsBase?

oftype(sum(one(eltype(obj.rr.wts))), length(obj.rr.y))
else
sum(obj.rr.wts)
sum(obj.rr.fweights) # This is the behavior for fweights
# TODO: add support for pweights and aweights
end
end

Expand Down
21 changes: 18 additions & 3 deletions src/lm.jl
Original file line number Diff line number Diff line change
@@ -1,20 +1,35 @@
# TODO: expand weights to include fweights, pweights, aweights
# Example LmResp{Array{Float64,1}}(randn(5), randn(5), randn(5), randn(5))
type LmResp{V<:FPVector} <: ModResp # response in a linear model
mu::V # mean response
offset::V # offset added to linear predictor (may have length 0)
wts::V # prior weights (may have length 0)
fweights::FrequencyWeights
pweights::ProbabilityWeights
aweights::AnalyticWeights
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should carry around three different types of weights in the response object. Perhaps just a single AbstractWeights?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should store the weights internally as a single vector (maybe keep the existing wts field), and only keep track of the necessary additional information like the actual number of observations (which is either the sum of weights or the vector length depending on the weights type).

y::V # response
function (::Type{LmResp{V}}){V}(mu::V, off::V, wts::V, y::V)
function (::Type{LmResp{V}}){V}(mu::V, off::V, wts::V, y::V,
fweights::FrequencyWeights, pweights::ProbabilityWeights,
aweights::AnalyticWeights)
n = length(y)
length(mu) == n || error("mismatched lengths of mu and y")
ll = length(off)
ll == 0 || ll == n || error("length of offset is $ll, must be $n or 0")
ll = length(wts)
ll == 0 || ll == n || error("length of wts is $ll, must be $n or 0")
new{V}(mu, off, wts, y)
ll = length(fweights)
ll == 0 || ll == n || error("length of fweights is $ll, must be $n or 0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general it's better to use typed exceptions whenever possible rather than error. In this case I think throw(ArgumentError("...")) would be appropriate.

ll = length(pweights)
ll == 0 || ll == n || error("length of pweights is $ll, must be $n or 0")
ll = length(aweights)
ll == 0 || ll == n || error("length of aweights is $ll, must be $n or 0")
new{V}(mu, off, normalizeWeights(fweights, pweights, aweights), y,
fweights, pweights, aweights)
end
end
convert{V<:FPVector}(::Type{LmResp{V}}, y::V) =
LmResp{V}(zeros(y), similar(y, 0), similar(y, 0), y)
LmResp{V}(zeros(y), similar(y, 0), similar(y, 0), y,
similar(y, 0), similar(y, 0), similar(y, 0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be aligned vertically with the arguments above it


function convert{T<:Real}(::Type{LmResp}, y::AbstractVector{T})
yy = float(y)
Expand Down