Draft of fweights, pweights, aweights for glms #194

jeffwong · 2017-08-13T17:11:18Z

This is a code outline for #186. I am new to contributing to Julia, so please feel free to leave feedback on how this feature should be implemented

jeffwong · 2017-08-13T17:17:56Z

src/linpred.jl

@@ -116,6 +116,8 @@ Base.cholfact!{T}(p::SparsePredChol{T}) = p.chol

 invchol(x::DensePred) = inv(cholfact!(x))
 invchol(x::SparsePredChol) = cholfact!(x) \ eye(size(x.X, 2))
+# TODO: support fweights, pweights, and aweights
+# This vcov represents σ^2  X'X
 vcov(x::LinPredModel) = scale!(invchol(x.pp), dispersion(x, true))


This needs to be updated to invchol(x.pp * sqrt(wts))

jeffwong · 2017-08-13T17:18:37Z

src/linpred.jl

@@ -116,6 +116,8 @@ Base.cholfact!{T}(p::SparsePredChol{T}) = p.chol

 invchol(x::DensePred) = inv(cholfact!(x))
 invchol(x::SparsePredChol) = cholfact!(x) \ eye(size(x.X, 2))
+# TODO: support fweights, pweights, and aweights
+# This vcov represents σ^2  X'X
 vcov(x::LinPredModel) = scale!(invchol(x.pp), dispersion(x, true))


Need to check dispersion(x, true) will use nobs, which will use the right behavior for fweights vs pweights

jeffwong · 2017-08-13T17:20:29Z

src/glmfit.jl

    "`wrkwt`: working case weights for the Iteratively Reweighted Least Squares (IRLS) algorithm"
    wrkwt::V
    "`wrkresid`: working residuals for IRLS"
    wrkresid::V
 end

-function GlmResp{V<:FPVector, D, L}(y::V, d::D, l::L, η::V, μ::V, off::V, wts::V)
+function GlmResp{V<:FPVector, D, L}(y::V, d::D, l::L, η::V, μ::V, off::V, wts::V,


Maybe the constructor doesn't need a wts anymore, just fweights, pweights and aweights

ararslan

Just some preliminary comments, mostly minor style things.

ararslan · 2017-08-13T19:51:39Z

src/lm.jl

        n = length(y)
        length(mu) == n || error("mismatched lengths of mu and y")
        ll = length(off)
        ll == 0 || ll == n || error("length of offset is $ll, must be $n or 0")
        ll = length(wts)
        ll == 0 || ll == n || error("length of wts is $ll, must be $n or 0")
-        new{V}(mu, off, wts, y)
+        ll = length(fweights)
+        ll == 0 || ll == n || error("length of fweights is $ll, must be $n or 0")


In general it's better to use typed exceptions whenever possible rather than error. In this case I think throw(ArgumentError("...")) would be appropriate.

ararslan · 2017-08-13T19:52:31Z

src/glmtools.jl

+    nf = length(fweights)
+    np = length(pweights)
+    na = length(aweights)
+    if (nf != 0 && np == 0 && na == 0)


ifs in Julia don't need outer parentheses, and indeed that style is generally discouraged.

ararslan · 2017-08-13T20:08:28Z

src/glmtools.jl

@@ -98,3 +98,22 @@ loglik_obs(::Binomial, y, μ, wt, ϕ) = logpdf(Binomial(Int(wt), μ), Int(y*wt))
 loglik_obs(::Gamma, y, μ, wt, ϕ) = wt*logpdf(Gamma(1/ϕ, μ*ϕ), y)
 loglik_obs(::Normal, y, μ, wt, ϕ) = wt*logpdf(Normal(μ, sqrt(ϕ)), y)
 loglik_obs(::Poisson, y, μ, wt, ϕ) = wt*logpdf(Poisson(μ), y)
+
+# TODO: combine fweights, pweights, aweights
+function normalizeWeights(fweights::FrequencyWeights, pweights::ProbabilityWeights,


Camel case isn't used in Base Julia and the JuliaStats packages tend to match the Base style as closely as possible. For function names, typically it's all lowercase with no separation if the name isn't too long, otherwise snake case. For this function, I'd probably go with normalize_weights.

ararslan · 2017-08-13T20:14:10Z

src/glmtools.jl

+    np = length(pweights)
+    na = length(aweights)
+    if (nf != 0 && np == 0 && na == 0)
+        return fweights |> collect


Another minor stylistic point, but it's generally better to use standard calling conventions rather than piping. In this case you could actually just assign the result of the conditional to a variable (e.g. result = if ...) then return collect(result)

ararslan · 2017-08-13T20:14:43Z

src/linpred.jl

@@ -150,10 +152,11 @@ For linear and generalized linear models, returns the number of rows, or,
 when prior weights are specified, the sum of weights.
 """
 function nobs(obj::LinPredModel)
-    if isempty(obj.rr.wts)
+    if (length(obj.rr.fweights) == 0)


Is isempty not defined for the weight objects in StatsBase?

ararslan · 2017-08-13T20:16:14Z

src/lm.jl

    end
 end
 convert{V<:FPVector}(::Type{LmResp{V}}, y::V) =
-    LmResp{V}(zeros(y), similar(y, 0), similar(y, 0), y)
+    LmResp{V}(zeros(y), similar(y, 0), similar(y, 0), y,
+    similar(y, 0), similar(y, 0), similar(y, 0))


Should be aligned vertically with the arguments above it

ararslan · 2017-08-13T20:17:16Z

src/lm.jl

 type LmResp{V<:FPVector} <: ModResp  # response in a linear model
    mu::V                                  # mean response
    offset::V                              # offset added to linear predictor (may have length 0)
    wts::V                                 # prior weights (may have length 0)
+    fweights::FrequencyWeights
+    pweights::ProbabilityWeights
+    aweights::AnalyticWeights


I'm not sure we should carry around three different types of weights in the response object. Perhaps just a single AbstractWeights?

Yes, we should store the weights internally as a single vector (maybe keep the existing wts field), and only keep track of the necessary additional information like the actual number of observations (which is either the sum of weights or the vector length depending on the weights type).

ararslan · 2017-08-13T20:18:08Z

src/GLM.jl

@@ -14,6 +14,7 @@ module GLM
    import Base: (\), cholfact, convert, cor, show, size
    import StatsBase: coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual, loglikelihood, nullloglikelihood, nobs, stderr, vcov, residuals, predict, fit, model_response, r2, r², adjr2, adjr², PValue
    import StatsFuns: xlogy
+    import StatsBase: FrequencyWeights, ProbabilityWeights, AnalyticWeights


You'll need to bump the lower bound on StatsBase in the REQUIRE file to the version where these types were introduced.

nalimilan

Thanks!

I wonder how useful is the ability to pass multiple weights vector and have GLM combine them automatically. Is that a really common need? Couldn't we require users to do that manually first (and maybe provide convenience functions for that in StatsBase)? Are there examples of existing software which support this? I'm asking because it will make the code more complex and will affect the APIs, so we should evaluate the benefits.

nalimilan · 2017-08-23T10:36:18Z

src/glmtools.jl

+    elseif (nf != 0 && np != 0)
+        return (fweights .* pweights) |> collect
+    else
+        return zeros(Float64, 0)


We should probably handle this more efficiently by returning an empty vector. The existing code uses that trick.

Actually the empty vector gets overridden at some point by ones(n). I haven't checked exactly where or why it was done. I suspect it was so that a quantity could be normalized by sum(weights).

OK. Looks like a good use case for a UnitWeights or Ones type which would implement the AbstractArray interface and always return 1 (JuliaStats/StatsBase.jl#135). But that's not a requirement for a first implementation.

nalimilan · 2017-08-23T10:39:03Z

src/lm.jl

 type LmResp{V<:FPVector} <: ModResp  # response in a linear model
    mu::V                                  # mean response
    offset::V                              # offset added to linear predictor (may have length 0)
    wts::V                                 # prior weights (may have length 0)
+    fweights::FrequencyWeights
+    pweights::ProbabilityWeights
+    aweights::AnalyticWeights


Yes, we should store the weights internally as a single vector (maybe keep the existing wts field), and only keep track of the necessary additional information like the actual number of observations (which is either the sum of weights or the vector length depending on the weights type).

jeffwong · 2017-09-08T03:10:26Z

I think if we only supported Frequency Weights and Probability Weights this PR would be a lot easier for me. I am completely unfamiliar with Analytic Weights, and how they affect formulas for covariance of the glm parameters. It would also mean that we could easily simplify the API by only passing in 1 vector of weights. If someone needed to combine Frequency and Probability together, they could do that using fweight * pweight / sum(fweight * pweight) * sum(fweight). This could then be cast as a Frequency Weight: it has the relative weighting that the pweight brings, and the sum of this new vector still sums to sum(fweight), protecting the use of nobs in the package

jeffwong · 2017-09-08T03:19:40Z

Retrospectively, by limiting the scope to only Fweights and Pweights, I think we only needed a way to define nobs as a function of the weight type. Even if someone wanted to combine Fweights and Pweights, it would ultimately still behave like an Fweight. As long as the cov(parameters) was already based on $X' wts X$ instead of $X'X$ the only modification we would need is to nobs

nalimilan · 2017-09-08T07:13:46Z

Sure, let's go with the simpler version for now. I don't know of other software accepting multiple types of weights at the same time, and we can always add support for analytic weights later.

sergiocorreia · 2017-09-14T12:29:56Z

@jeffwong about your comment on Analytic Weights:

They represent multiple observations, like fweights, but instead of representing multiple identical obs., they represent distinct observations that were averaged out. This happens with variables such as "average grade by classroom", "accident rate per state", etc. Since groups with more individuals have less noisy means, aweights lead to heteroskedasticity.

That said, on linear models pweights are just "aweights with robust standard errors", so if the code supports pweights then aweights come for free.

Nosferican · 2018-04-29T05:44:50Z

What's the status on this?

nalimilan · 2022-09-03T19:07:51Z

Superseded by #487.

jeffwong added 7 commits July 15, 2017 22:53

initial outline

5c58232

augment constructors to have different kinds of weights

39e1266

normalize weights

a010fd1

utilize normalizeWeights in constructor

549045f

update nobs function

b37648e

add to constructor

b1cc501

handle 1 type of weight and multiple types of weights

cc5b5fa

jeffwong commented Aug 13, 2017

View reviewed changes

ararslan reviewed Aug 13, 2017

View reviewed changes

nalimilan reviewed Aug 23, 2017

View reviewed changes

nalimilan mentioned this pull request Sep 7, 2017

Weights in ModelFrame JuliaData/DataFrames.jl#823

Closed

abianco88 mentioned this pull request May 15, 2020

Inconsistencies in Computing Standard Errors in a Weighted Model #371

Closed

nalimilan closed this Sep 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Draft of fweights, pweights, aweights for glms #194

Draft of fweights, pweights, aweights for glms #194

jeffwong commented Aug 13, 2017

jeffwong Aug 13, 2017

jeffwong Aug 13, 2017

jeffwong Aug 13, 2017

ararslan left a comment

ararslan Aug 13, 2017

ararslan Aug 13, 2017

ararslan Aug 13, 2017

ararslan Aug 13, 2017

ararslan Aug 13, 2017

ararslan Aug 13, 2017

ararslan Aug 13, 2017

nalimilan Aug 23, 2017

ararslan Aug 13, 2017

nalimilan left a comment

nalimilan Aug 23, 2017

dmbates Aug 23, 2017

nalimilan Aug 24, 2017

nalimilan Aug 23, 2017

jeffwong commented Sep 8, 2017

jeffwong commented Sep 8, 2017

nalimilan commented Sep 8, 2017

sergiocorreia commented Sep 14, 2017 •

edited

Loading

Nosferican commented Apr 29, 2018

nalimilan commented Sep 3, 2022

Draft of fweights, pweights, aweights for glms #194

Draft of fweights, pweights, aweights for glms #194

Conversation

jeffwong commented Aug 13, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ararslan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nalimilan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jeffwong commented Sep 8, 2017

jeffwong commented Sep 8, 2017

nalimilan commented Sep 8, 2017

sergiocorreia commented Sep 14, 2017 • edited Loading

Nosferican commented Apr 29, 2018

nalimilan commented Sep 3, 2022

sergiocorreia commented Sep 14, 2017 •

edited

Loading