Why ContrastsMatrix matrix is Matrix{Float64}? #251

Open
PharmCat opened this issue Jan 3, 2022 · 7 comments

PharmCat commented Jan 3, 2022

Why is the matrix field of struct ContrastsMatrix a Matrix{Float64}? In many cases, for DummyCoding() or FullDummyCoding(), it could be a BitMatrix or a SparseMatrixCSC{Bool, Int64}.
For big datasets I tried something like this:

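```julia
using StatsModels, SparseArrays, LinearAlgebra

# Dummy contrasts stored as a sparse Bool matrix (no fields needed)
struct OwnDummyCoding <: AbstractContrasts end

function StatsModels.contrasts_matrix(C::OwnDummyCoding, baseind, n)
    # n×n sparse identity with the base-level column dropped
    sparse(I, n, n)[:, [1:(baseind-1); (baseind+1):n]]
end
```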

But I run out of memory because ContrastsMatrix tries to convert this to a Matrix{Float64}.
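To put rough numbers on the problem, here is a sketch that uses only the `SparseArrays` and `LinearAlgebra` stdlibs. `dummy_contrasts_sparse` is a hypothetical helper that mirrors the method above, and the level count of 10^5 is just an illustrative value. The sketch compares what the sparse contrast matrix actually stores with what a dense copy of the same shape would need, computing the dense sizes without allocating them:

```julia
using SparseArrays, LinearAlgebra

# Hypothetical helper mirroring the contrasts_matrix method above:
# an n×n sparse identity with the base-level column dropped.
function dummy_contrasts_sparse(baseind::Integer, n::Integer)
    keep = [1:(baseind-1); (baseind+1):n]
    return sparse(I, n, n)[:, keep]
end

n = 100_000                       # illustrative number of levels
S = dummy_contrasts_sparse(1, n)  # n × (n-1) SparseMatrixCSC{Bool,Int}

# Storage actually held by S: values, row indices and column pointers.
sparse_bytes = sizeof(nonzeros(S)) + sizeof(rowvals(S)) + sizeof(S.colptr)

# Storage a dense copy would need (computed only, never allocated).
dense_f64_bytes = prod(size(S)) * sizeof(Float64)   # Matrix{Float64}
dense_bit_bytes = cld(prod(size(S)), 64) * 8         # BitMatrix packs 64 entries per UInt64

gib(b) = round(b / 2^30; digits = 2)
println("sparse Bool:     ", round(sparse_bytes / 2^20; digits = 2), " MiB")
println("BitMatrix:       ", gib(dense_bit_bytes), " GiB")
println("Matrix{Float64}: ", gib(dense_f64_bytes), " GiB")
```

The sparse version grows linearly with the number of levels, while both dense layouts grow quadratically. A BitMatrix is 64× smaller than a Matrix{Float64}, but at this size it is still over a gigabyte.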

PharmCat commented Jan 3, 2022

Would it be possible to do something like this:

```julia
# Proposed: make the matrix type a parameter M instead of fixing Matrix{Float64}
struct ContrastsMatrix{C <: AbstractContrasts, T, U, M <: AbstractMatrix}
    matrix::M
    termnames::Vector{U}
    levels::Vector{T}
    contrasts::C
    invindex::Dict{T,Int}
    function ContrastsMatrix(matrix::M,
                             termnames::Vector{U},
                             levels::Vector{T},
                             contrasts::C) where {U, T, C <: AbstractContrasts, M <: AbstractMatrix}
        allunique(levels) || throw(ArgumentError("levels must be all unique, got $(levels)"))
        invindex = Dict{T,Int}(x=>i for (i,x) in enumerate(levels))
        new{C,T,U,M}(matrix, termnames, levels, contrasts, invindex)
    end
end
```
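A self-contained sketch of what such a change buys, assuming nothing from StatsModels: `AbstractContrastsSketch`, `SparseDummySketch` and `ContrastsMatrixSketch` are hypothetical stand-ins defined here only so the example runs on its own. With the matrix type as a parameter, a sparse Bool contrast matrix is stored as-is, and rows for individual observations can still be looked up through `invindex` without densifying:

```julia
using SparseArrays, LinearAlgebra

# Hypothetical stand-ins; these are NOT StatsModels types.
abstract type AbstractContrastsSketch end
struct SparseDummySketch <: AbstractContrastsSketch end

# Parametric on the matrix type M, so no conversion to Matrix{Float64} happens.
struct ContrastsMatrixSketch{C<:AbstractContrastsSketch,T,U,M<:AbstractMatrix}
    matrix::M
    termnames::Vector{U}
    levels::Vector{T}
    contrasts::C
    invindex::Dict{T,Int}
    function ContrastsMatrixSketch(matrix::M, termnames::Vector{U}, levels::Vector{T},
                                   contrasts::C) where {C<:AbstractContrastsSketch,T,U,M<:AbstractMatrix}
        allunique(levels) || throw(ArgumentError("levels must be all unique, got $(levels)"))
        invindex = Dict{T,Int}(x => i for (i, x) in enumerate(levels))
        new{C,T,U,M}(matrix, termnames, levels, contrasts, invindex)
    end
end

lvls = ["a", "b", "c"]
baseind = 1
keep = [1:(baseind-1); (baseind+1):length(lvls)]
mat = sparse(I, 3, 3)[:, keep]   # 3×2 SparseMatrixCSC{Bool,Int}
cm = ContrastsMatrixSketch(mat, lvls[keep], lvls, SparseDummySketch())

# Look up contrast rows for a few observations; the result stays sparse.
rows = cm.matrix[[cm.invindex[x] for x in ["b", "a", "c"]], :]
```

Code that consumes the matrix would then have to work with any AbstractMatrix rather than assume a dense Float64 one, which is the main cost of this design.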

palday commented May 19, 2022

@PharmCat how many contrast levels do you have? If this is for the grouping variable in MixedModels.jl, then there is the Grouping() pseudocontrast which avoids creating an actual matrix
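The idea behind a pseudocontrast like Grouping() is that a grouping factor does not need a contrast matrix at all: knowing which level each observation belongs to is enough. A minimal stdlib-only sketch of that information, using a hypothetical `encode_groups` helper (this is not the MixedModels.jl implementation, which plugs into the StatsModels contrasts machinery), stores one integer code per observation plus the unique levels:

```julia
# Hypothetical helper: encode a grouping variable as integer codes plus the
# vector of unique levels. Memory is O(observations + levels); no contrast
# matrix, dense or sparse, is ever created.
function encode_groups(x::AbstractVector{T}) where {T}
    lvls = unique(x)
    invindex = Dict{T,Int}(l => i for (i, l) in enumerate(lvls))
    refs = [invindex[v] for v in x]
    return refs, lvls
end

subject = ["s1", "s2", "s1", "s3", "s2"]
refs, lvls = encode_groups(subject)
```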

PharmCat commented May 20, 2022

> @PharmCat how many contrast levels do you have? If this is for the grouping variable in MixedModels.jl, then there is the Grouping() pseudocontrast which avoids creating an actual matrix

@palday

Hello! It can be more than 10^5. I'm actually working on Metida.jl, which helps me with some tasks where MixedModels.jl can't be used. I know this problem is solved in MixedModels, and Metida has a workaround too. I see Grouping in MixedModels.jl; maybe the Grouping code should be moved to StatsModels.jl and documented there (perhaps along with some other code from MixedModels, such as using "/" in terms).
Also, I don't understand why the ContrastsMatrix matrix field is declared as Matrix{Float64}; why can't it be more flexible?

Also, I can't find any roadmap for StatsModels. I think StatsModels is a core package of the JuliaStats ecosystem, but there is no information about its development plan toward version 1.0.

palday commented May 20, 2022

The nesting syntax / is implemented in RegressionFormulae.jl.

palday commented May 20, 2022

The implementation of Grouping() is quite simple: https://github.com/JuliaStats/MixedModels.jl/blob/621f88b1f594ea0827d9ac7e8628113dd2121bef/src/grouping.jl#L2-L34

Depending on the exact structure of your model, you might be able to skip using the full formula infrastructure and instead call a custom modelcols method directly -- this is how random effects and associated sparse matrices are constructed in MixedModels.
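As a rough illustration of bypassing the formula machinery, here is a sketch that builds a sparse indicator matrix straight from integer group codes, which is the shape of matrix a random-intercept design uses. It assumes only the `SparseArrays` stdlib, and `indicator_matrix` is a hypothetical helper, not a MixedModels.jl or StatsModels.jl function. All levels are kept (full dummy coding), since a grouping factor has no reference level:

```julia
using SparseArrays

# Hypothetical helper: build the n×k full-dummy indicator matrix for a
# grouping variable directly from integer level codes, without any formula
# or contrasts machinery. Exactly one nonzero is stored per observation.
function indicator_matrix(refs::AbstractVector{<:Integer}, k::Integer)
    all(r -> 1 <= r <= k, refs) || throw(ArgumentError("codes must lie in 1:$k"))
    n = length(refs)
    rows = collect(1:n)         # observation i ...
    cols = Vector{Int}(refs)    # ... belongs to group refs[i]
    return sparse(rows, cols, fill(true, n), n, k)
end

refs = [1, 2, 1, 3, 2]   # e.g. integer codes for subjects s1, s2, s1, s3, s2
Z = indicator_matrix(refs, 3)
```

Memory here scales with the number of observations, not with the number of levels squared, so 10^5 levels poses no problem.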

PharmCat commented

> The implementation of Grouping() is quite simple:

Yep, but this means I would have to copy this code or add MixedModels as a dependency. Could this functionality live in StatsModels instead?

palday commented Jun 28, 2022

There's nothing wrong with copying this code, but maybe @kleinschmidt has thoughts on whether it makes more general sense to include this in StatsModels?
