Mean of grouped variables as array #231

Open
phipsgabler opened this issue Aug 11, 2020 · 7 comments

@phipsgabler
Member

When a chain contains, say, x[1] and x[2], mean(group(chain, :x)) works nicely to show the mean of x. I'd have expected Array(mean(group(chain, :x))) to convert the result to a vector, just as Array(group(chain, :x)) converts the grouped chain to an array, but it doesn't work: the result of mean is a ChainDataFrame, which can't be converted this way. Instead you have to do something like

dropdims(mean(Array(group(chain, var)), dims=1), dims=1)

which is much longer and less readable. It would be cool to have a shorter variant -- I guess this is a pretty common operation?

Or is there an alternative I have overlooked? In that case, the documentation could be improved.
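
For reference, the workaround above boils down to a column-wise mean flattened into a vector. The following is only a sketch on a plain matrix that stands in for Array(group(chain, :x)), assuming rows are samples and columns are x[1] and x[2]; colmeans is a hypothetical helper, not an MCMCChains function, and the numbers are made up.

```julia
using Statistics

# Stand-in for Array(group(chain, :x)): rows are samples, columns are x[1], x[2].
# The values are invented for illustration only.
samples = [1.0 10.0;
           2.0 20.0;
           3.0 30.0]

# Hypothetical helper (not part of MCMCChains): column-wise mean as a plain Vector.
colmeans(A::AbstractMatrix) = vec(mean(A, dims=1))

m = colmeans(samples)

# The longer spelling from the issue yields the same vector.
m_long = dropdims(mean(samples, dims=1), dims=1)
```

Here vec and dropdims(…, dims=1) are interchangeable, because mean(A, dims=1) always returns a 1×n matrix.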

@cpfiffer
Member

You can extract the mean column from the ChainDataFrame directly if everything's sorted correctly:

m = mean(group(chain, var))
m.nt.mean

But I think you've highlighted a bigger issue, which is that we have no ability to reconstruct variables into their original shapes. We've thought about it a lot but have never gotten around to it.

@devmotion
Member

BTW, I learnt about JuliaArrays/AxisArrays.jl#182 a few days ago (while reviewing the DiffEq tutorial), and it is pretty bad: indexing by names gives surprising results (and will probably lead to subtle bugs), since the parameters are always ordered according to the original chain instead of the provided vector.

@cpfiffer
Member

I'm beginning to not be very fond of AxisArrays. It's not that good a data format for our purposes.

@devmotion
Member

Maybe we should consider switching to https://github.com/mcabbott/AxisKeys.jl soon while still thinking about other data formats such as StructArrays which might sometimes be more suitable?

@devmotion
Member

@phipsgabler ChainDataFrame implements the Tables interface, so it should at least be easy to convert it to a Vector of NamedTuples or a NamedTuple of Vectors by calling Tables.rowtable or Tables.columntable.
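
A rough illustration of the two conversions, using a hand-written NamedTuple of vectors as a stand-in for the summary table (an assumption: a real ChainDataFrame is not constructed here, and the column names parameters and mean are only meant to mirror the thread). Tables.rowtable and Tables.columntable are the Tables.jl functions named above.

```julia
using Tables

# Stand-in table: a NamedTuple of vectors is itself a valid Tables.jl source.
# Column names and values are assumptions made for illustration.
tbl = (parameters = [Symbol("x[1]"), Symbol("x[2]")], mean = [0.5, 1.5])

cols = Tables.columntable(tbl)  # NamedTuple of Vectors
rows = Tables.rowtable(tbl)     # Vector of NamedTuples

means = cols.mean     # the mean column as a plain Vector
first_row = rows[1]   # one NamedTuple per parameter
```

With the column table, the means are available as an ordinary vector without reaching into internal fields.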

@phipsgabler
Member Author

phipsgabler commented Aug 12, 2020

I think the central issue is more that we're not taking VarNames seriously enough.

Two months ago I was thinking about implementing a kind of trie- or R-tree-based dictionary, to store a mapping from VarNames to scalars and arrays of some arbitrary type -- kind of like a weaker generalization of some aspects of VarInfo.

This was because I always had the same kind of problems, like storing x[1] and x[2] independently and then trying to retrieve just x, let alone something like storing y[:][1] and retrieving x[1][[1]]. I gave up on the implementation because I was frustrated, but the idea still exists. This issue isn't the same, but it is a similar problem that could at least be simplified by some more functionality for VarName, and by using it consistently.
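
To make the simplest case concrete (x[1] and x[2] stored separately, x retrieved as a whole), here is a minimal sketch based on parsing string names. It is not the trie-based design described above and does not use VarName; collect_indexed is a hypothetical helper that assumes single integer indices and ignores anything nested such as y[:][1].

```julia
# Hypothetical helper: collect flat entries such as "x[1]" and "x[2]"
# into one vector per symbol. Only names of the form sym[i] are handled.
function collect_indexed(flat::AbstractDict{String,Float64})
    entries = Dict{Symbol,Vector{Pair{Int,Float64}}}()
    for (name, value) in flat
        m = match(r"^(\w+)\[(\d+)\]$", name)
        m === nothing && continue  # skip names that are not of the form sym[i]
        sym = Symbol(m.captures[1])
        idx = parse(Int, m.captures[2])
        push!(get!(entries, sym, Pair{Int,Float64}[]), idx => value)
    end
    # Sort by index so the result does not depend on dictionary iteration order.
    return Dict(sym => last.(sort(ps; by = first)) for (sym, ps) in entries)
end

flat = Dict("x[1]" => 0.5, "x[2]" => 1.5, "y[1]" => -2.0)
grouped = collect_indexed(flat)
```

String parsing like this breaks down quickly for multi-dimensional or nested indices, which is the case for a structured key type rather than flat names.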

Do AxisArrays allow axes with types other than Symbol?

@cpfiffer
Member

They do, but AxisArrays are kind of a non-starter for long-term development. They basically require all variables share a type, so including integer and continuous variables tends to make arrays of type Any, which is stupid and not very helpful. It also forces everything to have a linearized shape, which is not generally the shape that parameter draws from hierarchical models take.

I'm all for some kind of VarName implementation, but I'm not quite sure what that would look like right now.
