-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling nested structs #85
Comments
Thanks for the nice words! Nested structures are supported but unfortunately undocumented (I'm keeping the issue open to remember to add docs)... When calling
|
BTW, I'm writing a design patterns book about Julia, and I'm going to feature your package in the "Struct of Arrays" pattern. |
Yes, I'm closing #126 because nested structs are not broken and the approach suggested there work. Just ping me when you PR is ready for review (and of course if it turns out that there is a benefit). OTOH, if the current API allows to do what you need but is underdocumented, a doc PR would be great! |
Thanks @piever , In the example above, an array is allocated, and then (I assume) garbage collected after it's used to construct the more efficient StructArray. There's a fair amount of over head in this initial allocation, which I'd like to avoid. Is there a good way to do this? I was trying something like this, which (obviously) doesn't work: julia> using StructArrays
julia> x = (a=[1,2],b=(c=[3,4],d=([5,6],[7,8])))
(a = [1, 2], b = (c = [3, 4], d = ([5, 6], [7, 8])))
julia> StructArray(x; unwrap = T -> T <: Union{Tuple, NamedTuple})
ERROR: BoundsError: attempt to access 0-element Array{Any,1} at index [1:1]
Stacktrace:
[1] copyto!(::Array{Any,1}, ::Int64, ::StructArray{Array{Int64,1},1,NamedTuple{(),Tuple{}},Int64}, ::Int64, ::Int64) at ./abstractarray.jl:905
[2] _widenarray at /home/chad/git/StructArrays.jl/src/collect.jl:126 [inlined]
[3] _widenstructarray at /home/chad/git/StructArrays.jl/src/collect.jl:114 [inlined]
[4] widen_from_type at /home/chad/git/StructArrays.jl/src/collect.jl:109 [inlined]
[5] widen_from_instance(::StructArray{Array{Int64,1},1,NamedTuple{(),Tuple{}},Int64}, ::Int64, ::NamedTuple{(:c, :d),Tuple{Array{Int64,1},Tuple{Array{Int64,1},Array{Int64,1}}}}) at /home/chad/git/StructArrays.jl/src/collect.jl:105
[6] collect_to_structarray!(::StructArray{Array{Int64,1},1,NamedTuple{(),Tuple{}},Int64}, ::NamedTuple{(:a, :b),Tuple{Array{Int64,1},NamedTuple{(:c, :d),Tuple{Array{Int64,1},Tuple{Array{Int64,1},Array{Int64,1}}}}}}, ::Int64, ::Int64) at /home/chad/git/StructArrays.jl/src/collect.jl:77
[7] _collect_structarray! at /home/chad/git/StructArrays.jl/src/collect.jl:59 [inlined]
[8] #_collect_structarray#101 at /home/chad/git/StructArrays.jl/src/collect.jl:54 [inlined]
[9] collect_structarray(::NamedTuple{(:a, :b),Tuple{Array{Int64,1},NamedTuple{(:c, :d),Tuple{Array{Int64,1},Tuple{Array{Int64,1},Array{Int64,1}}}}}}; initializer::StructArrays.StructArrayInitializer{var"#7#8",typeof(StructArrays.default_array)}) at /home/chad/git/StructArrays.jl/src/collect.jl:40
[10] StructArray(::NamedTuple{(:a, :b),Tuple{Array{Int64,1},NamedTuple{(:c, :d),Tuple{Array{Int64,1},Tuple{Array{Int64,1},Array{Int64,1}}}}}}; unwrap::Function) at /home/chad/git/StructArrays.jl/src/structarray.jl:81 |
If storage is already column-based, there should be no allocations at all. In your case, I think the easiest is to define a custom helper function to construct things recursively, for example julia> using StructArrays
julia> to_array(s::AbstractArray) = s
to_array (generic function with 1 method)
julia> to_array(nt::Union{Tuple, NamedTuple}) = StructArray(map(to_array, nt))
to_array (generic function with 2 methods)
julia> x = (a=[1,2],b=(c=[3,4],d=([5,6],[7,8])));
julia> to_array(x)
2-element StructArray(::Array{Int64,1}, StructArray(::Array{Int64,1}, StructArray(::Array{Int64,1}, ::Array{Int64,1}))) with eltype NamedTuple{(:a, :b),Tuple{Int64,NamedTuple{(:c, :d),Tuple{Int64,Tuple{Int64,Int64}}}}}:
(a = 1, b = (c = 3, d = (5, 7)))
(a = 2, b = (c = 4, d = (6, 8)))
julia> using BenchmarkTools
julia> @btime to_array($x);
3.978 ns (0 allocations: 0 bytes) I am not sure this should happen by default. IMO, if the user wants to do things recursively, they should be explicit about it: it's just one extra line of code. If this is not intuitive, maybe one of these examples could appear in the README (it seems like it has tripped up people before). |
Great, thank you! For a while it seemed StructArrays might not have good support for this, so I started working on https://github.com/cscherrer/NestedTuples.jl. My use case requires deep nesting, so that was my focus. It's very rough compared to StructArrays, but for deeply nested structures it still seems to have an advantage. Say we have using StructArrays
using NestedTuples
using Accessors
using BenchmarkTools
to_structarray(s::AbstractArray) = s
to_structarray(nt::Union{Tuple, NamedTuple}) = StructArray(map(to_structarray, nt))
r() = randn(100)
x = (a = r(), b = (c = r(), d= (e = r(), f = (g = r(), h = (i = r(), j = (k = r(), l = (m = r(), n = (o = r(), p = (q = r(), r = (s = r(), t = (u = r(), v = (w = r(), x = (y = r(), z = r()))))))))))))); Then julia> sa = to_structarray(x);
julia> ta = TupleArray(x);
julia> @btime sa = to_structarray($x);
26.352 ns (0 allocations: 0 bytes)
julia> @btime ta = TupleArray($x);
18.108 ns (0 allocations: 0 bytes)
julia> @btime @inbounds $sa[3];
15.360 ns (0 allocations: 0 bytes)
julia> @btime @inbounds $ta[3];
9.406 ns (0 allocations: 0 bytes)
julia> @btime @inbounds $sa[2] = $sa[3];
29.271 ns (0 allocations: 0 bytes)
julia> @btime @inbounds $ta[2] = $ta[3];
20.179 ns (0 allocations: 0 bytes) It's not clear to me where to go form here. Operations like this will be in inner loops, so I need as much speed as I can get. But it seems silly to have two packages with such similar functionality. Would it make sense for some of this to be incorporated into StructArrays? |
Performance improvements are always welcome, so I guess it'd be good to see where the timing difference comes from, and see if the same trick can be used here. These things are a bit tricky though, because when the named tuples get too large, compile times can be an issue. Are you benchmarking the released version of StructArrays or the last commit? There have been some unreleased performance improvements for the nested case (in #141), so master branch should be faster than the released version. |
Good point. Some extra compile time is ok if you can make it up in performance of the compiled code, but there's certainly a tradeoff to keep in mind. I ran benchmarks are on the master branch. I think I may be getting some nice speedups from my julia> schema(ta)
(a = Array{Float64,1}, b = (c = (Array{Float64,1}, Array{Float64,1}), d = Array{Float64,1})) Then julia> NestedTuples.leaf_setter(getfield(ta, :data))
function = (##256, ##257, ##258, ##259;) -> begin
begin
(a = var"##256", b = (c = (var"##257", var"##258"), d = var"##259"))
end
end So you can do julia> NestedTuples.leaf_setter(getfield(ta, :data))(1,2,3,4)
(a = 1, b = (c = (2, 3), d = 4)) I use this as a foundation for most other functions. |
The constructor performance may have been fixed by #162 (just in time :) ). I guess it'd be useful to see what happens when you use a StructArray of tuples, because it looks like you are comparing with a I suspect that
Unfortunately, this is a "low-level" package, so it really should not take extra dependencies, but if there is a smart way to improve performance without adding deps, that would be great. My best guess is that |
Reviving this issue, it is my impression from some simple benchmarks that this feature works fine and is performant and non-allocating for views, which is already great. Eg julia> using StructArrays, BenchmarkTools
julia> sa = StructArray([(x = i, yz = (y = 2*i, z = 3*i)) for i in 1:5000];
unwrap = t -> t <: NamedTuple);
julia> f(sa) = sum(sa.yz.z)
f (generic function with 1 method)
julia> @benchmark f($sa)
BenchmarkTools.Trial: 10000 samples with 234 evaluations.
Range (min … max): 318.333 ns … 431.923 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 324.372 ns ┊ GC (median): 0.00%
Time (mean ± σ): 325.627 ns ± 7.200 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▆▃▁▅█▄ ▁▃
▁▁▄▆▆▆███████████▅▆▇▅▄▄▄▃▃▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
318 ns Histogram: frequency by time 349 ns <
Memory estimate: 0 bytes, allocs estimate: 0. So my understanding is that the only thing that needs to be done to resolve the issue is documentation. It is in the docstring, but should be mentioned in the text of the manual. (If this is the case, I am happy to do a simple docs PR to this great package). |
Hi Pietro. Great package!
Any thought about how to support nested structs? e.g.
I can do it like this:
It's a little bit unsatisfactory since I have to enumerate all non-struct fields (
id
andname
in the above) manually...The text was updated successfully, but these errors were encountered: