Problem with reading branches of C-style arrays #165
Follow-up: ...
else
    #offset = rawoffsets # not needed since set implicitly in return
    real_data = ntoh.(reinterpret(T, rawdata))
    return VectorOfVectors(real_data, Int32[1])
end
gives somewhat weird behavior for the lazy tree:
file = ROOTFile("path/to/file")
lt = LazyTree(file, "tree", ["branch"]) # branch contains 30 entries, each an array of ROOT type `[9]/D`
Outputs:
julia> lt[1]
UnROOT.LazyEvent at index 1 with 1 columns:
(branch = [-0.11494079823987098, 0.003512754908587555, -0.006951244327531138, -0.010983292976212498, 0.0012710038728507352, -0.009022733771010032, -0.007430790262943545, -0.001381307906738535, 0.12396353201088427, 6.95206001540604e-310 … NaN, NaN, 6.95214518845505e-310, 6.952094284304e-310, 1.0e-323, 7.94302e-318, 1.0e-323, 4.0e-323, 0.0, 5.0e-324],)
# contains the correct data, but is then padded at the end with garbage (probably Vector{Float64}(undef, N) is called somewhere?)
julia> lt.branch[1]
0-element view(::Vector{Float64}, 1:0) with eltype Float64
# Not accessible this way
julia> lt
Row │ branch ⋯
│ SubArray{Float6 ⋯
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ [-0.115, 0.00351, -0.00695, -0.011, 0.00127, -0.00902, -0.00743, -0.00138, 0.124, 6.95205899055105e-310, 6.95205899024354e-310, 5. ⋯
2 │ [] ⋯
3 │ [] ⋯
4 │ [-0.106, 0.000901, -0.00758, -0.00825, 0.00072, -0.0106, -0.00998, -0.00164, 0.116, 6.95205899057635e-310, 6.95205904360066e-310, ⋯
5 │ [-0.105, 0.0009, -0.00814, -0.00801, 0.000957, -0.00985, -0.0104, -0.00249, 0.115, 6.9521451199374e-310, 6.9520570656116e-310, 6.9 ⋯
6 │ [] ⋯
7 │ [] ⋯
8 │ [-0.0976, 0.00202, -0.00855, -0.00792, 0.00172, -0.0115, -0.011, -0.0022, 0.109, 6.9521451199374e-310, 6.952057148036e-310, 6.9520 ⋯
⋮ │ ⋮ ⋱
1 column and 22 rows omitted
I guess the fact that not all elements are filled here is due to the tree being loaded
Thanks for the investigations @briederer, that certainly helps a bit :) A few things have changed since #9, so we might be closer to solving this issue. Can you upload a small test file and some reference values?
Hey, thanks for the response.
What is interesting is that although this file is now filled analogously to my real data, it behaves a bit differently, and I guess this is due to basket separation. In my original files I flush after every
When I pass this new file to my modified version of UnROOT (see previous comment), I see again that the arrays are of type
Let me know if I can help any further. PS: Sorry for adding the ROOT file as
Reference values:
OK thanks, the good news is that uproot can read everything:
In [1]: import uproot
In [2]: f = uproot.open("sample.root")
In [3]: f.keys()
Out[3]: ['arrays;1']
In [4]: f['arrays']
Out[4]: <TTree 'arrays' (3 branches) at 0x0001073d9bb0>
In [5]: f['arrays'].keys()
Out[5]: ['n', 'carr', 'cstr']
In [6]: f['arrays/carr']
Out[6]: <TBranch 'carr' at 0x00011a493940>
In [7]: f['arrays/carr'].array()
Out[7]: <Array [[0.961, 0.578, ... 0.0711, 0.723]] type='3 * 9 * float64'>
In [8]: f['arrays/cstr'].array()
Out[8]: <Array [[0.701, 0.588, ... 0.493, 0.668]] type='3 * 9 * float64'>
I'll try to figure out what's going on. Regarding the number of baskets: the size of the data is also relevant. With little data, you often end up with a single basket, so we often try to create sample files where the data is spread across multiple baskets, since stitching the data together might be another issue. May I ask you to generate a bit more data to force multiple baskets? Meanwhile I'll see what I can do...
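To make the basket-stitching concern concrete, here is a minimal Python sketch that decodes basket payloads and concatenates their entries in order. It assumes, for illustration only, that each payload has already been decompressed and stripped of its key/header and holds whole entries of nine big-endian float64 values. Real ROOT baskets carry more structure than this. The helper names and the values are hypothetical.

```python
import struct

ENTRY_FMT = ">9d"  # assumption: one entry = 9 big-endian float64 values (a `[9]/D` leaf)
ENTRY_SIZE = struct.calcsize(ENTRY_FMT)  # 72 bytes

def decode_basket(payload: bytes) -> list:
    """Decode one basket payload (assumed decompressed and header-free) into entries."""
    if len(payload) % ENTRY_SIZE:
        raise ValueError("basket payload does not hold a whole number of entries")
    # iter_unpack yields one 9-tuple per entry, in file order
    return list(struct.iter_unpack(ENTRY_FMT, payload))

def stitch(payloads: list) -> list:
    """Concatenate entries from consecutive baskets, preserving entry order."""
    entries = []
    for payload in payloads:
        entries.extend(decode_basket(payload))
    return entries

# Usage with made-up values: two entries in the first basket, one in the second
made_up = [tuple(float(9 * i + j) for j in range(9)) for i in range(3)]
basket_a = b"".join(struct.pack(ENTRY_FMT, *e) for e in made_up[:2])
basket_b = struct.pack(ENTRY_FMT, *made_up[2])
entries = stitch([basket_a, basket_b])
```

Entries that straddle basket boundaries are not handled here. The sketch assumes every basket holds complete entries.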
I've created another file where I force the same structure as in my file (i.e. writing after each filling).
Just tried it out, and this file does not segfault with my edit. And I guess you don't need the new reference values, since uproot is able to read it, @tamasgal?
Alright, thanks! Yes, I'll use uproot for cross-checks.
OK, digging deeper reveals that the problem is that Here you can see that the leaf-information for the
I now need to think about how to get this hammered into the existing auto-jagg-construct. The place where this could be handled is here: Lines 394 to 397 in 81b6ca9
Just a dummy placeholder: Here I need to figure out how to pass over the information that we now need to read 9 elements in something like a
If I get the code right the In the utils.jl code NooffsetJagg is assigned to branches where the Line 61 in 81b6ca9
In particular, this type is used nowhere else in the code, so I guess this one may be used for that? However, how to add the
Uproot4 is extracting the info here
Maybe some of this helps.
I think in this case you just need
_type = SVector{_type, 9}
_jaggtype = NooffsetJagg
return _type, _jaggtype
Yes, that's the plan, and then propagate the dimension parameter downstream. I'll check it out later after work.
Oh wait, would this work like that? That would make things fairly easy. 😆
I hope the downstream usage of
OK, not the full story, but it's the start of a hopefully short journey.
oh it's because
ah ok, then after fixing that we need to tweak
Yep, that's tricky. I had a long day, so I cannot spend as much time on it as I thought I could, but I'll squeeze out a few more minutes until I fall into bed. Tomorrow is another day ;)
OK, so of course we run into the issue that
julia> raw, offsets = UnROOT.array(f, "arrays/carr"; raw=true)
(UInt8[0x3f, 0xda, 0xe5, 0x2b, 0x15, 0x35, 0xca, 0x56, 0x3f, 0xa5 … 0x50, 0xf4, 0x3f, 0xed, 0xc1, 0x31, 0x14, 0x3b, 0x82, 0x62], Int32[0])
julia> reinterpret(StaticArrays.SVector{Float64, 9}, raw[1:8])
ERROR: ArgumentError: cannot reinterpret `UInt8` as `SVector{Float64, 9}`, type `SVector{Float64, 9}` is not a bits type
Stacktrace:
[1] (::Base.var"#throwbits#277")(S::Type, T::Type, U::Type)
@ Base ./reinterpretarray.jl:16
[2] reinterpret(#unused#::Type{SVector{Float64, 9}}, a::Vector{UInt8})
@ Base ./reinterpretarray.jl:36
[3] top-level scope
@ REPL[22]:1
Here, of course, we also need to reverse the bytes due to the big-endianness of ROOT. But as seen here, the data can be accessed:
cf.
In [13]: f['arrays/carr'].array()[0][0]
Out[13]: 0.4202373225336137
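The byte-order point can be checked outside Julia. This Python sketch uses the standard `struct` module to decode the first and last eight bytes printed in the raw output above as big-endian IEEE 754 doubles. The middle of the buffer is elided in that output, so no other bytes are used. The first value should agree with what uproot reports for element `[0][0]`.

```python
import math
import struct

# First and last 8 bytes of the raw `arrays/carr` buffer shown above.
first_bytes = bytes([0x3F, 0xDA, 0xE5, 0x2B, 0x15, 0x35, 0xCA, 0x56])
last_bytes = bytes([0x3F, 0xED, 0xC1, 0x31, 0x14, 0x3B, 0x82, 0x62])

def be_float64(b: bytes) -> float:
    """Interpret 8 bytes as a big-endian IEEE 754 double (ROOT's on-disk order)."""
    return struct.unpack(">d", b)[0]

first = be_float64(first_bytes)
last = be_float64(last_bytes)

# Compare with the value uproot prints for f['arrays/carr'].array()[0][0]
matches_uproot = math.isclose(first, 0.4202373225336137, rel_tol=1e-12)

# Reading the same bytes in little-endian order (i.e. without the byte swap)
# gives an unrelated number, which is why ntoh is needed on little-endian hosts.
unswapped = struct.unpack("<d", first_bytes)[0]
```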
https://juliaarrays.github.io/StaticArrays.jl/stable/pages/api/#Arrays-of-static-arrays you have to go to
Thanks for the progress on this already. It seems very promising so far. May I ask if one of you, @tamasgal or @Moelf, could push the current state of the modifications (although not working yet) to #166?
I just pushed, but it's not much. Sorry it's taking so long, I am a bit overloaded... I'll spend a bit of time on it today!
Thanks for that.
Alright, thanks for the great feedback too :) I am glad that @Moelf jumped into the project, he has done an enormous amount of work already :) Btw, a few thoughts which might help you: the current issue is to hook into the interpretation of an
Essentially instead of
A long-term solution would probably be the replacement of
btw, all of your type notation is flipped: it's
SVector{N, T}
NTuple{N, T}
julia> ROOTFile("./test/samples/issue165_multiple_baskets.root")["arrays/carr"]
3-element LazyBranch{UnROOT.FixLenVector{9, Float64}, UnROOT.Nooffsetjagg, Vector{UnROOT.FixLenVector{9, Float64}}}:
[0.4202373225336137, 0.04243914226183628, 0.5403969290388734, 0.5762304009759008, 0.5129559252005796, 0.2988885591267089, 0.866322138284483, 0.3651808394888327, 0.14771760168844722]
[0.189585121902444, 0.5142614690187673, 0.5731252634772683, 0.7441941270344863, 0.018536993776698128, 0.5783476343277598, 0.12047688202954683, 0.981035016933938, 0.24272876151033154]
[0.7775048011809144, 0.8664217530127716, 0.4918492038230641, 0.24464299401484568, 0.38991686533667, 0.15690925771226608, 0.3850047958013624, 0.9268160513261408, 0.9298329730191421]
I basically got it working, but the inner thing being a vector makes my head spin
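The "thin wrapper" idea can be sketched independently of UnROOT. The wrapper keeps the flat, already-decoded buffer and a fixed entry length, and slices on access. The class below is a hypothetical Python illustration of that design, not UnROOT's `FixLenVector`.

```python
from collections.abc import Sequence

class FixedLengthEntries(Sequence):
    """Hypothetical thin wrapper: a flat buffer viewed as entries of length n."""

    def __init__(self, flat, n):
        # Check n first so the modulo never divides by zero
        if n <= 0 or len(flat) % n:
            raise ValueError("buffer length must be a multiple of a positive n")
        self._flat = flat
        self._n = n

    def __len__(self):
        return len(self._flat) // self._n

    def __getitem__(self, i):
        if isinstance(i, slice):
            return [self[j] for j in range(*i.indices(len(self)))]
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("entry index out of range")
        start = i * self._n
        # Only the flat buffer and n are stored; each access returns the i-th block
        return self._flat[start:start + self._n]

entries = FixedLengthEntries(list(range(27)), 9)  # 3 made-up entries of 9 values
```

Deriving from `Sequence` provides iteration, `in` and `index` for free from `__len__` and `__getitem__`.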
oops yeah, error propagation 😆 That looks good, so you went with the thin-wrapper approach...
Re: #165 (comment) wait, the file you provided is not
ehh, ok, maybe we should think of it as
Yeah, it's a bit confusing, but I think
Wait, no, I think "no offset" means something else: it means there is no 10-byte offset between rawdata and raw offsets. It does NOT mean there are no offset bytes. If there are no offset bytes (for a jagged array), it's Nojagg. I'm inclined to treat this as Nojagg, since the element length is known to the compiler.
OK, you are right! The 10-byte offset usually comes from
@briederer I've updated this PR: can you try it with your real data? I've added tests for the sample file you provided.
What I meant by "it is a
Re: testing with real data: will do it in a second.
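ROOT's leaf-list notation already shows whether offsets are needed at all. `mat[9]/D` declares a fixed-size C array. `x[n]/D` declares a variable-size array whose length is read from another branch `n`. The following hypothetical Python classifier (not UnROOT code) sketches that decision. Its "fixed" case corresponds to what is proposed above to treat as `Nojagg`, since the element length is known up front.

```python
import math
import re

# name, optional dimensions such as [9] or [n][3], then /TYPECODE
_LEAF = re.compile(r"^(?P<name>\w+)(?P<dims>(?:\[\w+\])*)/(?P<code>[A-Za-z])$")

def classify_leaf(leaflist: str):
    """Return (kind, values_per_entry) for a single-leaf leaf-list string."""
    m = _LEAF.match(leaflist)
    if m is None:
        raise ValueError(f"unsupported leaf list: {leaflist!r}")
    dims = re.findall(r"\[(\w+)\]", m["dims"])
    if not dims:
        return ("scalar", 1)
    if all(d.isdigit() for d in dims):
        # All dimensions are literals: fixed length, no per-entry offsets needed.
        return ("fixed", math.prod(int(d) for d in dims))
    # At least one dimension names a counter branch: jagged, offsets needed.
    return ("jagged", None)
```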
Everything I've tested now works like a charm 🥳 The only weird behavior I saw is that when I call the following in my script:
file = UnROOT.samplefile("issue165_multiple_baskets.root")
file["arrays/n"] # independent of the chosen branch
a single colon is printed in addition to the usual output. Line 82 in 0084d1a
should specify the IO where to print the colon (;
Apart from that, I think this issue is resolved. I only have one more feature request related to this issue, but it can be added separately, since it may be involved. (;
Again, thanks for the great and fast work.
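The stray colon is most likely a display method writing to standard output instead of to the stream it was handed. In Julia, that means calling `print(io, ...)` inside a `show(io::IO, ...)` method rather than a bare `print(...)`. The following Python sketch shows the same principle with a hypothetical `show_branch` helper.

```python
import io
import sys

def show_branch(name, values, out=None):
    """Hypothetical display helper: every write goes to `out` (default: stdout)."""
    out = sys.stdout if out is None else out
    print(f"{name}:", file=out)  # file=out keeps the colon on the caller's stream
    for v in values:
        print(f"  {v}", file=out)

# Capturing into a buffer shows that nothing leaks to stdout
buf = io.StringIO()
show_branch("branch", [1, 2, 3], out=buf)
```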
ah nice catch, ok I will merge and tag a release soon
Nice work as always :)
I just tried to read one of my ROOT files, which contains a TTree with TBranches of the following type:
double mat[9]
In principle it seems to be related to my old issue #9, so I decided to dig a bit deeper with the current version.
Loading my file with ROOTFile() works without a problem. However, when trying to create the LazyTree of said TTree, I get the error message:
The stacktrace led me to the interped_data(rawdata, rawoffsets, ::Type{T}, ::Type{J}) where {T, J<:JaggType} function. It turns out that my data seems to be a NooffsetJag JagType, which seems reasonable to me. Also, reinterpreting works without a problem, and indeed the variable real_data contains the real data stored in the ROOT file.
So in fact the problem only occurs when passing real_data and the offsets to VectorOfVectors. The problem seems to be the following: in my case the offset vector is simply an empty vector Int32[], which causes VectorOfVectors to call the function similar to initialize a vector of size -1.
So, to sum it up, it seems that UnROOT is able to read my data (and I guess in turn also the file from #9), but the wrapping of the data fails. I guess it should be possible to fix it by changing the part for NooffsetJag in the code and catching cases where rawoffsets is empty. But I don't know whether this would create other problems.
I really hope we can figure this out, because finally being able to use UnROOT would make my analysis much more efficient.
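The failure described above comes from the entry count being derived as one less than the number of offsets. For an empty offset vector, that count is -1. The following Python sketch uses 0-based cumulative offsets and hypothetical helper names. For a fixed-size branch such as `double mat[9]`, it synthesizes offsets instead of trusting an empty offset vector.

```python
def split_by_offsets(data, offsets):
    """Split flat `data` using 0-based cumulative offsets [0, end_0, end_1, ...]."""
    n_entries = len(offsets) - 1  # -1 when offsets is empty
    if n_entries < 0:
        raise ValueError("empty offsets: entry boundaries are unknown")
    return [data[offsets[i]:offsets[i + 1]] for i in range(n_entries)]

def fixed_length_offsets(n_values, entry_len):
    """Synthesize offsets for a fixed-size array branch, e.g. entry_len = 9."""
    if entry_len <= 0 or n_values % entry_len:
        raise ValueError("data length is not a multiple of the entry length")
    return list(range(0, n_values + 1, entry_len))

def wrap(data, offsets, entry_len=None):
    """Use stored offsets when present, else fall back to a known fixed length."""
    if not offsets and entry_len is not None:
        offsets = fixed_length_offsets(len(data), entry_len)
    return split_by_offsets(data, offsets)

flat = [float(v) for v in range(27)]  # made-up values: 3 entries of 9
```

Raising on empty offsets when no fixed length is known is deliberate. Silently guessing boundaries could produce the kind of garbage-padded entries seen earlier in this thread.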