implement Table.partition and test Arrow round-trips for LazyTree #202

Merged 2 commits on Jan 6, 2023

@Moelf Moelf commented Jan 6, 2023

julia> f = ROOTFile("./test/samples/RNTuple/test_ntuple_stl_containers.root");

julia> rnt = LazyTree(f, "ntuple")
 Row │ string  vector_int32   array_float      vector_vector_i  vector_string   vector_vector_s  varia 
     │ String  Vector{Int32}  StaticArraysCor  Vector{Vector{I  Vector{String}  Vector{Vector{S  Union 
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────
 1   │ one     [1]            [1.0, 1.0, 1.    Vector{Int32}    ["one"]         [["one"]]        1     
 2   │ two     [1, 2]         [2.0, 2.0, 2.    Vector{Int32}    ["one", "two"   [["one"], ["t    two   ⋯
 3   │ three   [1, 2, 3]      [3.0, 3.0, 3.    Vector{Int32}    ["one", "two"   [["one"], ["t    three 
 4   │ four    [1, 2, 3, 4]   [4.0, 4.0, 4.    Vector{Int32}    ["one", "two"   [["one"], ["t    4     ⋯
 5   │ five    [1, 2, 3, 4,   [5.0, 5.0, 5.    Vector{Int32}    ["one", "two"   [["one"], ["t    5     
                                                                                       7 columns omitted


julia> a
"/tmp/jl_Sj6vmtAR0d"

julia> Arrow.write(a, rnt)
"/tmp/jl_Sj6vmtAR0d"

julia> Arrow.Table(a)
Arrow.Table with 5 rows, 13 columns, and schema:
 :string                       String
 :vector_int32                 Vector{Int32} (alias for Array{Int32, 1})
 :array_float                  Vector{Float32} (alias for Array{Float32, 1})
 :vector_vector_int32          Vector{Vector{Int32}} (alias for Array{Array{Int32, 1}, 1})
 :vector_string                Vector{String} (alias for Array{String, 1})
 :vector_vector_string         Vector{Vector{String}} (alias for Array{Array{String, 1}, 1})
 :variant_int32_string         Union{Missing, Int32, String}
 :vector_variant_int64_string  Vector{Union{Missing, Int64, String}} (alias for Array{Union{Missing, Int64, String}, 1})
 :tuple_int32_string           NamedTuple{(:_0, :_1), Tuple{Int32, String}}
 :pair_int32_string            NamedTuple{(:_0, :_1), Tuple{Int32, String}}
 :vector_tuple_int32_string    Vector{NamedTuple{(:_0, :_1), Tuple{Int32, String}}} (alias for Array{NamedTuple{(:_0, :_1), Tuple{Int32, String}}, 1})
 :lorentz_vector               NamedTuple{(:pt, :eta, :phi, :mass), NTuple{4, Float32}}
 :array_lv                     Vector{NamedTuple{(:pt, :eta, :phi, :mass), NTuple{4, Float32}}} (alias for Array{NamedTuple{(:pt, :eta, :phi, :mass), NTuple{4, Float32}}, 1})

julia> DataFrame(Arrow.Table(a))
5×13 DataFrame
 Row │ string  vector_int32          array_float             vector_vector_int32                vector 
     │ String  Array                Array                  Array                             Array 
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ one     Int32[1]              Float32[1.0, 1.0, 1.0]  Vector{Int32}[[1]]                 ["one" 
   2 │ two     Int32[1, 2]           Float32[2.0, 2.0, 2.0]  Vector{Int32}[[1], [2]]            ["one"
   3 │ three   Int32[1, 2, 3]        Float32[3.0, 3.0, 3.0]  Vector{Int32}[[1], [2], [3]]       ["one"
   4 │ four    Int32[1, 2, 3, 4]     Float32[4.0, 4.0, 4.0]  Vector{Int32}[[1], [2], [3], [4]]  ["one"
   5 │ five    Int32[1, 2, 3, 4, 5]  Float32[5.0, 5.0, 5.0]  Vector{Int32}[[1], [2], [3], [4]  ["one" 

codecov bot commented Jan 6, 2023

Codecov Report

Base: 89.97% // Head: 89.86% // Decreases project coverage by -0.11% ⚠️

Coverage data is based on head (88deed6) compared to base (3afcd0f).
Patch coverage: 66.66% of modified lines in pull request are covered.

❗ Current head 88deed6 differs from pull request most recent head dd2f334. Consider uploading reports for the commit dd2f334 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #202      +/-   ##
==========================================
- Coverage   89.97%   89.86%   -0.12%     
==========================================
  Files          18       18              
  Lines        2074     2081       +7     
==========================================
+ Hits         1866     1870       +4     
- Misses        208      211       +3     
Impacted Files              Coverage Δ
src/RNTuple/highlevel.jl    91.04% <62.50%> (-3.96%) ⬇️
src/RNTuple/footer.jl       91.11% <100.00%> (ø)

@Moelf Moelf requested a review from tamasgal January 6, 2023 21:03
@Moelf Moelf changed the title from "implement Table.partition for RNTuple LazyTree" to "implement Table.partition and test Arrow round-trips for LazyTree" on Jan 6, 2023
Moelf commented Jan 6, 2023

The Tables.schema(::LazyTree) we report is slightly different from what we get back from Arrow due to:

but otherwise it's sane, and we report the same schema as DataFrame.
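
For readers unfamiliar with how partitions and Arrow interact, here is a minimal sketch of the same kind of round-trip on plain in-memory tables. It does not use LazyTree or the RNTuple sample file; the two `chunk` tables and their column names are made up for illustration. It assumes only the generic `Tables.partitioner` / `Arrow.write` / `Arrow.Table` calls from Tables.jl and Arrow.jl, and it says nothing about how this PR implements partitioning internally.

```julia
using Tables, Arrow

# Two small column tables standing in for chunks of a larger dataset
# (hypothetical data, not the RNTuple sample used in this PR).
chunk1 = (id = Int32[1, 2], name = ["one", "two"])
chunk2 = (id = Int32[3], name = ["three"])

# Tables.partitioner wraps an iterable of tables so that
# Tables.partitions yields the chunks one at a time.
parts = Tables.partitioner([chunk1, chunk2])

# Arrow.write iterates the partitions and writes each as a record batch.
path = tempname()
Arrow.write(path, parts)

# Read the file back and compare against what was written.
tbl = Arrow.Table(path)
names_back = Tables.schema(tbl).names
ids_back = collect(tbl.id)
```

Comparing `Tables.schema` of the source with `Tables.schema` of the `Arrow.Table` read back is one way to spot the kind of schema difference mentioned in this comment.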

tamasgal commented Jan 6, 2023

Ah, interesting, good to know ;) Not sure if it's a bug or a feature. I am not very familiar with Arrow, but it's on my list for 2023 (I want to get rid of a few custom HDF5 layouts).

@Moelf Moelf merged commit 4c9489c into master Jan 6, 2023
@Moelf Moelf deleted the table_partition_rntuple branch January 6, 2023 21:20