basket cluster iteration utilities #118

Merged
merged 3 commits into from
Sep 28, 2021

@aminnj aminnj commented Sep 28, 2021

Related to #114

Adding some utilities to deal with cluster iteration. One can give _clusterranges either a LazyTree or an array of LazyBranch, and it will return the entry ranges where fBasketEntry lines up across all the input branches. That's the most naive thing I could think of, given that fClusterRangeEnd is almost always empty. uproot4 does the same thing. Note below that there are 398 baskets, but only 75 clusters.

julia> using UnROOT ; const t = LazyTree(ROOTFile("Run2012BC_DoubleMuParked_Muons.root"),"Events");

julia> UnROOT.numbaskets(t.nMuon.b)
398

julia> UnROOT._clusterranges([t.nMuon, t.Muon_pt])
75-element Vector{UnitRange{Int64}}:
 1:821695
 821696:1643390
 
 59983736:60805430
 60805431:61540413

julia> UnROOT._clusterranges(t)
75-element Vector{UnitRange{Int64}}:
 1:821695
 821696:1643390
 
 59983736:60805430
 60805431:61540413
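
To make the "lines up across all the input branches" idea concrete, here is a minimal stand-alone sketch. It is not the code in this PR: `common_cluster_ranges` is a hypothetical helper, and it assumes each branch's basket boundaries have already been converted to 1-based first-entry indices, with one extra element marking one past the last entry.

```julia
# Hypothetical helper, not UnROOT's implementation.
# Assumption: each inner vector lists the first entry (1-based) of every basket
# in one branch, followed by a final element equal to (number of entries + 1).
function common_cluster_ranges(offsets::AbstractVector{<:AbstractVector{<:Integer}})
    # Keep only the boundaries that appear in every branch.
    common = sort(reduce(intersect, offsets))
    # Each pair of consecutive shared boundaries delimits one cluster.
    return [common[i]:(common[i + 1] - 1) for i in 1:(length(common) - 1)]
end

a = [1, 4, 7, 11]      # branch A baskets: 1:3, 4:6, 7:10
b = [1, 3, 4, 9, 11]   # branch B baskets: 1:2, 3:3, 4:8, 9:10
clusters = common_cluster_ranges([a, b])
```

With these toy inputs the two branches only share the boundaries 1, 4 and 11, so seven baskets in total collapse into two clusters, which mirrors the 398 baskets versus 75 clusters seen above.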

The idea would then be to have:

# until the @async issue is fixed in `Arrow.write()`
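function _lockedget(t::LazyTree, r::UnitRange)
    # all branches share the same underlying file, so take it from the first one
    f = getproperty(t, first(propertynames(t))).f
    lock(f)
    try
        return t[r]
    finally
        # always release the lock; with no empty `catch`, read errors propagate
        unlock(f)
    end
end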
Tables.partitions(t::LazyTree) = (_lockedget(t, r) for r in _clusterranges(t))
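
For reference, the lock-then-read pattern can be tried out independently of UnROOT. The sketch below is an illustration only: `locked_read` is a hypothetical name, and a plain `ReentrantLock` stands in for the file object. It shows that a `try`/`finally` without a `catch` clause releases the lock and still lets an exception reach the caller.

```julia
# Hypothetical stand-alone illustration; `locked_read` is not part of UnROOT.
function locked_read(read::Function, lk::ReentrantLock)
    lock(lk)
    try
        return read()
    finally
        # Runs on both normal return and error, so the lock is never leaked.
        unlock(lk)
    end
end

lk = ReentrantLock()
value = locked_read(() -> 42, lk)

# An error inside the read still propagates, and the lock is released afterwards.
threw = try
    locked_read(() -> error("read failed"), lk)
    false
catch
    true
end
```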

There's also a function to get the number of bytes per cluster. The logic is there and can be modified based on what we actually want to do with this information (combine clusters until we reach x MB?). Note that the compressed cluster sizes are around 30 MB. IIRC, there's a default AutoFlush setting of about 30 MB for TTrees somewhere, so that's a promising sign.

julia> UnROOT._clusterbytes(t; compressed=true) ./ 1024^2
75-element Vector{Float64}:
 28.633505821228027
 27.9722261428833
  
 29.023967742919922
 25.91922950744629

For reading into memory instead of piping back to disk, compressed=false may be a better metric.

julia> UnROOT._clusterbytes(t; compressed=false) ./ 1024^2
75-element Vector{Float64}:
 58.74042271086661
 55.89151628176583
  
 57.50102683642917
 51.386066935034215
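
One possible way to "combine clusters until we reach x MB" is sketched below. This is only an assumption about how the byte counts might be used, not something this PR implements: `merge_clusters` is a hypothetical helper that takes cluster ranges and per-cluster sizes (in the shape returned by _clusterranges and _clusterbytes above) and greedily merges consecutive clusters until a target size is reached.

```julia
# Hypothetical helper, not part of UnROOT.
# Greedily merges consecutive clusters until their summed size reaches `target`.
function merge_clusters(ranges::AbstractVector{<:UnitRange{<:Integer}},
                        sizes::AbstractVector{<:Real}, target::Real)
    merged = UnitRange{Int}[]
    isempty(ranges) && return merged
    start = first(ranges[1])
    acc = 0.0
    for (r, s) in zip(ranges, sizes)
        acc += s
        if acc >= target
            push!(merged, start:last(r))
            start = last(r) + 1
            acc = 0.0
        end
    end
    # Flush trailing clusters that never reached the target.
    start <= last(ranges[end]) && push!(merged, start:last(ranges[end]))
    return merged
end

ranges = [1:10, 11:20, 21:30, 31:40, 41:45]
sizes  = [30.0, 30.0, 30.0, 30.0, 10.0]   # toy sizes in MB, one per cluster
partitions = merge_clusters(ranges, sizes, 50.0)
```

Because the merge only ever joins whole clusters, every partition boundary still coincides with a basket boundary in all branches.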

codecov bot commented Sep 28, 2021

Codecov Report

Merging #118 (987d7e2) into master (db89380) will increase coverage by 2.26%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #118      +/-   ##
==========================================
+ Coverage   90.40%   92.67%   +2.26%     
==========================================
  Files          11       11              
  Lines        1355     1378      +23     
==========================================
+ Hits         1225     1277      +52     
+ Misses        130      101      -29     

Impacted Files      Coverage Δ
src/UnROOT.jl       100.00% <ø> (ø)
src/iteration.jl    90.44% <100.00%> (+1.47%) ⬆️
src/io.jl           89.39% <0.00%> (-1.52%) ⬇️
src/utils.jl        100.00% <0.00%> (ø)
src/types.jl        92.20% <0.00%> (+0.10%) ⬆️
src/bootstrap.jl    92.30% <0.00%> (+8.24%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@aminnj aminnj marked this pull request as ready for review September 28, 2021 06:40
@tamasgal

Rampage 😆

Looks good to me!

@tamasgal tamasgal merged commit 6456d7b into JuliaHEP:master Sep 28, 2021

Moelf commented Feb 24, 2022

It turns out that common_entry_offsets is not used anymore, and uproot4 just doesn't care where the baskets are:

  1. When iterating over chunks, if the next chunk happens to include some of the already-read baskets, it reuses them.
  2. Otherwise it cuts baskets off wherever the chunk boundary falls (according to the default or user-specified chunk size).
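
The chunking behaviour described in the two points above can be sketched with toy numbers. This is an illustration only and is neither uproot's nor UnROOT's code: `entry_chunks` and `baskets_for` are hypothetical helpers, and basket ranges are assumed to be 1-based entry ranges.

```julia
# Hypothetical illustration of basket-agnostic chunking.
# Split `nentries` entries into fixed-size chunks, ignoring basket boundaries.
entry_chunks(nentries::Integer, chunksize::Integer) =
    [i:min(i + chunksize - 1, nentries) for i in 1:chunksize:nentries]

# Indices of the baskets whose entry range overlaps chunk `c`.
baskets_for(c::UnitRange, baskets) =
    findall(b -> first(b) <= last(c) && last(b) >= first(c), baskets)

baskets = [1:4, 5:8, 9:12]        # toy basket entry ranges for one branch
chunks  = entry_chunks(12, 5)     # chunk boundaries do not match basket boundaries
needed  = [baskets_for(c, baskets) for c in chunks]
```

Here basket 2 is needed by both the first and the second chunk, and basket 3 by both the second and the third, so a reader that keeps the most recently read baskets around can reuse them instead of decompressing them again.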

Moelf pushed a commit to aminnj/UnROOT.jl that referenced this pull request Jun 23, 2022
* add some functions

* add tests

* fix