basket cluster iteration utilities #118
Merged
Related to #114
Adding some utilities to deal with cluster iteration. One can give `clusterranges` either a `LazyTree` or an array of `LazyBranch` and it will return ranges where `fBasketEntry` lines up across all the input branches. That's the most naive thing I could think of, given that `fClusterRangeEnd` is almost always empty. uproot4 does the same thing. Note below that there are 398 baskets, but only 75 clusters. The idea would be to then have
There's also a function to get the number of bytes per cluster. The logic is there and can be modified based on what we actually want to do with this information (combine clusters until we reach `x` MB?). Note that the cluster sizes are around 30 MB. IIRC, there's some 30 MB default AutoFlush parameter for TTrees somewhere, so that's a promising sign. For reading into memory instead of piping back to disk, `compressed=false` may be a better metric.
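For anyone skimming, here is a rough standalone Python sketch of the two ideas above: finding the entry boundaries that line up across all branches, and greedily combining clusters until a size target is reached. It is not the code in this PR. The function names `cluster_ranges` and `merge_clusters`, the half-open `(start, stop)` convention, and the toy numbers are all made up for illustration.

```python
from functools import reduce


def cluster_ranges(basket_starts_per_branch, n_entries):
    """Return half-open (start, stop) entry ranges shared by every branch.

    basket_starts_per_branch: one sorted list of basket start entries per
    branch (the role fBasketEntry plays above); at least one branch is assumed.
    n_entries: total number of entries in the tree.
    """
    # A boundary is usable only if every branch starts a basket there.
    common = reduce(set.intersection, (set(s) for s in basket_starts_per_branch))
    # Drop a trailing "end of tree" marker if present, then close the last range.
    edges = sorted(e for e in common if e < n_entries) + [n_entries]
    return list(zip(edges[:-1], edges[1:]))


def merge_clusters(ranges, sizes_bytes, target_bytes):
    """Greedily merge adjacent clusters until each group reaches target_bytes."""
    merged = []
    start, stop, acc = None, None, 0
    for (a, b), size in zip(ranges, sizes_bytes):
        if start is None:
            start = a
        stop = b
        acc += size
        if acc >= target_bytes:
            merged.append(((start, stop), acc))
            start, acc = None, 0
    if start is not None:  # leftover clusters that stayed below the target
        merged.append(((start, stop), acc))
    return merged


# Toy data: three branches with different basket boundaries, 50 entries total.
branches = [[0, 10, 20, 30, 40], [0, 20, 40], [0, 5, 20, 25, 40]]
ranges = cluster_ranges(branches, 50)
# ranges == [(0, 20), (20, 40), (40, 50)]

sizes = [30_000_000, 30_000_000, 10_000_000]  # hypothetical bytes per cluster
groups = merge_clusters(ranges, sizes, 50_000_000)
# groups == [((0, 40), 60000000), ((40, 50), 10000000)]
```

The greedy merge overshoots the target by at most one cluster, which seems acceptable when individual clusters are much smaller than the target. A stricter cap would need to close the group before adding the cluster that crosses the limit.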