
[WIP] resurrecting in place ntoh #112

Closed · wants to merge 4 commits

Conversation

@aminnj (Member) commented Sep 18, 2021

Resurrection of #101 (which means some copy-pasting from @Moelf ;) ). Basically, we add an inplace::Bool to interped_data which toggles between ntoh.(reinterpret(...)) (a copy) and an in-place variant, both returning the same type, so it's still type stable.

However, with this small modification basketarray()'s return type now depends on the boolean. This can probably be fixed with a barrier function that discards the Ref{Vector{UInt8}} when called with inplace=false (the default).
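
Roughly, the barrier idea would look something like this (illustrative names, not the actual UnROOT API; this only sketches the type-stability trick, not the in-place decode itself). The inner function always returns (data, ref); dispatching on Val(inplace) gives each outer method a concrete return type, and the copy path simply drops the Ref:

decode(rawdata::Vector{UInt8}, ::Type{T}) where T =
    (ntoh.(reinterpret(T, rawdata)), Ref(rawdata))

# dispatch on Val(inplace) so each method has a concrete return type
getdata(rawdata, ::Type{T}, ::Val{true}) where T = decode(rawdata, T)          # keep the Ref
getdata(rawdata, ::Type{T}, ::Val{false}) where T = first(decode(rawdata, T))  # discard it

raw = collect(reinterpret(UInt8, hton.(Int32[1, 2, 3])))
@assert getdata(raw, Int32, Val(false)) == Int32[1, 2, 3]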

Unit tests all pass.

julia> using UnROOT

julia> const t = LazyTree(ROOTFile("uncompressed_Run2012BC_DoubleMuParked_Muons.root"),"Events");

julia> function bar(t)
           for evt in t
               _ = evt.Muon_pt
           end
           nothing
       end

before

julia> @btime sum(t.nMuon) seconds=10
  204.417 ms (6078 allocations: 469.98 MiB)
0x0000000008e67ad8

julia> @btime bar(t) seconds=10
  712.140 ms (14304 allocations: 1.82 GiB)

after

julia> @btime sum(t.nMuon) seconds=10
  205.316 ms (6780 allocations: 235.24 MiB)
0x0000000008e67ad8

julia> @btime bar(t) seconds=10
  725.626 ms (16241 allocations: 1.26 GiB)

Performance is pretty much the same but with less total allocated memory. If I profile bar(t), I see there's still a materialization. I thought this should be in place??

[screenshot: profile of bar(t) showing the materialization]


@Moelf (Member) commented Sep 18, 2021

I know it probably doesn't get much better, but this is very messy :(

@Moelf (Member) commented Sep 18, 2021

actually:

#before
(14304 allocations: 1.82 GiB)
#after
(16241 allocations: 1.26 GiB)

I think this means we shouldn't do this; the increased number of allocations probably caused the slowdown. And the difference isn't as big as the 2x in the nMuon case.

@aminnj (Member, Author) commented Sep 18, 2021

I agree it’s not pretty. I’ll keep this as an open WIP for now since at least it works. If we get inspiration for making it better and truly in place (?), that would be great :)

@tamasgal (Member) commented

I also think that this needs a bit more time ;) let's keep this floating around...

@aminnj (Member, Author) commented Oct 18, 2021

Brainstorming a bit more: what about a thin wrapper type? Instead of returning a Ref{Vector{UInt8}} everywhere and storing it in the LazyBranch to keep the data underlying unsafe_wrap from being GCed, we could define

struct Wrapper{T} <: AbstractVector{T}
    x::Vector{T}
    ref::Ref{Vector{UInt8}}
end
Base.@propagate_inbounds Base.getindex(w::Wrapper{T}, ind::Int) where T = w.x[ind]
Base.size(w::Wrapper) = size(w.x)

And then with minimal changes, we eliminate the materialization associated with reinterpret.

     function LazyBranch(f::ROOTFile, b::Union{TBranch,TBranchElement})
         T, J = auto_T_JaggT(f, b; customstructs=f.customstructs)
         T = (T === Vector{Bool} ? BitVector : T)
-        _buffer = T[]
+        _buffer = Wrapper{eltype(T)}([], Ref(UInt8[]))
         if J != Nojagg

 function interped_data(rawdata, rawoffsets, ::Type{T}, ::Type{J}) where {T, J<:JaggType}
     if J === Nojagg
-        return ntoh.(reinterpret(T, rawdata))
+        p = convert(Ptr{eltype(T)}, pointer(rawdata))
+        w = unsafe_wrap(Array, p, length(rawdata) ÷ sizeof(eltype(T)))
+        w .= ntoh.(w)
+        return Wrapper(w, Ref(rawdata))
julia> @btime sum(tf.nMuon) # before
  306.325 ms (1794 allocations: 469.67 MiB)
julia> @btime sum(tf.nMuon) # after
  206.946 ms (1716 allocations: 234.91 MiB)
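
To make the in-place path above concrete, a minimal standalone version (assuming the Wrapper type defined earlier; not the exact UnROOT code) would be:

rawdata = collect(reinterpret(UInt8, hton.(Int32[10, 20, 30])))
p = convert(Ptr{Int32}, pointer(rawdata))
w = unsafe_wrap(Array, p, length(rawdata) ÷ sizeof(Int32))
w .= ntoh.(w)                        # byte-swap in place, no second buffer
wrapped = Wrapper(w, Ref(rawdata))   # the Ref roots rawdata while the wrapper lives
@assert wrapped == Int32[10, 20, 30]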

I think it's not too hard to generalize to VoV since it can wrap any AbstractVector. I.e.,

julia> using ArraysOfArrays
julia> w = Wrapper([1,2,3,4], Ref(UInt8[]));
julia> VectorOfVectors(w, [1,2,5])
2-element VectorOfVectors{Int64, UnROOT.Wrapper{Int64}, Vector{Int64}, Vector{Tuple{}}}:
 [1]
 [2, 3, 4]
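
Sketching that generalization (interped_jagged and elem_lengths are made-up names for illustration; the elem_ptr convention matches the example above):

using ArraysOfArrays

# hypothetical jagged decode: elem_lengths gives the per-event element counts
function interped_jagged(rawdata::Vector{UInt8}, elem_lengths, ::Type{T}) where T
    p = convert(Ptr{T}, pointer(rawdata))
    w = unsafe_wrap(Array, p, length(rawdata) ÷ sizeof(T))
    w .= ntoh.(w)                                   # same in-place byte swap as above
    elem_ptr = vcat(1, cumsum(elem_lengths) .+ 1)   # e.g. lengths [1, 3] -> [1, 2, 5]
    return VectorOfVectors(Wrapper(w, Ref(rawdata)), elem_ptr)
end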

@Moelf (Member) commented Dec 2, 2021

I wonder if @oschulz has any opinion. In fact, it would be nice to have a variation of VectorOfVectors that has a field for a Ref to hold the original byte array...


@oschulz (Member) commented Dec 2, 2021

I'm not sure I've understood all the implications here - are you looking for something like a VectorOfVectors wrapped around an UnsafeArray?

@Moelf (Member) commented Dec 2, 2021

The problem is that we have an original byte array A::Vector{UInt8} and we want to wrap a VoV around it, except we want to "cast" it to, for example, B::Vector{Int32}. But if you cast it via unsafe_wrap(), GC thinks A is no longer needed, collects it, and you get bad data in VoV(B).
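
A minimal (hypothetical) reproduction of the hazard:

function bad_cast()
    A = collect(reinterpret(UInt8, hton.(Int32[1, 2, 3])))
    B = unsafe_wrap(Array, convert(Ptr{Int32}, pointer(A)), 3)
    return B  # A becomes unreachable here, so GC may reclaim the memory B aliases
end
# holding A alive alongside B (e.g. via the Ref field in the Wrapper above) avoids this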

@oschulz (Member) commented Dec 2, 2021

But why can't we use ReinterpretArray if we have to keep the reference to A around anyway?

@Moelf (Member) commented Dec 2, 2021

because ReinterpretArray is slow on several counts:

  1. A .= ntoh.(A) (10x slower)
  2. everything the user will later do, such as looping over the branch
  3. it's just generally ugly/annoying to deal with ReinterpretArray, considering we already have many wrappers
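
For point 1, one way to see the comparison (a BenchmarkTools sketch; the exact factor depends on the Julia version, see the follow-up below):

using BenchmarkTools

raw = rand(UInt8, 4_000_000)
r = reinterpret(UInt32, raw)  # lazy ReinterpretArray view of the bytes
v = collect(r)                # materialized Vector{UInt32} for comparison

@btime $r .= ntoh.($r);  # byte swap through the ReinterpretArray wrapper
@btime $v .= ntoh.($v);  # byte swap through a plain Vector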

@oschulz (Member) commented Dec 3, 2021

> A .= ntoh.(A) (10x slower)

I wonder why ReinterpretArray is performing that much worse, compared to our custom wrappers. Well, probably can't be helped right now.

In that case I would recommend using a VectorOfVectors around a custom wrapper - would that be workable? I don't think creating a special VoV type would increase performance, and it would generate more code to maintain long term.

@Moelf (Member) commented Feb 12, 2022

JuliaLang/julia#42227 (comment)

on Julia master, y .= ntoh.(y) is no longer slow when y is a ReinterpretArray

@Moelf (Member) commented Feb 12, 2022

[screenshot: benchmark showing reduced allocations after the change]

Sadly, despite the reduced allocations, it doesn't seem to improve timing by a lot in microbenchmarks.

@aminnj (Member, Author) commented Feb 13, 2022

It might be that zlib decompression + the calculation are washing out the true benchmark.
