Lock-free ROOTFile.fobj #153

Closed
Moelf opened this issue Feb 25, 2022 · 1 comment

Moelf (Member) commented Feb 25, 2022


There's an essentially free optimization that is especially effective for network I/O: fetching the baskets concurrently instead of one after another.

First:

```diff
diff --git a/src/iteration.jl b/src/iteration.jl
index 04e5d7d..7e30835 100644
--- a/src/iteration.jl
+++ b/src/iteration.jl
@@ -333,7 +333,7 @@ function Base.getindex(ba::LazyBranch{T,J,B}, range::UnitRange) where {T,J,B}
     ib2 = findfirst(x -> x > (last(range) - 1), ba.fEntry) - 1
     offset = ba.fEntry[ib1]
     range = (first(range)-offset):(last(range)-offset)
-    return vcat([basketarray(ba, i) for i in ib1:ib2]...)[range]
+    return Vcat(asyncmap(i->basketarray(ba, i), ib1:ib2)...)[range]
 end
```
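A minimal, self-contained sketch of why this change helps: `asyncmap` runs each call in its own `Task`, so when every basket fetch spends most of its time waiting on the network, the waits overlap instead of adding up. Here `fake_basketarray` is a hypothetical stand-in for `basketarray(ba, i)` that simulates latency with `sleep`, and Base `vcat` replaces `LazyArrays.Vcat` to keep the example dependency-free.

```julia
# Hypothetical stand-in for basketarray(ba, i): simulates one network
# round-trip with sleep, then returns that basket's (fake) contents.
function fake_basketarray(i::Int)
    sleep(0.3)                           # pretend each fetch waits ~0.3 s on the network
    return collect((10i + 1):(10i + 10))
end

baskets = 1:10

# Sequential fetches: the ten waits add up (roughly 10 × 0.3 s).
t0 = time()
seq = vcat([fake_basketarray(i) for i in baskets]...)
t_seq = time() - t0

# asyncmap: each call runs in its own Task, so the waits overlap.
t0 = time()
conc = vcat(asyncmap(fake_basketarray, baskets)...)
t_async = time() - t0

println("sequential: ", round(t_seq; digits = 2), " s")
println("asyncmap:   ", round(t_async; digits = 2), " s")
```

`asyncmap` returns results in input order, so the concatenated array matches the sequential one. Its tasks share the calling thread, which is why the gain applies to latency-bound work rather than CPU-bound work.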

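But to make this useful, we also need the following, because `readbasketseek` currently holds `lock(f)` for the whole seek, read, and unpacking of each basket, which serializes the concurrent fetches: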

```diff
diff --git a/src/root.jl b/src/root.jl
index e4eb852..abb1c09 100644
--- a/src/root.jl
+++ b/src/root.jl
@@ -472,17 +472,17 @@ function readbasket(f::ROOTFile, branch, ith)
 end
 
 function readbasketseek(f::ROOTFile, branch::Union{TBranch, TBranchElement}, seek_pos::Int, nb)
-    lock(f)
-    local basketkey, compressedbytes
+    # lock(f)
+    local rawbuffer
     try
         seek(f.fobj, seek_pos)
         rawbuffer = OffsetBuffer(IOBuffer(read(f.fobj, nb)), seek_pos)
-        basketkey = unpack(rawbuffer, TBasketKey)
-        compressedbytes = compressed_datastream(rawbuffer, basketkey)
     catch
         finally
-        unlock(f)
+        # unlock(f)
     end
+    basketkey = unpack(rawbuffer, TBasketKey)
+    compressedbytes = compressed_datastream(rawbuffer, basketkey)
```
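Even if a lock is kept, moving `unpack` and `compressed_datastream` out of the `try` block pays off: only the `seek`/`read` pair touches the shared cursor, so only it needs to sit inside the critical section, while the parsing works on a private buffer. A minimal sketch of that pattern, where `SharedFile`, `read_range`, and `decode` are hypothetical names standing in for `ROOTFile`, the locked read, and the unpacking step:

```julia
# Hypothetical stand-in for ROOTFile: one shared IOStream plus the lock guarding its cursor.
struct SharedFile
    io::IOStream
    lk::ReentrantLock
end

# Only seek + read touch the shared cursor, so only they are inside the lock.
function read_range(f::SharedFile, pos::Int, nb::Int)
    lock(f.lk) do
        seek(f.io, pos)
        read(f.io, nb)
    end
end

# Hypothetical stand-in for unpack/decompression: CPU work on a private buffer.
decode(raw::Vector{UInt8}) = sum(Int, raw)

function read_and_decode(f::SharedFile, pos::Int, nb::Int)
    raw = read_range(f, pos, nb)   # short critical section
    return decode(raw)             # runs without holding the lock
end

# Usage: 256 bytes on disk, read as 16 chunks of 16 bytes from several threads.
path, out = mktemp()
write(out, UInt8.(0:255))
close(out)

f = SharedFile(open(path, "r"), ReentrantLock())
results = zeros(Int, 16)
Threads.@threads for k in 1:16
    results[k] = read_and_decode(f, 16 * (k - 1), 16)
end
close(f.io)
```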

Now, the problem, of course, is that removing the lock will break our multithreaded reading, because the current cursor position is stored in the IOStream: `seek` and `read` are two separate calls, so another thread can move the shared cursor in between and the first thread then reads the wrong bytes. Technically, network-based sources don't need `position()`, since each read is a byte-range request, but our on-disk IOStream does.
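The hazard can be reproduced in a few lines. This is a sketch under two assumptions made for illustration: an in-memory `IOBuffer` replaces `f.fobj`, and an explicit `yield()` forces a task switch between `seek` and `read` (standing in for any point where another task or thread could run). `unlocked_read`, `locked_read`, and `two_readers` are hypothetical helpers.

```julia
# A shared in-memory stream stands in for the shared f.fobj IOStream.
const shared = IOBuffer(UInt8.(0:99))

# Without a lock: seek and read are two separate steps on one shared cursor.
function unlocked_read(io::IO, pos::Int, nb::Int)
    seek(io, pos)
    yield()                 # another task may run here and move the cursor
    return read(io, nb)     # reads from wherever the cursor is now
end

# With a lock held across both steps, the seek+read pair is atomic.
const cursor_lock = ReentrantLock()
function locked_read(io::IO, pos::Int, nb::Int)
    lock(cursor_lock) do
        seek(io, pos)
        yield()             # other tasks can run, but cannot take the lock
        read(io, nb)
    end
end

# Two tasks read different byte ranges through the same stream.
function two_readers(readfn)
    a = @async readfn(shared, 10, 4)
    b = @async readfn(shared, 50, 4)
    return fetch(a), fetch(b)
end

bad = two_readers(unlocked_read)
good = two_readers(locked_read)
println("unlocked: ", bad)
println("locked:   ", good)
```

With the lock, `good` holds bytes 10:13 and 50:53; without it, the first task can end up reading from wherever the second task left the cursor. The `asyncmap` change above creates exactly this kind of task interleaving whenever a read suspends its task, so the hazard is not limited to multiple threads.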

  1. One possibility is to use Mmap for on-disk files.
  2. Or we can duplicate the cursor object (i.e. open a separate handle) for each file and each thread.
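Both options remove the shared cursor instead of guarding it. A minimal sketch under stated assumptions: `MmapSource`, `HandlePool`, and `read_range` are hypothetical names, not UnROOT API. Option 1 maps the file once with the standard-library `Mmap.mmap`, so a read becomes a slice of a `Vector{UInt8}`. Option 2 keeps `nthreads()` independent `IOStream`s in a `Channel` and lets each read borrow one; a pool is used here instead of indexing by `Threads.threadid()`, because tasks can migrate between threads.

```julia
using Mmap

# Option 1 (sketch): map the file once; a read is a slice of a Vector{UInt8},
# which has no cursor, so concurrent readers need no lock.
struct MmapSource
    io::IOStream              # the stream the mapping was created from
    data::Vector{UInt8}
end
function MmapSource(path::AbstractString)
    io = open(path, "r")
    return MmapSource(io, Mmap.mmap(io))
end
read_range(m::MmapSource, pos::Int, nb::Int) = m.data[pos+1:pos+nb]   # pos is 0-based, like seek
Base.close(m::MmapSource) = close(m.io)

# Option 2 (sketch): a pool of independent IOStreams, each with its own cursor.
# A reader borrows a handle, uses it exclusively, then returns it.
struct HandlePool
    handles::Channel{IOStream}
end
function HandlePool(path::AbstractString, n::Int = Threads.nthreads())
    ch = Channel{IOStream}(n)
    for _ in 1:n
        put!(ch, open(path, "r"))
    end
    return HandlePool(ch)
end
function read_range(p::HandlePool, pos::Int, nb::Int)
    io = take!(p.handles)              # blocks if all handles are in use
    try
        seek(io, pos)
        return read(io, nb)
    finally
        put!(p.handles, io)            # hand the cursor back to the pool
    end
end
function Base.close(p::HandlePool)    # call only once all readers are done
    close(p.handles)
    foreach(close, p.handles)          # drains and closes the buffered handles
end

# Usage: read 16 chunks of 16 bytes concurrently through each source.
path, out = mktemp()
write(out, UInt8.(0:255))
close(out)

function read_chunks(src)
    chunks = Vector{Vector{UInt8}}(undef, 16)
    Threads.@threads for k in 1:16
        chunks[k] = read_range(src, 16 * (k - 1), 16)
    end
    return chunks
end

msrc = MmapSource(path)
pool = HandlePool(path)
from_mmap = read_chunks(msrc)
from_pool = read_chunks(pool)
close(msrc)
close(pool)
```

The Mmap variant avoids syscalls per read but ties the reader to local files; the handle pool works with any seekable stream at the cost of one open file per pooled handle.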

Moelf (Member, Author) commented Feb 25, 2022

#150 attempted to add an `MmapStream`.

Moelf closed this as completed Feb 25, 2022