-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pool sharing with copy on write #56
Conversation
Can you have a quick look if now it looks as a right direction? (I will then polish the PR) |
Codecov Report
@@ Coverage Diff @@
## main #56 +/- ##
==========================================
+ Coverage 79.41% 85.14% +5.72%
==========================================
Files 1 1
Lines 204 249 +45
==========================================
+ Hits 162 212 +50
+ Misses 42 37 -5
Continue to review full report at Codecov.
|
Benchmarks are as follows: This PR:
|
I need to add tests, but other than that this should be good for a review. |
@nalimilan + @quinnj : do you think we should add a separate branch when we detect that Julia is run in a single thread (a quite common case I think) and then we can just ignore everything and never copy |
Yes, looks quite good!
Good question. That makes the code harder to test so it's risky. Maybe better wait and see whether this would really matter in practice. |
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
OK. Then I will leave it as is for now. |
I will add tests for this functionality later as I have to think how to make them accurate since it is not easy. What version of PooledArrays.jl this PR should introduce? |
src/PooledArrays.jl
Outdated
Base.convert(::Type{PooledArray{S,R,N}}, pa::PooledArray{T,R,N}) where {S,T,R<:Integer,N} = | ||
PooledArray(RefArray(copy(pa.refs)), convert(Dict{S,R}, pa.invpool)) | ||
function Base.convert(::Type{PooledArray{S,R1,N}}, pa::PooledArray{T,R2,N}) where {S,T,R1<:Integer,R2<:Integer,N} | ||
if S === T && R1 === R2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not covered as it should never be true (we have a separate method for this - should I change this to @assert
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just remove the branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - removed
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Base.@propagate_inbounds function Base.isassigned(pa::PooledArray, I::Int...) | ||
!iszero(pa.refs[I...]) | ||
if VERSION < v"1.1" | ||
Base.@propagate_inbounds function Base.getindex(A::SubArray{T,D,P,I,true} , |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really untested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think coverage is not run on Julia 1.0 and this is the reason. Without this definition tests were failing on Julia 1.0 though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line https://github.com/JuliaData/PooledArrays.jl/pull/56/files#r584517703 fails without this special method
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
I will merge it tomorrow if there are no more comments and tag a release. |
@test refcount(pat1)[] == 3 | ||
|
||
copy!(pat1, pav1) | ||
@test pat1 == pav1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nalimilan - this line fails without the special method for Julia 1.0:
julia> pa = PooledArray(1:4)
4-element PooledArray{Int64,UInt32,1,Array{UInt32,1}}:
1
2
3
4
julia> pav1 = @view pa[2:3]
2-element view(::PooledArray{Int64,UInt32,1,Array{UInt32,1}}, 2:3) with eltype Int64:
2
3
julia> pav1[1]
ERROR: MethodError: getindex(::SubArray{Int64,1,PooledArray{Int64,UInt32,1,Array{UInt32,1}},Tuple{UnitRange{Int64}},true}, ::Int64) is ambiguous. Candidates:
getindex(A::SubArray{#s22,N,#s20,I,L} where L where I where #s20<:PooledArray where #s22, I::Vararg{Int64,N}) where {T, N} in PooledArrays at /home/bkamins/.julia/dev/PooledArrays/src/PooledArrays.jl:476
getindex(V::SubArray{T,N,P,I,true} where I<:Tuple{AbstractUnitRange,Vararg{Any,N} where N} where P where N where T, i::Int64) in Base at subarray.jl:234
Possible fix, define
getindex(::SubArray{T,1,P,I,true} where I<:Tuple{AbstractUnitRange,Vararg{Any,N} where N} where P<:PooledArray where T, ::Int64)
Stacktrace:
[1] top-level scope at none:0
Thank you! Today I will test it in DataFrames.jl new joining code after tagging |
Upgrading PooledArrays for JuliaDB/Indexed Tables from v.0.5.2 cause 10x slow down in There is no way to switch it off? |
Can you please open an issue with a reproducible code showing the problem? |
there is so much going in when you do a groupreduce in IndexTables - I will try to reduce this to a PooledArray operation |
Also if you can confirm that the regression appeared between say 0.5.1 and 0.5.2 it would be useful to track down the problem. Do make sure that other packages use the same version. |
@cwiese - you can edit appropriate methods in PooledArrays.jl that do pool copying to print some debugging information to see when/if things are invoked. |
Okay there is something else happening here - I will open an issue if I can narrow this down to PooledArrays. Thanks! |
Fixes #54
Could you please have a look at the core of the design proposal. If we are OK with it I will reimplement the rest, add tests, and update documentation.