-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added unique!(f, itr) function #30141
Conversation
base/set.jl
Outdated
end | ||
resize!(A, count) | ||
end | ||
|
||
# If A is not grouped, then we will need to keep track of all of the elements that we have | ||
# seen so far. | ||
function _unique!(A::AbstractVector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be replaces now with
_unique!(A::AbstractVector) = unique(identity, A)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oxinabox Could you please explain it a little bit as this is my first contribution to Julia Lang?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was being overly terse (unusual for me :-P)
Once you have defined a function unique!(f, A)
, like you are defining.
The function _unique!(A)
here,
can be simply defined as call to unique!(identity, A)
.
identity
is a function in Julia that does x->x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks for explaining it @oxinabox
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified _unique according to the need
base/set.jl
Outdated
``` | ||
""" | ||
function unique!(f, A::AbstractVector) | ||
seen = Set{eltype(A)}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is not going to work our so well.
Consisder:
Current behavour of unique
julia> unique(x-> x % 2 ==0 ? :even : :odd, [1,2,3,4,2,2,1])
2-element Array{Int64,1}:
1
2
vs for the this implementation of unique!
julia> unique!(x-> x % 2 ==0 ? :even : :odd, [1,2,3,4,2,2,1])
ERROR: MethodError: Cannot `convert` an object of type Symbol to an object of type Int64
Closest candidates are:
convert(::Type{T<:Number}, ::T<:Number) where T<:Number at number.jl:6
convert(::Type{T<:Number}, ::Number) where T<:Number at number.jl:7
convert(::Type{T<:Integer}, ::Ptr) where T<:Integer at pointer.jl:23
...
Stacktrace:
[1] setindex!(::Dict{Int64,Nothing}, ::Nothing, ::Symbol) at ./dict.jl:373
[2] push!(::Set{Int64}, ::Symbol) at ./set.jl:48
[3] new_unique!(::getfield(Main, Symbol("##3#4")), ::Array{Int64,1}) at ./REPL[8]:9
[4] top-level scope at none:0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made the necessary changes.Please comment if further changes required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine now.
I feel like there is an optimisation that could be done to remove the Set{Any}()
(like in map
)
but that it is not required in this PR, since this works the same way as unique
(and it might not be required at all, benchmarking should be done)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So ultimately this is fine by now isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK it is fine,
but you'll have to wait for someone with merge permissions to do a real review/merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why travis-ci build test is failing repeatedly.Any advice would be helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice idea, thanks for the contribution. Unfortunately we'll have to fix this up a bit to reach optimal efficiency, but we'll get there :)
-
Avoid creating aSet{Any}()
``` | ||
""" | ||
function unique!(f, A::AbstractVector) | ||
seen = Set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this won't be efficient because this means Set{Any}()
.
Since relying on inference is naughty, what should probably happen here is a widening pattern. Start with a Set{typeof(first(A))}()
and then keep pushing to this set until the element type isn't wide enough. Someone else might know better, but I think the most efficient pattern is then to dispatch again to more-or-less the same function. This is, unfortunately, a little bit of work to implement, sorry.
You'll want an inner function like _unique!(f, A, seen, i)
where seen
is the current Set
and i
is the current index The outer unique!
function is creates the initial Set
and dispatches to the inner function.
i.e. something like
function unique!(f, A::AbstractArray)
if length(A) <= 1
return A
end
i = firstindex(A)
t = @inbounds A[i]
seen = Set{typeof(x)}()
push!(seen, t)
return _unique!(f, A, seen, 1, i+1)
end
and the other funciton approximately
function _unique!(f, A::AbstractArray, seen::Set, count::Integer, i::Integer)
while i <= lastindex(A)
t = @inbounds A[i]
if t ∉ seen
count += 1
@inbounds A[count] = t
if typeof(t) <: eltype(seen)
push!(seen, t)
else
seen2 = convert(Set{promote_typejoin(eltype(seen), typeof(t))}, seen)
push!(seen2, t)
return _unique!(f, A, seen2, count, i+1)
end
end
i += 1
end
return resize!(A, count)
end
I haven't tested the above at all, but I hope it demonstrates the pattern at least. The compiler will elide all the complexity in the standard case that f(first(A))
is inferrable. Also note that I've assumed all AbstractVector
s have AbstractUnitRange
indices in the above to avoid the need to use iterate
, but the other pattern is fine too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree,
I also think that can be in a seperate PR.
Right now, the existing unique(f, iter)
(no bang),
does the same thing.
And can be optimised in the same way.
This PR adds a naive implementation that can be improved later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think this could be implemented in a different PR as both with and without bang version of unique(f, iter) uses naive method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andyferris I can try to implement the inner function _unique within this PR if this is what is needed to merge it but will need some help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry - I didn’t mean to block this - iterating later is of course fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andyferris you might want to dismiss your review,
However that is done,
(possibly by rereviewing?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andyferris Please re review this PR.
@@ -196,6 +221,10 @@ function _unique!(A::AbstractVector) | |||
resize!(A, count) | |||
end | |||
|
|||
# If A is not grouped, then we will need to keep track of all of the elements that we have | |||
# seen so far. | |||
_unique!(A::AbstractVector) = unique!(identity, A::AbstractVector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the call to unique!
here will now use a Set{Any}
this will decrease regular unique!(f,A)
s performance, and I think Andy's objection should be fixed in this PR, since this PR not only adding a new feature that can be optimized later, it makes performance of existing code worse. It would at least be good if you could present some benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I will try to build the inner _unique function as suggested by @andyferris.In the meantime I will benchmark the existing solution and will present it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fredrikekre When I run @btime
unique!(x->2x, 1:10_000); it says
setindex! not defined for UnitRange{Int64}
I couldn’t understand what the error message says?
Same problem occurs with the @benchmark
macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because 1:10_000
is an immutable array, try instead @btime unique!(x->2x, $([1:10_000;]))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some benchmarking results for unique!(f, iter) and unique!(iter) are stated:
@benchmark unique!(x->x^2, $([1:10_000;]))
memory estimate: 786.97 KiB
allocs estimate: 37981
--------------
minimum time: 995.317 μs (0.00% GC)
median time: 1.050 ms (0.00% GC)
mean time: 1.159 ms (3.08% GC)
maximum time: 38.007 ms (96.42% GC)
--------------
samples: 4305
evals/sample: 1
@benchmark unique!($([1:10_000;]))
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 38.053 μs (0.00% GC)
median time: 38.081 μs (0.00% GC)
mean time: 41.537 μs (0.00% GC)
maximum time: 10.190 ms (0.00% GC)
--------------
samples: 10000
evals/sample: 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I lead you down the wrong path with my coimment on #30141 (comment)
Replacing _unique!(iter)
with unique!(identity, iter)
would only have (certainly) been optimised away if the Set{Any}
had not been needed.
Can you also report benchmarks for julia master for the old performance of unique!($([1:10_000;]))
, without this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a branch, ajf/faster_unique_f
. Completely untested, but I'll finish it up.
@fredrikekre I was thinking it would be better to have small PRs with seperate concerns, as Lyndon suggests. Let's simply merge this feature PR from @raghav9-97 (his first Julia PR if I understand) and I'll follow up straight away with the performance PR that addresses both unique
and unique!
. Are you OK with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But yeah, this particular line does leave us in an bad intermediate state... oh well.
Should have been squashed. |
Oh... is this the general policy? |
This PR had a couple of unrelated things like a test by me in one commit etc, and generally when there are "fix-up" commits it is nice to squash IMO, since the intermediate commits are useless. |
Thanks for guiding me through my very first contribution in Julia @fredrikekre @oxinabox @andyferris @rfourquet. I know there were some mistakes in this PR which I would work upon in the future PRs. |
Compat annotation for rank(A; rtol=..., atol=...), #29926.
Added a naive implemetation of unique!(f, itr).Fixes #28415
Please comment for further amendments.