Added unique!(f, itr) function #30141

raghav9-97 · 2018-11-24T15:54:21Z

Added a naive implemetation of unique!(f, itr).Fixes #28415
Please comment for further amendments.

oxinabox · 2018-11-24T20:34:20Z

base/set.jl

+    end
+    resize!(A, count)
+end
+
 # If A is not grouped, then we will need to keep track of all of the elements that we have
 # seen so far.
 function _unique!(A::AbstractVector)


This can be replaces now with

_unique!(A::AbstractVector) = unique(identity, A)

@oxinabox Could you please explain it a little bit as this is my first contribution to Julia Lang?

Sorry, I was being overly terse (unusual for me :-P)

Once you have defined a function unique!(f, A), like you are defining.
The function _unique!(A) here,
can be simply defined as call to unique!(identity, A).

identity is a function in Julia that does x->x

Okay, thanks for explaining it @oxinabox

Modified _unique according to the need

oxinabox · 2018-11-24T20:41:54Z

base/set.jl

+```
+"""
+function unique!(f, A::AbstractVector)
+    seen = Set{eltype(A)}()


That is not going to work our so well.

Consisder:
Current behavour of unique

julia> unique(x-> x % 2 ==0 ? :even : :odd, [1,2,3,4,2,2,1]) 2-element Array{Int64,1}: 1 2

vs for the this implementation of unique!

julia> unique!(x-> x % 2 ==0 ? :even : :odd, [1,2,3,4,2,2,1]) ERROR: MethodError: Cannot `convert` an object of type Symbol to an object of type Int64 Closest candidates are: convert(::Type{T<:Number}, ::T<:Number) where T<:Number at number.jl:6 convert(::Type{T<:Number}, ::Number) where T<:Number at number.jl:7 convert(::Type{T<:Integer}, ::Ptr) where T<:Integer at pointer.jl:23 ... Stacktrace: [1] setindex!(::Dict{Int64,Nothing}, ::Nothing, ::Symbol) at ./dict.jl:373 [2] push!(::Set{Int64}, ::Symbol) at ./set.jl:48 [3] new_unique!(::getfield(Main, Symbol("##3#4")), ::Array{Int64,1}) at ./REPL[8]:9 [4] top-level scope at none:0

I have made the necessary changes.Please comment if further changes required.

I think this is fine now.
I feel like there is an optimisation that could be done to remove the Set{Any}() (like in map)
but that it is not required in this PR, since this works the same way as unique (and it might not be required at all, benchmarking should be done)

So ultimately this is fine by now isn't it?

AFAIK it is fine,
but you'll have to wait for someone with merge permissions to do a real review/merge

I don't know why travis-ci build test is failing repeatedly.Any advice would be helpful

andyferris

Nice idea, thanks for the contribution. Unfortunately we'll have to fix this up a bit to reach optimal efficiency, but we'll get there :)

~~Avoid creating a Set{Any}()~~

andyferris · 2018-11-26T10:19:05Z

base/set.jl

+```
+"""
+function unique!(f, A::AbstractVector)
+    seen = Set()


Unfortunately this won't be efficient because this means Set{Any}().

Since relying on inference is naughty, what should probably happen here is a widening pattern. Start with a Set{typeof(first(A))}() and then keep pushing to this set until the element type isn't wide enough. Someone else might know better, but I think the most efficient pattern is then to dispatch again to more-or-less the same function. This is, unfortunately, a little bit of work to implement, sorry.

You'll want an inner function like _unique!(f, A, seen, i) where seen is the current Set and i is the current index The outer unique! function is creates the initial Set and dispatches to the inner function.

i.e. something like

function unique!(f, A::AbstractArray) if length(A) <= 1 return A end i = firstindex(A) t = @inbounds A[i] seen = Set{typeof(x)}() push!(seen, t) return _unique!(f, A, seen, 1, i+1) end

and the other funciton approximately

function _unique!(f, A::AbstractArray, seen::Set, count::Integer, i::Integer) while i <= lastindex(A) t = @inbounds A[i] if t ∉ seen count += 1 @inbounds A[count] = t if typeof(t) <: eltype(seen) push!(seen, t) else seen2 = convert(Set{promote_typejoin(eltype(seen), typeof(t))}, seen) push!(seen2, t) return _unique!(f, A, seen2, count, i+1) end end i += 1 end return resize!(A, count) end

I haven't tested the above at all, but I hope it demonstrates the pattern at least. The compiler will elide all the complexity in the standard case that f(first(A)) is inferrable. Also note that I've assumed all AbstractVectors have AbstractUnitRange indices in the above to avoid the need to use iterate, but the other pattern is fine too.

While I agree,
I also think that can be in a seperate PR.
Right now, the existing unique(f, iter) (no bang),
does the same thing.
And can be optimised in the same way.

This PR adds a naive implementation that can be improved later.

I also think this could be implemented in a different PR as both with and without bang version of unique(f, iter) uses naive method.

@andyferris I can try to implement the inner function _unique within this PR if this is what is needed to merge it but will need some help.

Sorry - I didn’t mean to block this - iterating later is of course fine.

@andyferris you might want to dismiss your review,
However that is done,
(possibly by rereviewing?)

@andyferris Please re review this PR.

fredrikekre · 2018-12-05T13:05:54Z

base/set.jl

@@ -196,6 +221,10 @@ function _unique!(A::AbstractVector)
    resize!(A, count)
 end

+# If A is not grouped, then we will need to keep track of all of the elements that we have
+# seen so far.
+_unique!(A::AbstractVector) = unique!(identity, A::AbstractVector)


Since the call to unique! here will now use a Set{Any} this will decrease regular unique!(f,A)s performance, and I think Andy's objection should be fixed in this PR, since this PR not only adding a new feature that can be optimized later, it makes performance of existing code worse. It would at least be good if you could present some benchmarks.

Okay, I will try to build the inner _unique function as suggested by @andyferris.In the meantime I will benchmark the existing solution and will present it here.

@fredrikekre When I run @btime unique!(x->2x, 1:10_000); it says
setindex! not defined for UnitRange{Int64}
I couldn’t understand what the error message says?
Same problem occurs with the @benchmark macro.

It's because 1:10_000 is an immutable array, try instead @btime unique!(x->2x, $([1:10_000;]))

Some benchmarking results for unique!(f, iter) and unique!(iter) are stated:
@benchmark unique!(x->x^2, $([1:10_000;]))

memory estimate: 786.97 KiB allocs estimate: 37981 -------------- minimum time: 995.317 μs (0.00% GC) median time: 1.050 ms (0.00% GC) mean time: 1.159 ms (3.08% GC) maximum time: 38.007 ms (96.42% GC) -------------- samples: 4305 evals/sample: 1

@benchmark unique!($([1:10_000;]))

memory estimate: 0 bytes allocs estimate: 0 -------------- minimum time: 38.053 μs (0.00% GC) median time: 38.081 μs (0.00% GC) mean time: 41.537 μs (0.00% GC) maximum time: 10.190 ms (0.00% GC) -------------- samples: 10000 evals/sample: 1

Sorry, I lead you down the wrong path with my coimment on #30141 (comment)

Replacing _unique!(iter) with unique!(identity, iter) would only have (certainly) been optimised away if the Set{Any} had not been needed.

Can you also report benchmarks for julia master for the old performance of unique!($([1:10_000;])), without this PR?

I have a branch, ajf/faster_unique_f. Completely untested, but I'll finish it up.

@fredrikekre I was thinking it would be better to have small PRs with seperate concerns, as Lyndon suggests. Let's simply merge this feature PR from @raghav9-97 (his first Julia PR if I understand) and I'll follow up straight away with the performance PR that addresses both unique and unique!. Are you OK with that?

But yeah, this particular line does leave us in an bad intermediate state... oh well.

fredrikekre · 2018-12-05T23:14:14Z

Should have been squashed.

andyferris · 2018-12-05T23:18:49Z

Should have been squashed.

Oh... is this the general policy?

fredrikekre · 2018-12-05T23:24:25Z

This PR had a couple of unrelated things like a test by me in one commit etc, and generally when there are "fix-up" commits it is nice to squash IMO, since the intermediate commits are useless.

raghav9-97 · 2018-12-06T06:05:09Z

Thanks for guiding me through my very first contribution in Julia @fredrikekre @oxinabox @andyferris @rfourquet. I know there were some mistakes in this PR which I would work upon in the future PRs.

Compat annotation for rank(A; rtol=..., atol=...), #29926.

* Compat annotation for unique!(f, A), #30141. Compat annotation for rank(A; rtol=..., atol=...), #29926. * Update stdlib/LinearAlgebra/src/generic.jl

* Compat annotation for unique!(f, A), #30141. Compat annotation for rank(A; rtol=..., atol=...), #29926. * Update stdlib/LinearAlgebra/src/generic.jl (cherry picked from commit 4fc446f)

fredrikekre and others added 2 commits November 24, 2018 21:08

Fix show test when Sockets are loaded in main. (#30119)

2799812

Merge branch 'master' of https://github.com/JuliaLang/julia

35eabac

oxinabox reviewed Nov 24, 2018

View reviewed changes

raghav9-97 added 3 commits November 25, 2018 14:03

Added naive unique!(f, itr) function

43bc5c3

Fixed Set type error

e562c06

Modified _unique!(A) function

c2535b3

raghav9-97 changed the title ~~RFC : Added naive unique!(f, itr) function~~ RFC : Added unique!(f, itr) function Nov 26, 2018

andyferris suggested changes Nov 26, 2018

View reviewed changes

andyferris approved these changes Dec 4, 2018

View reviewed changes

raghav9-97 changed the title ~~RFC : Added unique!(f, itr) function~~ Added unique!(f, itr) function Dec 5, 2018

fredrikekre reviewed Dec 5, 2018

View reviewed changes

andyferris merged commit 5cbbed3 into JuliaLang:master Dec 5, 2018

andyferris mentioned this pull request Dec 5, 2018

Make unique(f, itr) and unique!(f, itr) faster #30286

Merged

This was referenced Dec 5, 2018

[Issue 28415] Add unique!(f, itr) #28737

Closed

RFC: Implement unique consistently #29038

Closed

raghav9-97 deleted the solveissue branch December 6, 2018 10:57

fredrikekre added a commit that referenced this pull request Dec 7, 2018

Compat annotation for unique!(f, A), #30141.

182409b

Compat annotation for rank(A; rtol=..., atol=...), #29926.

fredrikekre mentioned this pull request Dec 7, 2018

Some more compat annotations #30297

Merged

ararslan pushed a commit that referenced this pull request Dec 7, 2018

Some more compat annotations (#30297)

4fc446f

* Compat annotation for unique!(f, A), #30141. Compat annotation for rank(A; rtol=..., atol=...), #29926. * Update stdlib/LinearAlgebra/src/generic.jl

christopher-dG mentioned this pull request Apr 20, 2020

Interactive mode JuliaCI/PkgTemplates.jl#145

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added unique!(f, itr) function #30141

Added unique!(f, itr) function #30141

raghav9-97 commented Nov 24, 2018 •

edited

Loading

oxinabox Nov 24, 2018

raghav9-97 Nov 25, 2018

oxinabox Nov 25, 2018

raghav9-97 Nov 25, 2018

raghav9-97 Nov 25, 2018

oxinabox Nov 24, 2018

raghav9-97 Nov 25, 2018 •

edited

Loading

oxinabox Nov 25, 2018

raghav9-97 Nov 25, 2018

oxinabox Nov 25, 2018

raghav9-97 Nov 25, 2018

andyferris left a comment •

edited

Loading

andyferris Nov 26, 2018 •

edited

Loading

oxinabox Nov 26, 2018

raghav9-97 Nov 29, 2018

raghav9-97 Dec 4, 2018

andyferris Dec 4, 2018

oxinabox Dec 4, 2018

raghav9-97 Dec 4, 2018

fredrikekre Dec 5, 2018

raghav9-97 Dec 5, 2018

raghav9-97 Dec 5, 2018 •

edited

Loading

rfourquet Dec 5, 2018

raghav9-97 Dec 5, 2018 •

edited

Loading

oxinabox Dec 5, 2018 •

edited

Loading

andyferris Dec 5, 2018

fredrikekre Dec 5, 2018

andyferris Dec 5, 2018

fredrikekre commented Dec 5, 2018

andyferris commented Dec 5, 2018

fredrikekre commented Dec 5, 2018

raghav9-97 commented Dec 6, 2018

Added unique!(f, itr) function #30141

Added unique!(f, itr) function #30141

Conversation

raghav9-97 commented Nov 24, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

raghav9-97 Nov 25, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andyferris left a comment • edited Loading

Choose a reason for hiding this comment

andyferris Nov 26, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

raghav9-97 Dec 5, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

raghav9-97 Dec 5, 2018 • edited Loading

Choose a reason for hiding this comment

oxinabox Dec 5, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fredrikekre commented Dec 5, 2018

andyferris commented Dec 5, 2018

fredrikekre commented Dec 5, 2018

raghav9-97 commented Dec 6, 2018

raghav9-97 commented Nov 24, 2018 •

edited

Loading

raghav9-97 Nov 25, 2018 •

edited

Loading

andyferris left a comment •

edited

Loading

andyferris Nov 26, 2018 •

edited

Loading

raghav9-97 Dec 5, 2018 •

edited

Loading

raghav9-97 Dec 5, 2018 •

edited

Loading

oxinabox Dec 5, 2018 •

edited

Loading