Skip to content

Commit

Permalink
make zip for >2 arguments about 20x faster
Browse files Browse the repository at this point in the history
I realized the only difference between Zip2 and general Zip should be
a single `...` token.
  • Loading branch information
JeffBezanson committed Feb 12, 2015
1 parent 34dbb0f commit eaf5a95
Showing 1 changed file with 18 additions and 28 deletions.
46 changes: 18 additions & 28 deletions base/iterator.jl
Original file line number Diff line number Diff line change
Expand Up @@ -19,39 +19,16 @@ eltype{I}(::Type{Enumerate{I}}) = (Int, eltype(I))

# zip

immutable Zip{I<:Tuple}
itrs::I
Zip(itrs) = new(itrs)
end
_mkZip{I}(itrs::I) = Zip{I}(itrs)
Zip(itrs...) = _mkZip(itrs)
zip(itrs...) = _mkZip(itrs)

length(z::Zip) = minimum(length, z.itrs)
start(z::Zip) = map(start, z.itrs)
function next(z::Zip, state)
n = map(next, z.itrs, state)
map(x->x[1], n), map(x->x[2], n)
end
done(z::Zip, state::()) = true
function done(z::Zip, state)
for i = 1:length(z.itrs)
if done(z.itrs[i], state[i])
return true
end
end
return false
end
abstract AbstractZipIterator

eltype{I}(::Type{Zip{I}}) = map(eltype, I)

immutable Zip2{I1, I2}
immutable Zip2{I1, I2} <: AbstractZipIterator
a::I1
b::I2
end
zip(a) = a
zip(a, b) = Zip2(a, b)

length(z::Zip2) = min(length(z.a), length(z.b))
eltype{I1,I2}(::Type{Zip2{I1,I2}}) = (eltype(I1), eltype(I2))
start(z::Zip2) = (start(z.a), start(z.b))
function next(z::Zip2, st)
n1 = next(z.a,st[1])
Expand All @@ -60,7 +37,20 @@ function next(z::Zip2, st)
end
done(z::Zip2, st) = done(z.a,st[1]) | done(z.b,st[2])

eltype{I1,I2}(::Type{Zip2{I1,I2}}) = (eltype(I1), eltype(I2))
immutable Zip{I, Z<:AbstractZipIterator} <: AbstractZipIterator
a::I
z::Z
end
zip(a, b, c...) = Zip(a, zip(b, c...))
length(z::Zip) = min(length(z.a), length(z.z))
eltype{I,Z}(::Type{Zip{I,Z}}) = tuple(eltype(I), eltype(Z)...)
start(z::Zip) = tuple(start(z.a), start(z.z))
function next(z::Zip, st)
n1 = next(z.a, st[1])
n2 = next(z.z, st[2])
(tuple(n1[1], n2[1]...), (n1[2], n2[2]))
end
done(z::Zip, st) = done(z.a,st[1]) | done(z.z,st[2])

# filter

Expand Down

8 comments on commit eaf5a95

@StefanKarpinski
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really still need Zip2 after this?

@JeffBezanson
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, they differ by a ..., so as far as I can tell, yes. It's also kind of reasonable to only have Zip2, and do for (a,(b,c)) in zip(x,zip(y,z)), but zip(x,y,z) is now a bit faster than that.

Zip2 is also currently still much faster than Zip; it usually gets fully inlined.

@timholy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jeff, since you're working on iterators: do you understand why zip and enumerate are quite a lot slower than writing the same operation out manually?

References:
http://stackoverflow.com/questions/27577162/is-it-good-practice-to-use-counters/27578686#27578686
#9080

@JeffBezanson
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first guess would be to try adding @inline to Enumerate's next method.

@timholy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, but no dice. Still 2x slower.

@JeffBezanson
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is interesting! Looking at the generated IR, it looks like LLVM was not able to fully scalar-convert the tuples that Enumerate uses. There is some extra unnecessary shuffling to form and unpack 2-element vectors using vpextrq/vpunpcklqdq.

@bramtayl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking at this and wondering why the original implementation was slow, given #26765. Is it the type instability in done? Could be solved by using an recursive version of all.

@bramtayl
Copy link
Contributor

@bramtayl bramtayl commented on eaf5a95 May 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like

reduce_unrolled(reducer, default, args::Tuple{}) = default
reduce_unrolled(reducer, default, args) =
    reducer(default, reduce_unrolled(reducer, first(args), tail(args)))

done(z::Zip, state) = reduce_unrolled(&, false, map(done, z.itrs, z.state))

Please sign in to comment.