Iteration protocol change #25261
Another thing to discuss is whether the absence of
However, it has the distinct disadvantage that it can't carry any state. I didn't appreciate this aspect before (since it's sort of implicit in our current iteration protocol), but there are certain iterators that want to do some pre-processing at the beginning of an iteration, e.g. imagine:
which you can of course do just as above for the first argument, but I think it might validly be a problem for separation of concerns. More practical considerations might be something like #22467 (which may also be the case for the EachLineFile operator above depending how you implement it). On the other hand, there may be iterators for which constructing a valid, type-stable I think the best solution I can currently imagine is making the iteration protocol something like the following:
this has the advantage that it's fine if the return type of We could even do something like:
to preserve the nice one liner above for simple iterators, but I worry about whether that'll be too complicated. |
Another possibility:
with
|
Leaving breadcrumbs to ec91f63 from PR #23642, where I had to introduce a Probably related: @davidanthoff discussed the case of iterators which need to allocate a resource on the first iteration, see #22466. |
I think the I hope this PR will stay open for a while? With the holidays it might be tricky to find the time needed to think about this... (I plan to eat, eat and eat the next couple of days, and do nothing else). |
I had a brief look and it seemed pretty reasonable to me. It might be interesting to try to implement the With regards to Thanks so much for doing this! |
I agree with @timholy it would make sense to use |
I have more changes locally, and I'm working on finishing up the PR without either of the proposed extensions (since they're extensions, it's fairly easy to add either on top later). However, having spent some more time thinking about this, I'm not sure I'm convinced that either is useful or necessary. It always seems possible to encode whatever information is needed into the state object. I guess the fundamental question is what you can do with the state other than pass it back to iterate. For some iterators, we generally expect to be able to copy the state and start iterating from the same point again. However, whether that works for a given iterator is somewhat ill-specified at this point. I'll keep thinking about this and finish up the PR in the meantime, but it's not clear to me.
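To make the "encode it in the state" point concrete, here is a minimal sketch of an iterator that does one-time pre-processing on the first call and carries the result in its state. `SortedView` is a hypothetical type invented for illustration; it is not part of this PR or of Base.

```julia
# Hypothetical iterator: yields the elements of `data` in sorted order
# without mutating `data`.
struct SortedView{T}
    data::Vector{T}
end

# First call (no state argument): do the one-time setup, then delegate to the
# two-argument method with the prepared state.
function Base.iterate(s::SortedView)
    sorted = sort(s.data)            # pre-processing happens exactly once
    return iterate(s, (sorted, 1))
end

# Subsequent calls: the state carries both the prepared data and the position.
function Base.iterate(s::SortedView, state)
    sorted, i = state
    i > length(sorted) && return nothing
    return (sorted[i], (sorted, i + 1))
end

Base.length(s::SortedView) = length(s.data)
Base.eltype(::Type{SortedView{T}}) where {T} = T

original = [3, 1, 2]
sorted_values = collect(SortedView(original))
```

The trade-off is that the state is no longer a cheap index, so copying it to restart iteration from the same point is only as cheap as copying the tuple.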
I agree that it seems like it should be able to bundle everything into |
On slack, @vtjnash and I came up with the following proposed lowering:
@vtjnash also proposes the following extension to handle resource cleanup:
There's a couple of things to like about this. With a default implementation of
this reduces to the original proposal in this PR for simple iterators. An implementation like,
could be used to give initial state if necessary (or that could still be put into the iterate function itself).
which I think nicely expresses the notion that such an iterator doesn't really have state. |
Slight revision from triage:
|
That seems to have a typo – what is
|
No, it's not a typo. It's the first element of the tuple returned by the iterator.
I notice a small readability issue: previously we could write |
@@ -435,9 +424,9 @@ julia> collect(Iterators.rest([1,2,3,4], 2))
```
"""
rest(itr,state) = Rest(itr,state)
rest(itr) = itr

This is a bit confusing since often the functions `first` and `rest` are synonyms for `head` and `tail`, i.e. `rest(x)` drops the first element.

I'm happy to drop this method, though it is useful (see the use in _totuple).
Why not use a real struct type then? |
The ergonomics of not using a (iterate(x, state)::IterationPair) = (x[state], state + 1) Have multiple ways of accessing the return value ( |
If we want to recover / preserve the existing structure, we could provide something like:
val, state = @iterate (x, state) || return
But this is just an example of normal
val, state = @nullable iterate(x, state) || return
val, state = iterate?(x, state) ||? return
etc., TBD, ...
base/abstractarray.jl
Outdated
i, state = next(destiter, state)
dest[i] = x
y == nothing &&
throw(ArgumentError(string("source has fewer elements than required")))

Multi-line conditionals should probably use `if`. Also, why are you calling `string` on a String literal?
base/abstractarray.jl
Outdated
x2, state = next(a, state)
done(a, state) && return hash(x2, hash(x1, h))
y1 = iterate(a)
y1 == nothing && return h

===
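One way to read this review note: `==` is an ordinary generic function that packages can overload (and that propagates `missing`), whereas `===` is a built-in identity test that always returns a `Bool`. A small sketch follows; the `Loose` type is hypothetical and exists only to show an overloaded `==`.

```julia
# Hypothetical type with a deliberately permissive `==` method.
struct Loose end
Base.:(==)(::Loose, ::Nothing) = true

loose = Loose()
eq_says_nothing = (loose == nothing)      # dispatches to the user-defined method
egal_says_nothing = (loose === nothing)   # identity check, cannot be overloaded

# `==` involving `missing` does not even return a Bool.
eq_with_missing = (missing == nothing)
egal_with_missing = (missing === nothing)
```

Since `iterate` signals completion with the singleton `nothing`, testing `y === nothing` states exactly what is meant and never depends on user-defined `==` methods.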
base/array.jl
Outdated
for acurr in a
if f(acurr)
a[i] = acurr
i, state = next(idx, state)
y = iterate(idx, state)
y == nothing && (i += 1; break)

Multiple statements should probably be on multiple lines, but in this case, just assert that the number of indexes and the number of elements are matched:
@assert (y::Tuple{Any, Any}) !== nothing
# promote_shape guarantees that A and B have the
# same iteration space
while ay !== nothing
@inbounds r[ri] = ($f)(ay[1], by[1])

Not sure this annotation is on the right expression. We just want it on the `r[ri]`, right?
base/dict.jl
Outdated
@@ -143,7 +143,7 @@ function Dict(kv)
try
dict_with_eltype((K, V) -> Dict{K, V}, kv, eltype(kv))
catch e
if !applicable(start, kv) || !all(x->isa(x,Union{Tuple,Pair}),kv)
if (!applicable(start, kv) && !applicable(iterate, kv)) || !all(x->isa(x,Union{Tuple,Pair}),kv)

`iterate` is always applicable (fallback to calling `start`), so this will just have to be broken now perhaps (or improved to actually look at the error?)
base/repl/LineEdit.jl
Outdated
while true
for c in completions
(i > endof(c) || c[i] != cc) && return ret
end
ret = string(ret, cc)
i >= endof(c1) && return ret
i = nexti
cc, nexti = next(c1, i)
cc, nexti = c1, i+1

i+1 isn’t a valid string index.
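A short sketch of why `i + 1` can be an invalid index, written against the `iterate` API this PR introduces (the hunk above still used `next`). Julia strings are indexed by byte offset, so the next valid index after a multi-byte character is more than one position away. The string literal is just an example.

```julia
s = "αβc"   # 'α' and 'β' each occupy two bytes in UTF-8

# Iterating a String yields the character and the index of the next character.
first_char, next_index = iterate(s)

valid_indices = collect(eachindex(s))

# Indexing at 1 + 1 lands in the middle of 'α' and throws.
bad_index_error = try
    s[1 + 1]
    nothing
catch err
    err
end
```

Taking the next index from `iterate` (or `nextind(s, i)`) instead of computing `i + 1` keeps the loop correct for non-ASCII completions.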
base/sysimg.jl
Outdated
@@ -443,7 +443,7 @@ init_load_path(ccall(:jl_get_julia_bindir, Any, ()))

INCLUDE_STATE = 3 # include = include_relative

import Base64
#import Base64

Unrelated?
base/sysimg.jl
Outdated
# @deprecate_binding Profile root_module(:Profile) true ", run `using Profile` instead"
# @deprecate_binding Dates root_module(:Dates) true ", run `using Dates` instead"
# @deprecate_binding Distributed root_module(:Distributed) true ", run `using Distributed` instead"
# end

Unrelated?
src/julia-syntax.scm
Outdated
`(block
;; *** either this or force all for loop vars local
,.(map (lambda (r) `(local ,r))
(lhs-vars (cadr (car ranges))))

indent
src/julia-syntax.scm
Outdated
@@ -2269,7 +2274,7 @@
(cdr (cadr e))
(list (cadr e))))
(first #t))
(expand-for (if first 'while 'inner-while)
(expand-for first

drop lowering code for `inner-while` also (a few lines above here)
base/inference.jl
Outdated
strip_iter_union(X::Type{Union{Nothing, T}}) where {T} = T
strip_iter_union(::Type{Nothing}) = Nothing
strip_iter_union(::Type{Union{}}) = Union{}
strip_iter_union(T) = T

this'll be very slow (because of compilation), and possibly unreliable (because `T = Union{Nothing, T}` is also a valid solution), use `typesubtract` instead
head[i] = y[1]
while i < n
y = iterate(c, y[2])
y == nothing && return (resize!(head, i), ())

are you making this intentionally type-unstable?

Frankly I just wanted to get the tests to pass so I could start looking at the performance, though I don't think it matters too much since it's only used in pmap, which is already fairly complex.
75eeebb to 9f5c04f
@@ -1979,6 +1985,10 @@ function hash(a::AbstractArray{T}, h::UInt) where T
if isa(a, AbstractVector) && applicable(-, x2, x1)
n = 1
local step, laststep, laststate

`laststate` isn't used anymore.
After much triaging, it seems like the favored approach is to do this change as originally proposed in this PR. As of yet unresolved is whether to add a canonical
203402a to 50a5c7d
Planning to merge once CI passes. @nanosoldier |
I think ideally, we should deal with bug-fixes from merging SSAIR found by PkgDev (https://pkg.julialang.org/pulse.html) before we add a breaking change on top |
You'll have to convince people to delay tagging the alpha for that reason. |
Test failure is LLVM failing to vectorize because it doesn't like our new loop structure (early exit at the top). Shouldn't be too hard to fix. Will take a look in the morning. Alternatively, we may want to change the lowering after all to make the loop structure more obvious to LLVM.
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Fixing LLVM was too hard, I updated this to use the loop inverted lowering, which should fix the test case. @nanosoldier |
This changes the iteration protocol from `start`/`next`/`done` to `iterate`. The new lowering of a for loop is as follows:

```
for x in itr
    ...
end
```

becomes

```
next = iterate(itr)
while next !== nothing
    x, state = next::Tuple{Any, Any}
    ...
    next = iterate(itr, state)
end
```

The semantics are as apparent from the above lowering. `iterate` returns either `nothing` or a tuple of value and state. The state is passed to any subsequent operation. The first iteration is indicated by not passing the second (state) argument to the `iterate` method.

Adaptors in both directions are provided to keep the legacy iteration protocol working for now. However, performance of the legacy iteration protocol will be severely pessimized.

As an optional add-on for mutable iterators, a new `isdone` function is provided. This function is intended as an O(1) approximate query for iterator completion, where such a calculation is possible without mutation and/or is significantly faster than attempting to obtain the element itself. The function makes use of 3-value logic. `missing` is always an acceptable answer, in which case the caller should go ahead and attempt the iteration to obtain a definite result. If the result is not `missing`, it must be exact (i.e. if true, the next call to `iterate` must return `nothing`; if false, it must not return `nothing`).
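For readers who want to see the protocol from the implementer's side, here is a minimal sketch of a type that opts in by defining `iterate`. `Countdown` and `manual_collect` are hypothetical names used only for illustration; the manual loop mirrors the lowering described in the commit message.

```julia
# Hypothetical iterator that counts down from `n` to 1.
struct Countdown
    n::Int
end

# The first call passes no state; later calls receive the state returned by
# the previous call. Here the state is simply the next value to yield.
function Base.iterate(c::Countdown, state::Int = c.n)
    state < 1 && return nothing     # iteration is finished
    return (state, state - 1)       # (value, next state)
end

Base.length(c::Countdown) = max(c.n, 0)
Base.eltype(::Type{Countdown}) = Int

# Hand-written equivalent of `for x in itr; push!(out, x); end`.
function manual_collect(itr)
    out = Int[]
    next = iterate(itr)
    while next !== nothing
        x, state = next
        push!(out, x)
        next = iterate(itr, state)
    end
    return out
end

values_for = [x for x in Countdown(3)]
values_manual = manual_collect(Countdown(3))
```

A single method with a default state argument covers both the first and the subsequent calls, which is the one-liner style discussed earlier in the thread for simple iterators.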
The primary idea of the new iteration protocol is that for a function like:

```
function iterate(itr)
    done(itr) ? nothing : next(itr)
end
```

we can fuse the `done` comparison into the loop condition and recover the same loop structure we had before (while retaining the flexibility of not requiring the done function to be separate), i.e. for

```
y = iterate(itr)
y === nothing && break
```

we want to have after inlining and early optimization:

```
done(itr) && break
y = next(itr)
```

LLVM performs this optimization in jump threading. However, we run into a problem. At the top of the loop we have:

```
y = iterate
top:
    %cond = y === nothing
    br i1 %cond, %exit, %loop
....
```

We'd want to thread over the `top` block (this makes sense, since by the discussion above, we need to merge our condition into the loop exit condition). However, LLVM (quite sensibly) refuses to thread over loop headers and since `top` is both a loop header and a loop exit, we fail to perform the appropriate transformation. However, there's a simple fix. Instead of emitting a for loop as

```
y = iterate(itr)
while y !== nothing
    x, state = y
    ...
    y = iterate(itr, state)
end
```

we can emit it as

```
y = iterate(itr)
if y !== nothing
    while true
        x, state = y
        ...
        y = iterate(itr, state)
        y === nothing && break
    end
end
```

This transformation is known as `loop inversion` (or a special case of `loop rotation`). In our case the primary benefit is that we can fuse the condition contained in the initial `iterate` call into the bypass if, which then lets LLVM understand our loop structure.

Co-authored-by: Jeff Bezanson <jeff@juliacomputing.com>
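The two loop shapes can be written out by hand to check that they visit the same elements. This is only a source-level sketch of the control flow; it says nothing about what LLVM actually does with either form, and `sum_plain` / `sum_inverted` are hypothetical helper names.

```julia
# Straightforward shape: the exit test sits at the top of the loop.
function sum_plain(itr)
    total = 0
    y = iterate(itr)
    while y !== nothing
        x, state = y
        total += x
        y = iterate(itr, state)
    end
    return total
end

# Inverted shape: one guard before the loop, exit test at the bottom.
function sum_inverted(itr)
    total = 0
    y = iterate(itr)
    if y !== nothing
        while true
            x, state = y
            total += x
            y = iterate(itr, state)
            y === nothing && break
        end
    end
    return total
end

plain_result = sum_plain(1:10)
inverted_result = sum_inverted(1:10)
```

Both functions perform the same sequence of `iterate` calls; the inverted form just moves the emptiness check out of the loop header.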
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Do we, uh, have a handle on those 600x regressions? |
Yes, they trigger the deprecation warning for the old iteration protocol. |
LegacyIterationCompat{T, typeof(val), typeof(state)}(val, state)
end

function next(itr::I, state::LegacyIterationCompat{I,T,S}) where {I,T,S}

This (and similar for `done`) are real ambiguity magnets for all iterators out there that define `next(it::MyIterator, state)`. Of course, `next` should never be called with arguments that hit the ambiguity, but `detect_ambiguities` is not amused. But I suppose there is no easy way to prevent that?
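A stripped-down sketch of the ambiguity being described, using hypothetical stand-ins (`CompatState`, `MyIterator`, `demo_next`) rather than the real `LegacyIterationCompat` and `next`. One method is more specific in the state argument, the other in the iterator argument, so neither wins for a call that matches both.

```julia
# Hypothetical stand-in for a compat state parameterized on the iterator type.
struct CompatState{I} end

# Hypothetical user-defined iterator.
struct MyIterator end

# Generic compat method: more specific in the second (state) argument.
demo_next(itr::I, state::CompatState{I}) where {I} = :compat

# Typical user method: more specific in the first (iterator) argument.
demo_next(itr::MyIterator, state) = :user

# Ordinary calls are unaffected.
ordinary_call = demo_next(MyIterator(), 1)

# The overlapping call has no unique most-specific method.
overlap_is_error = try
    demo_next(MyIterator(), CompatState{MyIterator}())
    false
catch err
    err isa MethodError
end
```

As the comment says, real code should never make the overlapping call, but tooling that enumerates method pairs will still report it.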
I imported @vtjnash's lowering change and then went ahead and rewrote the base iterator wrappers to use the new iteration protocol instead so people can see it in action. This passes the iterators test, but I haven't run other tests yet, so I suspect some fail.
The strategy here is basically what is described in #18823, with the exception of using unions rather than nullables. Additionally, I kept `done` around as an optional addition (the semantic guarantee is that if `done` is not an error, then it determines whether or not the next call to iterate will return true). This was necessary because the `isempty` function was previously derived from the iteration protocol, which is no longer possible with the new code. However, after playing with this, it is not actually clear to me that any use of `done` other than to implement `isempty` is useful with the new iteration protocol, so maybe the correct thing here is just to have iterators optionally implement `isempty` just as they would `length` now (for iterators where this is determinable).

Couple of things on my todo list:
`done` method to any iterator by prefetching an element and keeping it in the state (essentially the generic version of the trick people were using to implement `done` for such iterators before)

There were also some performance concerns that need to be looked at (and @JeffBezanson and @vtjnash wanted to help out with), so hopefully this branch can help with that.
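The prefetching trick mentioned in the todo list can be sketched as follows. This is an illustration under assumptions, not the adaptor shipped in this PR: `Prefetched`, `prefetch_start`, `prefetch_done`, `prefetch_next` and `drain` are hypothetical names, chosen to avoid clashing with the legacy `start`/`next`/`done` functions.

```julia
# Hypothetical wrapper around any iterator that implements `iterate`.
struct Prefetched{I}
    itr::I
end

# The state is the already-fetched result of the next `iterate` call on the
# wrapped iterator: either `nothing` or a `(value, inner_state)` tuple.
prefetch_start(p::Prefetched) = iterate(p.itr)

# Because the element was fetched ahead of time, "done" is a cheap check.
prefetch_done(p::Prefetched, state) = state === nothing

# Hand out the prefetched value and immediately fetch the following one.
function prefetch_next(p::Prefetched, state)
    value, inner = state
    return (value, iterate(p.itr, inner))
end

# Old-style driver loop written against the three functions above.
function drain(p::Prefetched)
    out = []
    state = prefetch_start(p)
    while !prefetch_done(p, state)
        value, state = prefetch_next(p, state)
        push!(out, value)
    end
    return out
end

drained = drain(Prefetched(1:3))
empty_is_done = prefetch_done(Prefetched(Int[]), prefetch_start(Prefetched(Int[])))
```

The cost is that each element is produced one step earlier than it is consumed, which matters for iterators with side effects.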
Comments and feedback greatly appreciated (again though, WIP, not supposed to pass tests, etc, so there's probably bugs, but hopefully this is useful to get a feel for what this change does).