-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: Go 2: function values as iterators #43557
Comments
Iiuc we can achieve something similar by returning a chan. |
@Deleplace Using a channel can work, but it has some difficulties.
|
One thing I'd consider to be missing would be a way to inform the iterator that the caller is "finished" with the iterator, i.e. if the range loop were exited early for some reason. I can see some iterators wanting some element of cleanup; for example, building up a type safe iterator for I'd also wonder if this would need to wait for generics, such that the definition of this pattern could be well-defined with a generic interface somewhere rather than being "well-known" (and in regards to the above cleanup, potentially have some extra method an iterator could provide to be called on loop exit). |
This code example is the same for #40605 and the current proposal, the only difference being matching on an interface type as opposed to a function type. You mention that matching on an interface's methods as opposed to a concrete type is not Go-like, but in the same vein, the error handling proposals thus far have treated |
I see many of the repositories provide iterator in other way, for example: func ForEachCell in "github.com/tealeg/xlsx" // ForEachCell will call the provided CellVisitorFunc for each
// currently defined cell in the Row. Optionally you may pass one or
// more CellVisitorOption to affect how ForEachCell operates. For
// example you may wish to pass SkipEmptyCells to only visit cells
// which are populated.
func (r *Row) ForEachCell(cvf CellVisitorFunc, option ...CellVisitorOption) error {
return r.cellStoreRow.ForEachCell(cvf, option...)
} and call it by like err = r.ForEachCell(func(c *Cell) error) {
// do something here ...
} and can also define a common error like ErrEarlyBreak to know what happen while iter through all data, just like ErrNil for telling people key not exist in redis err = r.ForEachCell(func(c *Cell) error) {
// do something here ...
if someCondition {
return ErrEarlyBreak
}
} |
@Deleplace What Ian said. I also don't foresee the compiler gaining the ability to optimize that down at any point. It gets quite hard, and the optimization becomes fairly fragile. @zikaeroh It may be true that some iterators want an element of cleanup, but I'd argue most don't. To me this is a signal that whatever the code is going iterate over is fairly complex, and maybe there's a better abstraction for that. For instance, the RE: not seeing a benefit over having an interface, I think I get that. You also make a good point about this pattern being surprising, especially if it's not backed by some formal definition. It's true that most other languages do this differently and that's perhaps what folks have come to expect. @smasher164 In my defense, none of those proposals actually made it yet. :) But, that's a good point. You certainly could go that route instead. FWIW, the "Go-like" in the proposal is absolutely subjective. Interfaces are "Go-like" too. To me, |
@ivila I mention that (or a simpler version of that) in the proposal. I think that pattern has its uses and isn't going to go away (we already have e.g. And maybe its limited use is a good reason not to support it. I do think there are still cases where the iterator I've proposed can be useful. It works nicely for iterating over common data structures (or some combination thereof), optimizes nicely in the common case, and it composes nicely with map and filter functions (though it gets harder for the compiler to optimize that, of course). That last bit about composition gets easier with generics, but you can still do it without. |
Could you say what is meant in the My guess is it’s “key, value” like when range-ing over a map. |
I don't have a strong feeling about this proposal, but I also have to say it's probably the nicest approach to native iterators I've seen.
As a counter-point, ranges can already block for a long time (e.g. channel receive) or consume a surprising amount of CPU (e.g. if copying huge arrays). This proposal could make these cases more common, but I don't think it would be something new. The syntax containing a function call also helps hint at running arbitrary code for each iteration.
Out of all the downsides you list, I think this is the only worrying one. At the same time, it's a problem with any proposal to add iterators to the language, so I don't think it's a problem with this approach in particular. |
Yeah, sorry about that, the |
As proposed, this seems to allow the use of method values, meaning that it is actually essentially compatible with the original iter := tree.Iter()
for i := range iter.Next {
// Do stuff with i.
} |
The word "closure" here is a red herring and an implementation detail of the example code. Closures (in Go terminology) are function literals, but there's no reason to enforce that, or to really even know. The word wanted here is "function value" (and method values are function values). I would suggest retitling to "proposal: Go 2: function values as iterators" |
"More specifically, a for-range statement may accept variables" should say "More specifically, a for-range statement may accept values" |
Thanks! 😀
I thought about writing down that counter-point, but what stopped me was the thought that it could do an arbitrary amount of CPU work. But you raise a good point that that's possible today, via copies. I may go back and add that counter-point, then.
Agreed. |
I totally agree here, though I will note that, as I understand it, the proposed syntax uses a function value as the argument to range, not a call (like the |
Right. I'm aware of the distinction, though I figured it was important to make it clear that you can't really do much with an iterator that is a function value but not a closure. You can, for instance, return some value infinitely, and I suppose you can also iterate over some value in global variable as well. Overall, what you suggest is indeed more precise. I'll update the proposal.
EDIT: Re-reading that passage, I think you're 100% right. I modified it to fit your phrasing. :) Thank you for your suggestions! |
You can replace any closure by defining a type that captures the free variables, returning a method value on that type. That often leads to more debuggable & readable code, too. |
A method value is, in a very real sense, a closure over the receiver. (function literal∶closure∷square∶rectangle) |
Sure, but in that case I posit the term "closure" is not defined in the spec, and shouldn't be used for that reason alone.
That reads as an example, not an exhaustive normative definition, but it is the only place the word appears. One of the strengths of Go is that the spec is very clean and easy to refer to. |
Just to clarify, it seems that with this proposal for i := range f { ... } is equivalent to for i, next := f(); next; i, next = f() { ... } |
And maybe: for := range f { ... } is equivalent to: for next := f(); next; next = f() { ... } And maybe for higher numbers of yielded values than 2? Except presumably the iteration vars have the same capturing semantics that we all love so well. Further into esoterica, range loops over channels zero out the iteration vars at the end of each iteration, to avoid pinning large objects in memory while waiting an arbitrary length of time for a channels receive. It might not be a bad idea to do that same here, were this accepted. |
What about: for i := range f { ... } when If allowed, I presume it is equivalent to: for i, _, next := f(); next; i, _, next = f() { ... } |
@ianlancetaylor Yup, that's pretty much all it is, though also extended to the 3-return-value form (i.e. @josharian I think clearing the variable at the end of the loop iteration is a good idea, if this goes anywhere. RE: generating a new variable on each iteration would be nice, but I suspect it's still probably better to keep things consistent. Maybe that one experiment to look at all of GitHub will pan out. :) |
for a1, a2, ... an, ok := range f { ... } is translated as explained above (for just two results). This could even make sense if there's only a single (boolean) result. I think @josharian alluded to this as well, above.
for {
a1, a2, ... an, ok := f()
if !ok { break }
...
} we would get new iteration variables for each iteration which would address the variable capture problem without the need for a new
|
While I'm not sure if In whichever case, though, I also worry a little about potential performance issues stemming from having to multi-thread this. I'm not entirely sure that just directly using a channel is the correct approach for a generic iterator system, either. I think that this should just be one possible way to create an iterator, be it
The go2go playground's been having that problem a lot recently. I had a ton of trouble with it the other day. Also, several of the examples above forgot the |
My bad - I had added the |
Correct me if I am wrong, but pretty much every Generator uses some form of coroutine in order to implement the pattern. I've checked Kotlin and Rust at the very least. However I also believe that they control the coroutines a bit better so that it still operates single-threaded, and the value-passing also works differently from using a channel. The goroutine/channel overhead is quite the problem at large scale, in this case I have a binary tree of 10,000 elements and show how slow iteration is on each one: https://play.golang.org/p/1asKVzsdLNn. You cannot get results in the go playground, as time is fixed. Feel free to try on your own machine. Results on my machine are ~1ms for purely-function-call based iteration, and ~480ms for the channel-based iteration. I expected the performance to be worse, but not that bad. Perhaps some optimizations could be done to EDIT - As a test, I implemented it in Javascript, and it runs for ~9ms. There definitely has to be a more optimal way to do this without goroutine overhead. https://gist.github.com/deanveloper/32d6c0c9b915f464cb917d5f76c34dd9 EDIT 2 - Did some more testing: https://play.golang.org/p/87HxaEmecno (again, execute locally for correct timings). Even with only a single goroutine created, the channel overhead is quite high. To iterate over a 10k element slice, it takes 20ms with chans, and 0.2ms with functions. That's 2 orders of magnitude... I'm not sure if we can get optimizations to help that much. |
I think without language-level support, there's going to be some pretty bad performance issues that aren't really easily resolvable. It might be possible to optimize the channel for this specific usage and cut out some of the overhead, as I'm guessing that quite a bit of that comes from locking that isn't useful in this particular situation. |
Setting GOMAXPROCS to 1 cuts the tree time by 20% and the slice time by 50%, but clearly there are still orders of magnitude of overhead to work out. I do think it is theoretically doable though, since many other languages have a coroutine mechanism. There "just" (ha!) needs to be some way to detect that the routines will always yield to each other and set them to a simpler scheduler once that's detected. |
I think the best way to handle functions that need to have some sort of cleanup or close is to return a second
No runtime support is necessary, and the pattern is familiar to Go programmers from |
Unfortunately that remove the ability to have "inline" generators, like my earlier examples (ie It may be quite bothersome to manually call an extra |
@deanveloper, I don't think those are big problems.
You normally wouldn't do that. Think about how you use
I'm sure there are situations where the close needs to happen somewhere else, but I'm guessing they're rare. As for this all being cumbersome, that's true, but I don't think the language or runtime features you're hoping for are likely to happen. You could also do something like
and then it would be more like a file, but it turns out that forgetting to call |
I've actually been working for a bit on speeding up this conversion from the "yield" style iterator to a one-at-time iterator. I think the first step is to remove the explicit channel:
You get the next element by calling the function; a second return value of Now that the channel is hidden, you can play tricks like using a
For enumerating the values of a simple binary tree, this code is about 40% slower than calling the yield iterator directly, for large enough trees (with a 16K buffer, around a million nodes). It's much worse for smaller trees, because of the initial overhead. For small trees it's probably better to just use the yield iterator to populate a slice, and iterate over the slice. |
Ahaha, and we're back at where we started. Earlier, I was thinking about if we could possibly get yield-style iterators to work in the more classical iterator fashion, which would be the most ideal way to do it. My idea was similar to yours, before realizing that we can simply use channels as iterators if we got the needed performance improvements. To organize some of my thoughts, here is what I think of the main proposed solutions, in descending order of preference:
|
@ianlancetaylor used a finalizer to clean up a deadlocked goroutine in the generics proposal. |
Relevant code: // Ranger provides a convenient way to exit a goroutine sending values
// when the receiver stops reading them.
//
// Ranger returns a Sender and a Receiver. The Receiver provides a
// Next method to retrieve values. The Sender provides a Send method
// to send values and a Close method to stop sending values. The Next
// method indicates when the Sender has been closed, and the Send
// method indicates when the Receiver has been freed.
func Ranger[T any]() (*Sender[T], *Receiver[T]) {
c := make(chan T)
d := make(chan bool)
s := &Sender[T]{values: c, done: d}
r := &Receiver[T]{values: c, done: d}
// The finalizer on the receiver will tell the sender
// if the receiver stops listening.
runtime.SetFinalizer(r, r.finalize)
return s, r
}
// A Sender is used to send values to a Receiver.
type Sender[T any] struct {
values chan<- T
done <-chan bool
}
// Send sends a value to the receiver. It reports whether any more
// values may be sent; if it returns false the value was not sent.
func (s *Sender[T]) Send(v T) bool {
select {
case s.values <- v:
return true
case <-s.done:
// The receiver has stopped listening.
return false
}
}
// Close tells the receiver that no more values will arrive.
// After Close is called, the Sender may no longer be used.
func (s *Sender[T]) Close() {
close(s.values)
}
// A Receiver receives values from a Sender.
type Receiver[T any] struct {
values <-chan T
done chan<- bool
}
// Next returns the next value from the channel. The bool result
// reports whether the value is valid. If the value is not valid, the
// Sender has been closed and no more values will be received.
func (r *Receiver[T]) Next() (T, bool) {
v, ok := <-r.values
return v, ok
}
// finalize is a finalizer for the receiver.
// It tells the sender that the receiver has stopped listening.
func (r *Receiver[T]) finalize() {
close(r.done)
} Something to note: This method of doing it makes it unsafe to give the raw channel to the user, which in turn means that the user can't directly use it with a Edit: Also, if the documentation for |
Something that's neat about that package iterator
func Generator[T any](generator func(yield func(T) bool)) func() (T, bool) {
send, recv := chans.Ranger()
go func() {
generator(send.Send)
send.Close()
}()
return recv.Next
} We could also add in the buffering that @jba mentioned to get a significant speedup, assuming that we don't get any goroutine/channel performance improvements that make it necessary. Even with the performance improvements, it will likely be beneficial to iterate over slices instead of using a buffered channel. |
For a proper performance improvement, it could also be reimplemented at some point later the way that a bunch of other languages do it, thus getting rid of the channel and background thread, and the API could be left as is. I'm still not thrilled about the need for |
Unfortunately I can't seem to get it to work. I'm not sure what's preventing (Or, the Go 1 version which I was testing on my local machine: https://play.golang.org/p/6SaBW8TmNsy) |
I fixed it. It seems like the problem was that the use of a method value for the finalizer function resulted in an extra reference being kept around, so it never got garbage collected. I changed it to a method expression and it worked. |
The proposal should not attempt to solve the variable capture problem (in which "for x := range seq" creates a single variable x for the whole loop, which is sometimes undesirable). It would be surprising and inconsistent for the compiler to create one variable per loop if seq is a slice, but one variable per iteration if seq is a function. Also, if the variable escapes, it would turn a single allocation into a linear number of allocations. I'm not a fan of this proposal. Although the range syntax is neat, it is confusing at first sight, unlike most Go constructs; and as @jba points out, it creates for the first time a function call where there is no call syntax, and doesn't address the great variety of iteration constructs that come up in practice. |
@tv42 wrote:
I tend to disagree with the second sentence there. In my view a closure is significantly less debuggable than an interface value because there's no way to tell what's going on inside it: we can't find out what type of iterator it is, and there's no way to define other methods on it that could be used to help debugging. I also think that having an explicit type is clearer. When I see a |
@rogreppe I think you misunderstood what I was trying to say. I also think closures are harder to debug than explicit types. |
In case it meaningfully develops the discussion, I'm actually a vote in favor of closures. My introduction to Go was in context of an API controller, backed by SQL. The standard database/sql Rows type had an iterator that carried a "Next()bool" method and "Scan(...interface{}) error" method. It was convenient, straight forward, and if you pre-allocated appropriate to the return size, was blisteringly fast. I adopted it as an iterator in a two-closure fashion such as below. It's entered generically, but typically I would have redefined it per declared type. I get that closures aren't considered ergonomic, but I dissent. Excepting generics, the following is a pattern that is in use, in production code. type IterRead[T any] func()*T
type IterWrite[T any] func(*T) bool
type IterNext[T any] func() bool
type Data[T any] []T
func iterSeek[T any](s *[]T, i int) (int, bool) {
switch {
case s == nil: return i, false
case len(*s) > i+1: return i+1, true
default: return i, false
}
}
func iterScan[T any](s *[]T, v *T, i int) (int, bool) {
i, ok := iterSeek(s, i)
if ok { *v = (*s)[i] } // could just pointer+SizeOf(T)...I guess I haven't played with unsafe+generics since the preview on 1.17
return i, ok
}
func Map(dP *Data[T]), f func(*T) T, z func() T) (IterRead[T], IterNext[T]) {
var v T
i, ok := -1, false
return (
func() *T { return &f(v) },
func() bool {
if ok { i, ok = iterScan(dP, &v, i) }
if !ok { v = z() }
return ok
}
)
} It might feel heavy, but it's fairly easy to bundle stuff up out of the way. When used with concrete types, performance is quite good, and a read/write version is just for instance, supposing the above, to transform a slice of any inferable type: f, n := Map(data, mapFunc, zeroFunc)
for n() {
fmt.Printf("%T: %v", f(), f())
} And considering that there's not actually a dependency on generics here, as I use the pattern with concretes, it's an example of something that absolutely works. It could as easily be used with generics as with interfaces, meaningful or empty interfaces, as everything is defined by the caller, and type-assertions can be performed as appropriate by the caller. I don't know what coroutines would look like as implemented in Go, but this pattern serves me as a poor-mans thread-local coroutine. To performance, this gets better if a pattern following this model could be guaranteed-inlined. In terms of ergonomics, I am stoked for #21498 to be finalized. Even in "today Go", I think this pattern has potential. If you then want to associate those independent closures with a struct, sure. Instantiate an interface-literal, and attach them? Sure. |
That could be solved with something like this: type RangeIterator[T any] func() (value T, ok bool)
type ResourceIterator[T any] interface {
Begin() RangeIterator[T]
Close()
}
iter := tree.Iter()
for i := range iter.Begin() {
// Do stuff with i
}
iter.Close() Note that the range aspect of the proposal would be unaltered. The cleanup is just built into the outer type according to its particular needs. |
I'm in favor of this proposal. Iterating over user collections or generators is something that I've always felt is a bit clunky in Go. I'm a huge fan, but nothing is perfect. Today I wrote yet another generator that did not feel right and was thinking about writing a proposal myself. Researching the open issues revealed that multiple proposals already exist. I'll leave my comments here instead in the hope that they may influencing the decision to do something about the situation. I landed in two solutions myself of which one is identical to the "range over closure" proposal presented here. The second solution that I've always had in mind is to just introduce "proper while loops" as they exist in C. This suggestion was shut down in #21855 but I think the risk of errors would be eliminated by using a "while" keyword instead of overloading another behavior with "for". Such a while would accept the same forms as "if". Example: Procedural approach iter := tree.Iter()
while i, ok := iter.Next(); ok {
// Do stuff with i
} Not to side track this proposal with another one. I just wanted to mention this alternative to get it off my chest. As already listed, I'm aware of thee common ways to iterate over things.
I don't think channel based approaches are reasonable for performance reasons. Iterating over collections is something that is done fairly often. Having support for expressing this naturally would make it easier to read code (and IMHO make it slightly more elegant). |
Folks following this proposal will likely be interested in discussion: standard iterator interface |
Please see the discussion at #56413, which extends this idea. |
Closing this proposal in favor of the discussion at #56413, which includes much of what is discussed here, and more. |
Proposal: Function values as iterators
Motivation
A number of proposals have come up over the years of supporting some first-class notion of iterator such that loops that use the
range
keyword may be used to iterate over some custom data structure. The Go community in general has also wondered about the "right" way to describe an iterator, evidenced by this blog post from 2013 that describes many of the ways Go programmers abstract away iteration, and much of this is true today as well.Overall, however, iteration over a non-builtin data structure may be somewhat error-prone and clunky. To refresh on the various ways that iteration may be abstracted away in Go, let's work through some examples.
Firstly, we may have some tree type with some dedicated iterator type:
One way such an "iterator" could be used is,
or even,
The former usage works fine, but suffers from readability issues. The meaning of the code is obscured somewhat as the reader first sees an apparently infinite for-loop and must look for the "break" statement to understand what's going. Luckily this condition is usually present at the top of the loop, but it requires a more careful look. Furthermore, the iteration condition needs to be written explicitly. Writing it once may not be a problem, but writing it 100 times might be.
The latter usage also works fine and the intent is more clear, but it has a similar problem with the iteration condition. There's also an element of repetition which on the surface is fine, but it does harm the readability of the loop. Especially with variable names like "i" it becomes easy to get lost in punctuation.
Another way to abstract iteration away is to pass a function value to a function that iterates on behalf of the caller. For example:
This method works well in many scenarios, but is decidedly less flexible as it separates the loop body from the surrounding code. Capturing local variables in that function value helps, but potentially at the cost of some efficiency, depending on the complexity of the iteration. One advantage of this method, though, is that
defer
may be used to perform clean up on each loop iteration, without allocating a defer (thanks to open-coded defers).Prior art
A previous proposal (#40605) suggested allowing types with a
Next
method to have that method repeatedly called when used in a range loop:This works fine, but from my perspective, doesn't feel very Go-like. Having a language construct be aware of a type's methods for the sake of syntactic sugar is not a pattern found anywhere else in the language (yet). In the parlance of the generic design, the existing
range
keyword matches on the underlying type used, not the type's methods.Furthermore, it usually requires defining a new type which is a bit more work for the writer of the code as well as the reader. Overall a bit clunky, but not bad. It lines up well with how other languages work. Rust, for instance, uses a trait (analogous to an interface) to determine whether a type is an iterator.
Another previous proposal (#21855) suggested supporting a two-clause for-loop to make iterating less error-prone, such as:
Unfortunately, a two-clause loop form is itself error-prone, as the placement of a single semicolon has a significant effect on semantics (specifically, the second semicolon which distinguishes between the 2-clause and 3-clause forms). This proposal was rejected because that semicolon was considered too dangerous.
Other proposals (#24282) have been made to fundamentally change how
for
loops work, indicating at least some degree of friction.Proposal
Rolling with the idea of closures, and with the observation that the range form matches on types largely structurally, I would like to propose that range loops repeatedly apply function values of a particular form.
More specifically, I propose allowing the for-range statement to accept values of type
func() (T, bool)
orfunc() (T, S, bool)
(whereT
andS
are placeholder types; they may be any substituted for any other type) and will repeatedly call these values until theirbool
result is false.Iterators may then be written in the following way:
More precisely, the last
for
statement "de-sugars" into the following code, wheretmp
is a temporary variable not visible to the program:The limitation to variables of type
func() (T, bool)
orfunc() (T, S, bool)
(instead of allowing an arbitrary number of return values) is to keep range loops looking familiar and to avoid a misuse of the syntax.Discussion and observations
Pros:
Iter
and perform a direct function call for each iteration (theoretically that could be inlined further, but it depends).Cons:
range
could mean, and each call could be arbitrarily expensive (e.g. a series of HTTP requests).range
keyword doesn't always make sense to the reader (it does for custom data structures and in some other cases, though).func() (T, bool)
may not always mean "iterator." ANext
method is more explicit.This idea was inspired by the way Python generators are used as iterators. I recently had the realization that Python generators can approximated by repeated application of closures. I thought that this would have been proposed already, but I couldn't find anything like it. Interestingly, this proposal also allows for iterating over infinite generators and such. It can also be used to write Python-like loops (for better or for worse):
It might also be worth allowing the iterator's final return value to be an
error
instead of abool
, though I'm not totally sure how to surface that to the user.I'm not even totally convinced myself that this is worth doing, since the benefit seems minor at best, but I figured I should at least put the idea out there.
The text was updated successfully, but these errors were encountered: