Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HasLength() is the default iteratorsize for all custom iterators? Even if they don't define length? (Breaks Collect) #15977

Closed
oxinabox opened this issue Apr 21, 2016 · 16 comments
Assignees
Labels
docs This change adds or pertains to documentation

Comments

@oxinabox
Copy link
Contributor

oxinabox commented Apr 21, 2016

I was confused as to why my custom iterator, was failing to collect

Consider the MWE:

type Foo
    values::Vector{}
end

function Base.start(f::Foo)
    return 1
end
function Base.next(f::Foo, state)
    f.values[state], state+1
end
function Base.done(f::Foo, state)
    state>length(f.values)
end

Running with a for loop works fine:

for f in Foo([1,2,3])
    print(f)
end
> 123

Collecting Breaks:

collect(Foo([1,2,3]))
>LoadError: MethodError: no method matching length(::Foo)
while loading In[26], in expression starting on line 1

This is true -- there is not method to find the length of a ::Foo
So why is it being ask?

Because

Base.iteratorsize(Foo([1,2,3]))
>Base.HasLength()

It thinks it has a length.

I think it is coming from:

iteratorsize(::Type) = HasLength() # HasLength is the default

The workaround is to add to my definition

Base.iteratorsize(::Foo) = Base.SizeUnknown()

But I feel like this should be being inferred, by the face that neither size(::Foo) or Length(::Foo) are defined?

@oxinabox oxinabox changed the title HasLength() is the default iteratorsize for all custom iterators? Even if they don't define length? HasLength() is the default iteratorsize for all custom iterators? Even if they don't define length? (Breaks Collect) Apr 23, 2016
@mschauer
Copy link
Contributor

Is your Foo actually a custom iterator for which length cannot be computed ahead of iteration or cannot be efficiently computed?

@oxinabox
Copy link
Contributor Author

@mschauer In this example, well no.
But that is because this is a toy MWE.

My real code is doing tokenisation on long (multigigabyte) files.
Which can not be calculated efficiently ahead of iteration.
(it actually a bit worse than that as has a random chance to just discard some tokens)

@mschauer
Copy link
Contributor

Ok, you are correct, Base.iteratorsize(::Foo) = Base.SizeUnknown() should be added to your code. Defining Base.iteratorsize(::Foo) = Base.SizeUnknown() (so called traits) rather then using a test like
haslength(x) = applicable(length,x) in the collect function has technical and performance reasons . These design choices where discussed in #15146.

@oxinabox
Copy link
Contributor Author

Well, good to know it was on purpose. 👍

I personally prefer when breaking changes are made to core language parts
that they are documented and set up to trigger deprecation warnings.
Before being merged.
😛

But I do actually love that Julia is not at all afraid to break things to allow for the language to become better. I'ld rather undocumented breaking changes, than stasis.
😀

@tkelman tkelman added the needs docs Documentation for this change is required label Apr 27, 2016
@tkelman
Copy link
Contributor

tkelman commented Apr 27, 2016

The new iterator traits definitely need mentioning in the interfaces section of the documentation. We should be more careful that new features need documentation, not just implementation.

@mschauer
Copy link
Contributor

I wrote a short recap of the changes from what I understand: https://gist.github.com/mschauer/c8d8b7eb5b455cb12ddc9bda15695203

@mschauer
Copy link
Contributor

@JeffBezanson, do you agree with the recommendations in the last paragraph in the gist?

@tkelman
Copy link
Contributor

tkelman commented Jun 19, 2016

ping @JeffBezanson iterator traits are still very poorly documented, can you please respond with some feedback on @mschauer's proposal to fix that?

@mschauer probably easier to review if you open that as a PR

@timholy
Copy link
Member

timholy commented Jun 19, 2016

I noticed this as a potential concern while implementing #17006 and found this. @mschauer, your writeup looks correct (though there are a couple of layout and wording issues, folks here can help smooth those).

It's also possible that HasLength() might not be the correct default. The counter argument is that a MethodError is pretty easily resolvable, whereas someone might not notice that they're getting 3x worse performance than they could be. The argument is even stronger for the eltype, that the "dangerous" (might fail) default is, somewhat paradoxically, perhaps the better choice.

@mschauer
Copy link
Contributor

@timholy Wait a second,

The counter argument is that a MethodError is pretty easily resolvable, whereas someone might not notice that they're getting 3x worse performance than they could be.

reads like an argument for the default HasLength (which will cause a MethodError in case of a collection attempt without defined length). But I too thought that HasLength might not the right default in the long run - I was thinking of this when writing "in case changes to the fallback definitions are made".

@timholy
Copy link
Member

timholy commented Jun 20, 2016

Yes, I meant it as an argument in favor of HasLength.

It's even more important for eltype: you might get Any containers back otherwise. Compared to debugging that problem, debugging a missing eltype method is a walk in the park.

@tkelman
Copy link
Contributor

tkelman commented Jul 1, 2016

Ping again, @JeffBezanson are iterator traits really still undocumented?

@mschauer
Copy link
Contributor

mschauer commented Jul 1, 2016

Btw., I am just waiting for the outcome of #17053 before making changes

@tkelman
Copy link
Contributor

tkelman commented Jul 16, 2016

I think the outcome is that we won't be making any further API changes or trait additions before 0.5 if we can absolutely avoid it. We do however desperately need docs for iterator traits as they currently are, and I consider that RC-blocking. Jeff is evidently on vacation, but @mschauer would you be willing to submit a PR?

@JeffBezanson JeffBezanson self-assigned this Jul 20, 2016
@JeffBezanson JeffBezanson added the docs This change adds or pertains to documentation label Jul 20, 2016
@tkelman tkelman removed the needs docs Documentation for this change is required label Jul 20, 2016
mfasi pushed a commit to mfasi/julia that referenced this issue Sep 5, 2016
@ssfrr
Copy link
Contributor

ssfrr commented Oct 4, 2018

Given that collect only works for iterators that define length, should length be added to the "required" list for the iterator? Maybe with a note that says that it's not required if you define Base.IteratorSize(::Type{MyIterator}) = Base.SizeUnknown()?

Another way to phrase it might be to say that you should define at least one of length or IteratorSize.

@JeffBezanson
Copy link
Member

Another way to phrase it might be to say that you should define at least one of length or IteratorSize.

Yes, that's a good description; saying length is required is not as accurate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs This change adds or pertains to documentation
Projects
None yet
Development

No branches or pull requests

6 participants