Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Demonstrate simplification of Iterator in js-sys #1043

Closed
wants to merge 2 commits into from

Conversation

richard-uk1
Copy link
Contributor

Simplify iterator implementation in js-sys.

This PR is currently for demonstration purposes, please do not merge.

Simplify iterator implementation in `js-sys`.
@alexcrichton
Copy link
Contributor

Sorry I haven't quite caught up on the surrounding discussion here, and sorry for the delay! Can you expand a bit on how this simplification is possible though and why it works the same as before?

@richard-uk1
Copy link
Contributor Author

richard-uk1 commented Nov 26, 2018

Essentially it is because JavaScript iterators have a next method that already matches the semantics of Iterator - it just has a different pattern (an object with done book rather than an option). This means we can just implement Iterator directly.

The current implementation keeps some state for when the iterator has ended, like the Fuse adapter does, so I mentioned that to recover current behaviour you can fuse the iterator.

@richard-uk1
Copy link
Contributor Author

I've also added a typed version based on the comments in the other thread.

It's not ready for merge because there are no tests - currently it's to demonstrate proof of concept.

@alexcrichton
Copy link
Contributor

There's some previous discussion about why the fusing is implementing, maybe @fitzgen you have thoughts on this?

@fitzgen
Copy link
Member

fitzgen commented Nov 28, 2018

Essentially it is because JavaScript iterators have a next method that already matches the semantics of Iterator - it just has a different pattern (an object with done book rather than an option). This means we can just implement Iterator directly.

This isn't quite true becuase JS can always throw on each next invocation, where as the equivalent in rust would be if Iterator::Item is a Result or not.

Basically we have to assume that Iterator::Item is always a Result for JS which then can cause issues for combinators like collect. 99% of the time, one shouldn't continue to call next after it returns an error, but most combinators don't handle this correctly (hence the fallible_iterator crate's existence). This isn't a theoretical bug, I've run into this multiple times before.

/// An `Iterator` wrapper for when we know the type that will be returned.
///
/// We also assume that errors from the use of this iterator are not possible.
pub struct TypedIterator<T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like the idea of building typed iterators on top, though.

Unsure if it belongs in js-sys or a third party crate.

Maybe want to make Item = Result<T> to handle potential exceptions and also use checked casts that return Err on cast failures?

Copy link
Contributor

@Pauan Pauan Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary purpose (but not the only purpose!) of the typed iterator is to provide for iteration on web-sys types.

For example, NodeList is an iterator of Node. And so we want to provide a TypedIterator<Node> for NodeList.

Since we know that the NodeList iterator doesn't error, we don't want it to return Result.

That's why it's called TypedIterator::from_iterator_unchecked, to emphasize the unchecked nature of it.

In practice, it's very rare for iterators to error (in general), and in the case of web-sys specifically I believe it's guaranteed that they don't error.

I think it should be in js-sys, because:

  1. web-sys needs it, and...

  2. ...a lot of JS types outside of web-sys also define iterators, so we should have a convenient zero-cost way for users to create iterators for those types.

  3. It fits in nicely with the existing iterator types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1. web-sys needs it, and...

2. ...a lot of JS types outside of web-sys also define iterators, so we should have a convenient zero-cost way for users to create iterators for those types.

I think these two use cases call for different iterator types, though. For web-sys we can reasonably trust that the iterator will yield items that it says it will, and therefore it makes sense to use unchecked casts. For (2) I don't think that is the case at all, and we should be using checked casts by default.

Copy link
Contributor

@Pauan Pauan Nov 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For (2) I don't think that is the case at all, and we should be using checked casts by default.

I disagree. When somebody creates bindings for an existing JS iterator, they know whether the JS iterator will throw or not.

Requiring all users of the iterator to unwrap the Result is really bad for ergonomics, and it doesn't give any extra safety either (since they know that the iterator cannot error).

It's not any different from how JsCast provides both unchecked_into and dyn_into. This is the unchecked_into version of iterators.

There is prior precedent for it as well: the wasm_bindgen macro by default does not catch errors, you have to specifically use catch to enable catching errors.

Having a checked iterator type sounds reasonable, but there should definitely be an unchecked iterator type as well (which is what this type is).

So are you proposing to have both checked and unchecked typed iterators? Because I would be completely fine with that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So are you proposing to have both checked and unchecked typed iterators? Because I would be completely fine with that.

Yes.

@richard-uk1
Copy link
Contributor Author

Thanks for the link to discussion - I hadn't seen that before.

I think this becomes a discussion about the exact semantics of javascript. Say I write my own iterator:

function repeat(value) {
  let iterator = {
    next: () => ({done: false, value})
  };
  let iterable = {};
  iterable[Symbol.iterator] = () => iterator;
  return iterable;
}

// Turn your computer into a room heater.
for(let val of repeat(5)) {
  console.log(val);
}

Then I can make it do any crazy thing I want. For example, I could make next return done: false the first time and then carry on working after, or it could return 42.

// iterator.next() returns 42
for(let val of repeat(1))
console.log(val)
// > TypeError: iterator.next() returned a non-object value

Isn't this the same situation as you would have in rust - an iterator only has to return None once, then it can return anything, so

struct DecRange(pos: i32);

impl Iterator for DecRange {
  type Item = i32;
  fn next(&mut self) -> Self::Item {
    let out = self.0;
    self.0 -= 1;
    if out == 0 {
      None
    } else {
      Some(out)
    }
  }
}

where you get garbage if you use it after it has returned None.

In rust, its up to the user to fuse the iterator if they need that behavior, why is it different here?

Is it because you're saying that if there is an exception (result is err), then you don't want to call the iterator again? I think if you want to use that logic you should return None on any exception, and then the Item can be the type (JsValue) rather than a Result. Or, alternatively, just return the error but allow next to be called again, so not fusing the iterator. I think the current solution is a bit 1/2 in 1/2 out.

@richard-uk1
Copy link
Contributor Author

richard-uk1 commented Nov 28, 2018

I think I would argue that in this crate you are correct to use Result<JsValue, JsValue>, but that you should not fuse the iterator, leaving that to a higher level library. There may conceivably be some terrible use of the JavaScript iterator protocol that requires recovering from an error.

This also has the advantage of both simplifying the implementation and the API (there's no need for 3 types you would need otherwise), and more closely matching the semantics of javascript (an iterator is an Iterator, not an IntoIterator, if you like). Any user can recover the current behaviour by just fusing the iterator, or if they want to expose that behavior, wrapping it in a type that fuses it.

EDIT I may well have missed something obvious or misunderstood something, I'm just bouncing some ideas around.

@Pauan
Copy link
Contributor

Pauan commented Dec 1, 2018

@derekdreery Or, alternatively, just return the error but allow next to be called again, so not fusing the iterator.

I think the goal should be to replicate the JS iteration protocol as closely as reasonable.

The JS iteration protocol is designed so that as soon as the iterator errors, it will end the iteration (it will never call next again after that).

So unfortunately allowing next to be called again isn't a good idea, especially because the iterator may be in an inconsistent state, so calling next again is especially bad.


Speaking of consistency with JS, currently js-sys isn't quite correct: the JS iteration protocol is designed to call the return method when the iteration is completed (steps 5j and 5m).

The spec is hard to read, but looking at the output of Babel is useful. This JS code...

for (let foo of bar()) {
    console.log(foo);
}

...gets translated into this:

var _iteratorNormalCompletion = true;
var _didIteratorError = false;
var _iteratorError = undefined;

try {
    for (var _iterator = bar()[Symbol.iterator](), _step; !(_iteratorNormalCompletion = (_step = _iterator.next()).done); _iteratorNormalCompletion = true) {
        var foo = _step.value;

        console.log(foo);
    }
} catch (err) {
    _didIteratorError = true;
    _iteratorError = err;
} finally {
    try {
        if (!_iteratorNormalCompletion && _iterator.return) {
            _iterator.return();
        }
    } finally {
        if (_didIteratorError) {
            throw _iteratorError;
        }
    }
}

Essentially, when the iteration is over (either because the iterator has finished, or because an error happened), it will call the return() method on the iterator (if it exists).

This is important for generators, which have a return method.

@richard-uk1
Copy link
Contributor Author

Continues in #1077

@Pauan Pauan mentioned this pull request Jan 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants