Re-evaluate IsRegExp call in MatchAllIterator? #34

ljharb · 2018-03-05T20:32:51Z

And I'm not sure about the first IsRegExp call in MatchAllIterator and I wonder if it makes more sense to always take the first branch in MatchAllIterator if called from RegExp.prototype[@@matchAll] for consistency with RegExp.prototype[@@split]. But that should definitely be discussed in a different issue.

(Pending merging of #33)

anba · 2018-04-12T14:41:10Z

Let's talk first about the second IsRegExp call in MatchAllIterator (the one directly after calling RegExpCreate).

We currently have:

Else,
a. Let flags be "g".
b. Let matcher be ? RegExpCreate(R, flags).
c. If ? IsRegExp(matcher) is not true, throw a TypeError exception.
d. Let global be true.
e. Let fullUnicode be false.
f. If ? Get(matcher, "lastIndex") is not 0, throw a TypeError exception.

ECMA-262 never calls IsRegExp after RegExpCreate, so adding it here seems like a bad precedent. And if the IsRegExp call is removed, step 3.f can also be changed back to Assert: Get(matcher, "lastIndex") is 0. (In general I think we should avoid adding arbitrary checks when there's no apparent reason to do so. For example MatchAllIterator could also throw an exception if %RegExpStringIteratorPrototype%.next is no longer a function object, because that indicates it's not possible to iterate over RegExp String Iterator objects. But adding such a check would be really weird! 😄 )

Now the other IsRegExp call in MatchAllIterator. I have two separate concerns here:

When MatchAllIterator is called from RegExp.prototype [ @@matchAll ], we should/have to assume the user explicitly decided to treat R as a RegExp object, so having an additional IsRegExp call to change this decision seems questionable. It's also not consistent with how the other RegExp.prototype methods work.
When MatchAllIterator is called from String.prototype.matchAll, I'd prefer to handle it more like the other String.prototype methods which create RegExp objects (that means String.prototype.match and String.prototype.search), because I want to avoid adding yet another way to handle RegExp sub-classes: There are already two different RegExp sub-classing/extension interfaces: The RegExp.prototype methods all call RegExpExec, which means sub-classes, or any other classes, only need to provide their own "exec" methods when they want to reuse the other RegExp.prototype methods. And in addition to that, the @@match/replace/search/split interfaces allow sub-classes to provide their own implementations for just these methods. The match-all proposal in its current form adds another dimension to this by providing different code paths depending on whether or not an object is RegExp-like (as per the IsRegExp abstract operation).
In my opinion we should only support RegExp sub-classing in two ways:
1. Either the RegExp sub-class has %RegExpPrototype% on its prototype chain,
2. Or the RegExp sub-class copies the relevant methods from %RegExpPrototype% into its prototype object.

Which means String.prototype.matchAll and RegExp.prototype[@@matchAll] should be changed to:

String.prototype.matchAll(_regexp_)
  1. Let _O_ be ? RequireObjectCoercible(*this* value).
  1. If _regexp_ is neither *undefined* nor *null*, then
    1. Let _matcher_ be ? GetMethod(_regexp_, @@matchAll).
    1. If _matcher_ is not *undefined*, then
      1. Return ? Call(_matcher_, _regexp_, &laquo; _O_ &raquo;).
  1. Let _string_ be ? ToString(_O_).
  1. Let _rx_ be ? RegExpCreate(_regexp_, `"g"`).
  1. Return ? Invoke(_rx_, @@matchAll, &laquo; _string_ &raquo;).

RegExp.prototype[@@matchAll](_string_)
  1. Let _R_ be the *this* value.
  1. If Type(_R_) is not Object, throw a *TypeError* exception.
  1. Let _S_ be ? ToString(_string_).
  1. Let _C_ be ? SpeciesConstructor(_R_, %RegExp%).
  1. Let _flags_ be ? ToString(? Get(_R_, `"flags"`)).
  1. Let _matcher_ be ? Construct(_C_, &laquo; _R_, _flags_ &raquo;).
  1. Let _global_ be ? ToBoolean(? Get(_matcher_, `"global"`)).
  1. Let _fullUnicode_ be ? ToBoolean(? Get(_matcher_, `"unicode"`)).
  1. Let _lastIndex_ be ? ToLength(? Get(_R_, `"lastIndex"`)).
  1. Perform ? Set(_matcher_, `"lastIndex"`, _lastIndex_, *true*).
  1. Return ! CreateRegExpStringIterator(_matcher_, _S_, _global_, _fullUnicode_).
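
As a rough illustration only, the proposed steps can be approximated in userland JavaScript. `stringMatchAllSketch` is a hypothetical name, and `new RegExp(regexp, "g")` stands in for the `RegExpCreate` abstract operation, which script cannot call directly (the two differ for some inputs, for example RegExp objects passed as the pattern). The sketch also assumes an engine that already ships `Symbol.matchAll`.

```javascript
// Userland approximation of the proposed String.prototype.matchAll steps.
// Assumption: new RegExp(...) stands in for RegExpCreate.
function stringMatchAllSketch(thisValue, regexp) {
  // Step 1: RequireObjectCoercible(this value).
  if (thisValue === undefined || thisValue === null) {
    throw new TypeError("matchAll called on null or undefined");
  }
  // Step 2: delegate to regexp[@@matchAll] when it is present.
  if (regexp !== undefined && regexp !== null) {
    const matcher = regexp[Symbol.matchAll];
    if (matcher !== undefined && matcher !== null) {
      if (typeof matcher !== "function") {
        // GetMethod throws for non-callable values.
        throw new TypeError("@@matchAll is not callable");
      }
      return matcher.call(regexp, thisValue);
    }
  }
  // Remaining steps: coerce to a string, create a global RegExp,
  // and invoke its @@matchAll method.
  const string = String(thisValue);
  const rx = new RegExp(regexp, "g");
  return rx[Symbol.matchAll](string);
}

const fromString = Array.from(stringMatchAllSketch("a1b2", "\\d"), (m) => m[0]);
const fromRegExp = Array.from(stringMatchAllSketch("a1b2", /[a-z]/g), (m) => m[0]);
console.log(fromString, fromRegExp); // [ '1', '2' ] [ 'a', 'b' ]

// The RegExp.prototype[@@matchAll] steps clone the receiver and copy its
// lastIndex, so iteration starts there and the receiver is left untouched.
const source = /a/g;
source.lastIndex = 1;
const clonedMatches = Array.from(source[Symbol.matchAll]("aaa"), (m) => m.index);
console.log(clonedMatches, source.lastIndex); // [ 1, 2 ] 1
```

The second half relies on the built-in `RegExp.prototype[Symbol.matchAll]`, which in current engines follows the same clone-and-copy-`lastIndex` shape as the steps listed above.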

While this does create two RegExp objects for String.prototype.matchAll when it is naively implemented, it shouldn't be too hard to optimise in actual implementations to avoid the extra RegExp allocations. Or at least it isn't harder to optimise when compared to the other String.prototype and RegExp.prototype methods.
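
The two existing extension interfaces described earlier in this comment can be seen in a small sketch. `LoggingRegExp` and `customMatcher` are made-up names for illustration; only standard built-ins are used.

```javascript
// Interface 1: RegExp.prototype methods go through RegExpExec, which looks
// up "exec" on the receiver, so overriding exec is enough to hook into them.
class LoggingRegExp extends RegExp {
  exec(str) {
    LoggingRegExp.calls.push(str); // record every call made through RegExpExec
    return super.exec(str);
  }
}
LoggingRegExp.calls = [];

const logging = new LoggingRegExp("an");
const tested = logging.test("banana"); // RegExp.prototype.test -> RegExpExec -> exec

// Interface 2: String.prototype.match looks up @@match on its argument, so
// any object can supply its own implementation of just that method.
const customMatcher = {
  [Symbol.match](str) {
    return ["custom:" + str];
  },
};
const customResult = "abc".match(customMatcher);

console.log(tested, LoggingRegExp.calls, customResult);
// true [ 'banana' ] [ 'custom:abc' ]
```

Neither interface depends on `IsRegExp`, which is the additional dimension the comment above argues against.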

littledan · 2018-04-13T10:19:07Z

Given how there are test262 tests and a V8 implementation, these questions need to be answered quickly. I share @anba's intuitions here.

ljharb · 2018-04-13T17:37:12Z

Regarding the first issue, I think that reasoning makes sense - I'll prepare that change.

Regarding the second one: I'll think about your comment and provide a thorough response later today.

ljharb · 2018-04-13T21:01:59Z

@anba
In general, I've tried to make matchAll consistent with match above all else - however, "match" is special, because it's the marker for IsRegExp, so there are a few places that I think it makes sense to deviate slightly.

Thus, RegExp.prototype[Symbol.match] doesn't check IsRegExp because it doesn't have to - its presence makes it a regexp. You are correct that no other RegExp.prototype method - including any of the well-known Symbol methods - checks IsRegExp. However, I still think there's value in including the check, because then RegExp.prototype[Symbol.match].call(x) can actually ensure you passed a valid x. I'm not going to die on this hill tho; if you think that the implementation cost of having this extra check outweighs the "earlier error" value, then I can remove this one.
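
To illustrate how `Symbol.match` acts as the marker for IsRegExp, here is a small sketch using `String.prototype.startsWith` and `String.prototype.includes`, two existing methods that throw when IsRegExp returns true for their argument. The variable names are only for illustration.

```javascript
// IsRegExp consults @@match first, so the property decides whether an
// object is treated as a regular expression.
const re = /a/;
let regexThrew = false;
try {
  "/a/".startsWith(re); // IsRegExp(re) is true, so this throws
} catch (e) {
  regexThrew = e instanceof TypeError;
}

// Opting out: with @@match set to false, re is stringified to "/a/".
re[Symbol.match] = false;
const asPlainString = "/a/".startsWith(re);

// Opting in: a plain object with a truthy @@match counts as a regexp.
const regexLike = { [Symbol.match]: true };
let regexLikeThrew = false;
try {
  "x".includes(regexLike);
} catch (e) {
  regexLikeThrew = e instanceof TypeError;
}

console.log(regexThrew, asPlainString, regexLikeThrew); // true true true
```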

Regarding the other part: there are already, sadly, multiple ways to subclass regexes: extends RegExp, to set up the slots, and IsRegExp, which a number of methods use. As much as I'd like to remove these, they're how regexp subclassing is done in the ES6 design. I don't think we can - or should - deviate from "relying on IsRegExp" unless we can do so throughout the spec, and I think @allenwb has a number of reasons why he doesn't want that to happen.

Your suggested change makes .matchAll not be robust against delete RegExp.prototype[Symbol.matchAll], and I'm not sure why that's a good idea, nor why keeping that robustness makes things harder to optimize than with your suggestion.


ljharb · 2018-04-13T21:03:15Z

Filed #35 for the first change (of the three discussed).

anba · 2018-04-13T21:12:12Z

> Your suggested change makes .matchAll not be robust against delete RegExp.prototype[Symbol.matchAll], and I'm not sure why that's a good idea, nor why keeping that robustness makes things harder to optimize than with your suggestion.

Why does it need to be robust against deleting RegExp.prototype[@@matchAll]? We have the same situation with String.prototype.match and String.prototype.search, where we don't care if the corresponding RegExp.prototype method was deleted:

delete RegExp.prototype[Symbol.match];
"".match(/a/); // <- Throws a TypeError

delete RegExp.prototype[Symbol.search];
"".search(/a/); // <- Throws a TypeError

ljharb · 2018-04-13T21:12:58Z

Indeed; that is the precedent - but I think it's a terrible one, and I wanted to do something better with this proposal than the status quo.

anba · 2018-04-13T21:24:55Z

If someone forcibly disrupts standard built-in methods or properties, it's their problem. For example we also throw an error from String.prototype.match if RegExp.prototype[Symbol.match] was changed to a non-callable value (7.3.9 GetMethod throws an error for non-callable values). GetMethod could have been spec'ed to simply ignore non-callables, but we explicitly decided against that.
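
The GetMethod behaviour mentioned here can be observed without touching the built-ins, by putting the `Symbol.match` property on a plain object instead. This is only a sketch; the object names are hypothetical.

```javascript
// A non-callable @@match value makes GetMethod throw a TypeError.
const nonCallable = { [Symbol.match]: 42 };
let nonCallableThrew = false;
try {
  "abc".match(nonCallable);
} catch (e) {
  nonCallableThrew = e instanceof TypeError;
}

// undefined (or null) is treated as "no method": String.prototype.match
// falls back to creating a RegExp from String(argument), which here is the
// character class /[object Object]/ and cannot match the empty string.
const undefinedMatcher = { [Symbol.match]: undefined };
const fallbackResult = "".match(undefinedMatcher);

console.log(nonCallableThrew, fallbackResult); // true null
```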

ljharb · 2018-04-13T21:50:50Z

So why is it OK to throw on non-callable values (which I agree with), but not OK here to throw on a non-regex being passed into a method that requires a regex?

littledan · 2018-04-14T00:13:23Z

I agree with @anba here. I think it's weird to guard against these things.

We've been discussing various guards for some months now on GitHub. If we're not able to come to agreement here, maybe it should be brought to the committee.

ljharb · 2018-04-14T01:20:03Z

I think the real question is, is this an implementation concern? This wasn’t called out during all of the spec reviews prior to stage 3 - I’m not sure how the suggested change eases implementability; either case seems equally optimizable in the scenario of an unmodified matchAll function; but as-is, it’d perform the same in the case of an absent one.

littledan · 2018-04-14T07:41:12Z

As proposals move along between Stage 3 and Stage 4, deeper review ends up happening, and tweaks are made. This has happened with basically all large Stage 3 proposals, even for changes that are not driven by implementation concerns themselves. I am sorry I didn't call this out earlier.

ljharb · 2018-04-14T16:20:47Z

Sounds like I’ll have to add this to the May agenda then (altho #35 can be merged in the meantime)

tschneidereit · 2018-04-16T14:10:34Z

FWIW, I agree with @anba and @littledan: even granting that the currently proposed behavior is slightly nicer, consistency beats that in importance.

Regarding the concern about this being raised late in the process: I think it would've helped to actively point this out as deviating from precedent. (AFAIK that hasn't happened - my apologies if that is incorrect!) It might make sense to make this an explicit recommendation in our various docs on championing.

ljharb · 2018-04-16T15:59:36Z

I thought this came up in committee around #28, but it's possible that it was overlooked.


mathiasbynens · 2018-04-28T21:39:44Z

+@hashseed @schuay

ljharb · 2018-05-22T17:54:28Z

@littledan @anba so after rereading this issue: it seems like the primary remaining objection is the "IsRegExp" check after the RegExpCreate check.

My intention is to get as early an error as possible - specifically, before the iterator is created, instead of during iteration. However, the only error this might guard against is in https://tc39.github.io/ecma262/#sec-regexpexec, whose code path doesn't actually care about Symbol.match anyways.

Would you be content with replacing the IsRegExp check with https://tc39.github.io/ecma262/#sec-regexpexec step 5 - specifically, "If matcher does not have a [[RegExpMatcher]] internal slot, throw a TypeError exception."? This shouldn't be observable in the happy path, and shouldn't impact performance.

littledan · 2018-05-22T18:02:08Z

@ljharb I don't think such a check will ever result in a TypeError thrown; that internal slot will always exist on a return value of RegExpCreate AFAICT (though it might have a different prototype or the prototype may have been mutated).

ljharb · 2018-05-22T18:04:21Z

ah, good point - you're right.

It seems that the appropriate resolution here, then, is to just remove the check - which closes the issue and obviates the need to come back to the committee.

littledan · 2018-05-22T18:49:37Z

Maybe it'd be worth a 30-second status update to say that you're making this change and that this resolves the issue. I like to do a quick committee presentation for changes to Stage 3 proposals, personally.

ljharb · 2018-05-22T18:55:59Z

Sounds great, I'll do that later today.

ljharb · 2018-05-22T18:58:32Z

Closed with #35.

anba · 2018-06-13T17:19:15Z

#35 only removed one IsRegExp call, but left the other one. I still think the reasoning outlined in #34 (comment) applies: for example, calling RegExp.prototype[@@matchAll] with an object X should always take the RegExp path in MatchAllIterator, independent of whether IsRegExp(X) returns true or false.
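
A small sketch of the behaviour argued for here, assuming an engine that ships `Symbol.matchAll` and treats the receiver of `RegExp.prototype[Symbol.matchAll]` as a regex without consulting IsRegExp. The variable names are illustrative only.

```javascript
// Setting @@match to false makes IsRegExp(digits) return false, even though
// digits is a genuine RegExp object with the usual internal slots.
const digits = /\d/g;
digits[Symbol.match] = false;

// Calling the RegExp method directly still takes the RegExp path.
const viaRegExpPath = Array.from(
  RegExp.prototype[Symbol.matchAll].call(digits, "a1b2"),
  (m) => m[0]
);

console.log(viaRegExpPath); // [ '1', '2' ]
```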

ljharb · 2018-06-13T17:20:56Z

@anba thanks, can you file a new issue for that?

ljharb · 2018-08-08T04:36:23Z

So, I've taken another look at this, and precisely because of the different behavior in the regexp vs string paths, the remaining IsRegExp call is absolutely necessary, and it's not reasonable to assume that R is only a regex.

I'm going to keep this as-is, and implementations should continue to ship. If there are further arguments, I'd be happy to have them on a new issue.

mathiasbynens · 2018-08-08T08:09:04Z

> So, I've taken another look at this, and precisely because of the different behavior in the regexp vs string paths, the remaining IsRegExp call is absolutely necessary, and it's not reasonable to assume that R is only a regex.

Can you elaborate on why it's "absolutely necessary"? @anba had a lengthy explanation of his reasoning, including a suggested fix, in #34 (comment). Why is the suggested fix being rejected?

mathiasbynens · 2018-08-08T08:39:25Z

Ah, the bit of info I was missing was this comment in another thread:

> (the committee has consensus for the dual behavior, and the call is required to maintain that)

IIRC, the last time matchAll was brought to committee was in January. The discussion in #34 (comment) happened later, in March. Given that several folks expressed a preference for @anba's proposed semantics, it would be valuable to get explicit consensus by presenting both options to the committee. What do you think?

ljharb · 2018-08-08T14:51:46Z

In January, we talked about what it should do with a string, and the committee consensus at that time (which included both options) was what we ended up with - not creating a regex with the string path, which resulted in the branching for string vs regex.

ljharb · 2018-08-08T14:53:05Z

If you think it needs to come back to committee, please file a separate issue - @anba’s comment is exceedingly difficult for me to unpack, and to be able to confidently discuss anything about it in committee, I’d need to see arguments stated separately.

@anba

@anba outlined two separate concerns about the remaining `IsRegExp` call in `MatchAllIterator`. Quoting from tc39#34 (comment):

1. When `MatchAllIterator` is called from `RegExp.prototype [ @@matchAll ]`, we should/have to assume the user explicitly decided to treat `R` as a RegExp object, so having an additional `IsRegExp` call to change this decision seems questionable. It’s also not consistent with how the other `RegExp.prototype` methods work.
2. When `MatchAllIterator` is called from `String.prototype.matchAll`, I’d prefer to handle it more like the other `String.prototype` methods which create RegExp objects (that means `String.prototype.match` and `String.prototype.search`), because I want to avoid adding yet another way to handle RegExp sub-classes. There are already two different RegExp sub-classing/extension interfaces: the `RegExp.prototype` methods all call `RegExpExec`, which means sub-classes, or any other classes, only need to provide their own `exec` methods when they want to reuse the other `RegExp.prototype` methods. And in addition to that, the `@@match/replace/search/split` interfaces allow sub-classes to provide their own implementations for just these methods. The `matchAll` proposal in its current form adds another dimension to this by providing different code paths depending on whether or not an object is RegExp-like (as per the `IsRegExp` abstract operation).

In my opinion we should only support RegExp sub-classing in two ways: 1) Either the RegExp sub-class has `%RegExpPrototype%` on its prototype chain, or 2) the RegExp sub-class copies the relevant methods from `%RegExpPrototype%` into its prototype object.

Ref. tc39#21, tc39#34.

mathiasbynens · 2018-08-08T15:57:13Z

@ljharb I agree @anba’s comment packs a lot of information! I’ve opened #37 based on @anba’s proposed change (as a pull request instead of an issue, in the hopes it helps to make the difference clear to others). Let’s move the discussion there.

- `String.prototype.matchAll`:
  - use `RegExpCreate` when `Symbol.matchAll` is not found
  - fall back to regex coercion otherwise
- `RegExp.prototype[Symbol.matchAll]`:
  - receiver is assumed to be a regex implicitly
- remove `MatchAllIterator` abstract operation

Thus, `IsRegExp` call no longer exists.

Addresses #21. Addresses #34. Closes #37.

ljharb mentioned this issue Mar 5, 2018

Normative: when regexp arg is a string, minimize observable operations #33

Merged

ljharb mentioned this issue Mar 20, 2018

Reduce indirections and calls with side-effects to make it easier to achieve a reasonable baseline performance? #32

Closed

ljharb mentioned this issue Apr 4, 2018

Tests for String.prototype.matchAll tc39/test262#1500

Merged

ljharb added a commit that referenced this issue Apr 13, 2018

Remove unnecessary IsRegExp call from the string path in MatchAllIter…

cfc0597

…ator Per #34 (comment)

ljharb mentioned this issue Apr 13, 2018

Remove unnecessary IsRegExp call from the string path in MatchAllIterator #35

Merged

peterwmwong added a commit to peterwmwong/v8 that referenced this issue Apr 17, 2018

[builtins] Proposed changes to reduce IsRegExp checks in String.p.mat…

8430e46

…chAll tc39/proposal-string-matchall#34

michaelficarra mentioned this issue Apr 28, 2018

sorting proposal-based agenda items primarily by current stage tc39/agendas#360

Closed

ljharb closed this as completed May 22, 2018

mathiasbynens mentioned this issue Jun 20, 2018

Path to Stage 4! #21

Closed

28 tasks

mathiasbynens mentioned this issue Aug 8, 2018

Improve consistency with existing RegExp methods and subclass handling #37

Closed

ljharb mentioned this issue Aug 8, 2018

[spec] Adjust to committee feedback #38

Merged

mathiasbynens mentioned this issue Sep 17, 2018

Committee feedback: fallback behavior #39

Closed
