-
Notifications
You must be signed in to change notification settings - Fork 12
Improve consistency with existing RegExp methods and subclass handling #37
Conversation
@anba outlined two separate concerns about the remaining `IsRegExp` call in `MatchAllIterator`. Quoting from tc39#34 (comment): 1. When `MatchAllIterator` is called from `RegExp.prototype [ @@matchall ]`, we should/have to assume the user explicitly decided to treat `R` as a RegExp object, so having an additional `IsRegExp` call to change this decision seems questionable. It’s also not consistent with how the other `RegExp.prototype` methods work. 2. When `MatchAllIterator` is called from `String.prototype.matchAll`, I’d prefer to handle it more like the other `String.prototype` methods which create RegExp objects (that means `String.prototype.match` and `String.prototype.search`), because I want to avoid adding yet another way to handle RegExp sub-classes. There are already two different RegExp sub-classing/extension interfaces: the `RegExp.prototype` methods all call `RegExpExec`, which means sub-classes, or any other classes, only need to provide their own `exec` methods when they want to reuse the other `RegExp.prototype` methods. And in addition to that, the `@@match/replace/search/split` interfaces allow sub-classes to provide their own implementations for just these methods. The `matchAll` proposal in its current form adds another dimension to this by providing different code paths depending on whether or not an object is RegExp-like (as per the `IsRegExp` abstract operation). In my opinion we should only support RegExp sub-classing in two ways: 1) Either the RegExp sub-class has `%RegExpPrototype%` on its prototype chain, or 2) the RegExp sub-class copies the relevant methods from `%RegExpPrototype%` into its prototype object. Ref. tc39#21, tc39#34.
@@ -20,7 +20,9 @@ contributors: Jordan Harband | |||
1. Let _matcher_ be ? GetMethod(_regexp_, @@matchAll). | |||
1. If _matcher_ is not *undefined*, then | |||
1. Return ? Call(_matcher_, _regexp_, « _O_ »). | |||
1. Return ? MatchAllIterator(_regexp_, _O_). | |||
1. Let _string_ be ? ToString(_O_). | |||
1. Let _rx_ be ? RegExpCreate(_regexp_, `"g"`). |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
@@ -20,7 +20,9 @@ contributors: Jordan Harband | |||
1. Let _matcher_ be ? GetMethod(_regexp_, @@matchAll). | |||
1. If _matcher_ is not *undefined*, then | |||
1. Return ? Call(_matcher_, _regexp_, « _O_ »). | |||
1. Return ? MatchAllIterator(_regexp_, _O_). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separately, I very intentionally made this function continue to work after delete RegExp.prototype[Symbol.matchAll]
, which this change breaks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more consistent with the existing methods though, as discussed here: #34 (comment)
@anba, @littledan, @tschneidereit all spoke up in favor of consistency over deviating from the status quo in this edge case. I'm inclined to agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The majority of the existing methods fall back when the Symbol method is deleted; those two examples are exceptions, and I think matchAll should follow the more common approach of falling back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I agree that String.p.matchAll
should be specced consistently with String.p.match
and String.p.search
. These are the only other 'regexp' methods defined on String
. Is there a reason to diverge from their behavior?
Likewise in RegExp.p[@@matchAll]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also split and replace - see #38 (comment).
The consistency argument is exactly why the proposal has its current fallback behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the fallback in split and replace is to perform a non-RegExp string matching (i.e. "."
only matches the code point U+002E (FULL STOP)), which is different from what matchAll does.
@@ -20,7 +20,9 @@ contributors: Jordan Harband | |||
1. Let _matcher_ be ? GetMethod(_regexp_, @@matchAll). | |||
1. If _matcher_ is not *undefined*, then | |||
1. Return ? Call(_matcher_, _regexp_, « _O_ »). | |||
1. Return ? MatchAllIterator(_regexp_, _O_). | |||
1. Let _string_ be ? ToString(_O_). | |||
1. Let _rx_ be ? RegExpCreate(_regexp_, `"g"`). |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
1. Let _fullUnicode_ be *false*. | ||
1. Assert: ! Get(_matcher_, `"lastIndex"`) is *0*. | ||
1. Let _S_ be ? ToString(_string_). | ||
1. Let _C_ be ? SpeciesConstructor(_R_, %RegExp%). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why it's necessary to reconstruct a RegExp
here? @@match
doesn't behave like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I don't want mutations to .lastIndex
to be observable, so matchAll intentionally clones the regex to prevent that.
matchAll provides an iterator, so it's much more important to avoid observable mutations over time, and also to ensure that mutating the lastIndex
of the regex mid-iteration doesn't mess with the algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an issue explaining this behavior? It seems we could use a internal object to store this state, instead of calling the @@species
hook a second time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A second time? it's only called once in this spec - RegExpCreate doesn't call it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, I got confused with the changes. Never mind.
- `String.prototype.matchAll`: - use `RegExpCreate` when `Symbol.prototype.matchAll` is not found - fall back to regex coercion otherwise - `RegExp.prototype[Symbol.matchAll]`: - receiver is assumed to be a regex implicitly - remove `MatchAllIterator` abstract operation Thus, `IsRegExp` call no longer exists. Addresses #21. Addresses #34. Closes #37.
Now that I have more clarity on the committee consensus, on the objections to I've put up #38; @mathiasbynens, does that alternate approach address your concerns? |
I don't understand why we need the alternate approach. It looks like you're still trying to support |
Indeed I am; commented here: #38 (comment) |
@@ -20,7 +20,9 @@ contributors: Jordan Harband | |||
1. Let _matcher_ be ? GetMethod(_regexp_, @@matchAll). | |||
1. If _matcher_ is not *undefined*, then | |||
1. Return ? Call(_matcher_, _regexp_, « _O_ »). | |||
1. Return ? MatchAllIterator(_regexp_, _O_). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I agree that String.p.matchAll
should be specced consistently with String.p.match
and String.p.search
. These are the only other 'regexp' methods defined on String
. Is there a reason to diverge from their behavior?
Likewise in RegExp.p[@@matchAll]
.
spec.emu
Outdated
1. Let _C_ be ? SpeciesConstructor(_R_, %RegExp%). | ||
1. Let _flags_ be ? ToString(? Get(_R_, `"flags"`)). | ||
1. Let _matcher_ be ? Construct(_C_, « _R_, _flags_ »). | ||
1. Let _global_ be ? ToBoolean(? Get(_matcher_, `"global"`)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@split
detects flag values of the resulting regexp by checking the string value of flags
. It would probably make sense to do the same here: 1. for consistency, 2. it would let us avoid two ToBoolean(Get)
sequences. See https://tc39.github.io/ecma262/#sec-regexp.prototype-@@split:
If flags contains "u", let unicodeMatching be true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Instead of e.g.:
1. Let _global_ be ? ToBoolean(? Get(_matcher_, `"global"`)).
…we should do:
1. If _flags_ contains `"g"`, let _global_ be *true*.
1. Else, let _global_ be *false*.
…to follow the precedent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that seems reasonable. I'll do that directly in master since that's an observability improvement instead of API design questions.
This follows the convention used by `RegExp.prototype[@@split]`: https://tc39.github.io/ecma262/#sec-regexp.prototype-@@split
I agree with the others here who spoke up in favor of consistency. I don't see the need to be defensive of this particular case. I'm wondering, why were several comments marked as "outdated"? This marking makes it harder for me to read the thread. |
@littledan the one outdated comment thread is from a mistaken comment by me that justin corrected; nobody needs to read that thread. |
- `String.prototype.matchAll`: - use `RegExpCreate` when `Symbol.prototype.matchAll` is not found - fall back to regex coercion otherwise - `RegExp.prototype[Symbol.matchAll]`: - receiver is assumed to be a regex implicitly - remove `MatchAllIterator` abstract operation Thus, `IsRegExp` call no longer exists. Addresses #21. Addresses #34. Closes #37.
@anba outlined two separate concerns about the remaining
IsRegExp
call inMatchAllIterator
. Quoting from #34 (comment):When
MatchAllIterator
is called fromRegExp.prototype [ @@matchAll ]
, we should/have to assume the user explicitly decided to treatR
as a RegExp object, so having an additionalIsRegExp
call to change this decision seems questionable. It’s also not consistent with how the otherRegExp.prototype
methods work.When
MatchAllIterator
is called fromString.prototype.matchAll
, I’d prefer to handle it more like the otherString.prototype
methods which create RegExp objects (that meansString.prototype.match
andString.prototype.search
), because I want to avoid adding yet another way to handle RegExp sub-classes. There are already two different RegExp sub-classing/extension interfaces: theRegExp.prototype
methods all callRegExpExec
, which means sub-classes, or any other classes, only need to provide their ownexec
methods when they want to reuse the otherRegExp.prototype
methods. And in addition to that, the@@match/replace/search/split
interfaces allow sub-classes to provide their own implementations for just these methods. ThematchAll
proposal in its current form adds another dimension to this by providing different code paths depending on whether or not an object is RegExp-like (as per theIsRegExp
abstract operation). In my opinion we should only support RegExp sub-classing in two ways: 1) Either the RegExp sub-class has%RegExpPrototype%
on its prototype chain, or 2) the RegExp sub-class copies the relevant methods from%RegExpPrototype%
into its prototype object.This PR addresses those issues following @anba’s proposal.
Ref. #21, #34.