Skip to content
This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Inconsistencies to RegExp.prototype[@@match] #28

Closed
schuay opened this issue Oct 11, 2017 · 5 comments
Closed

Inconsistencies to RegExp.prototype[@@match] #28

schuay opened this issue Oct 11, 2017 · 5 comments

Comments

@schuay
Copy link

schuay commented Oct 11, 2017

I'm a bit unclear on how closely matchAll should match the behavior of match. There's two (somewhat related) inconsistencies I noticed:

  1. Non-global regexp.

Given a non-global regexp, @@match will simply return Return ? RegExpExec(rx, S)., while matchAll returns a single-element iterator.

  1. Termination / zero-width matches.

matchAll handles termination through lastIndex comparisons and thus terminates on zero-width matches. @@match terminates on a failed match and advances lastIndex through AdvanceStringIndex on zero-width matches.

I suppose the termination method for matchAll was chosen to terminate correctly when passed a non-global regexp(?)

It feels strange to have these inconsistencies between such closely-related methods.

@ljharb
Copy link
Member

ljharb commented Oct 11, 2017

Symbol.match is special by necessity, since it's the sole symbol that makes IsRegExp pass. It wouldn't make sense to follow every spec step in it identically.

I'm not sure I understand your second point; can you provide me with a more concrete example? Indeed, the goal when passed a non-global regex was to provide an iterator for a single match, and RegExpExec used naively would provide (and did, in my initial polyfill implementation, oops) an infinite iterator for the same match object.

@schuay
Copy link
Author

schuay commented Oct 12, 2017

Symbol.match is special by necessity, since it's the sole symbol that makes IsRegExp pass. It wouldn't make sense to follow every spec step in it identically.

We may be talking past each other, I don't understand the connection to my point 1. AFAIK IsRegExp only needs @@match to be a truish value (and not even that for builtin RegExp objects, which have the [[RegExpMatcher]] internal slot). The semantics of @@match itself don't seem to have any relation to IsRegExp. Did I miss something?

I'm not sure I understand your second point; can you provide me with a more concrete example?

Sure. "abcde".match(/\B/g) ~~> ["", "", "", ""].

Now consider "abcde".matchAll(/\B/g). Each matched string has a length of 0, which means lastIndex remains unmodified after the first successful match. That would trigger termination (i.e. creating an IterResultObject with done = true) in next.

So, matchAll would return a single-element iterator while @@match would return a 4-element array.

@ljharb
Copy link
Member

ljharb commented Oct 12, 2017

You're totally right that IsRegExp only needs a non-truthy value; the other part tho is that all the built-in regex symbols are codifying legacy behavior - matchAll isn't necessarily constrained by those limitations.

With your specific example, I think you've definitely located an interesting case. An exec loop with that regex provides only a single match; and matchAll as currently specced actually uses exec, not match, under the hood - so the current algorithm intentionally provides an iterator for a single match object. (That said, .replace does invoke the callback function 4 times)

It's important here that the iterator is for match objects - arrays with an input and lastIndex property - so I'm not sure how to proceed here.

Unrelated to the spec steps, and generally speaking, would it match your intuition if [...s.matchAll(r)].length === s.match(r).length for global regexes? (so that your example would be an iterator for 4 match objects)

@schuay
Copy link
Author

schuay commented Oct 12, 2017

Unrelated to the spec steps, and generally speaking, would it match your intuition if [...s.matchAll(r)].length === s.match(r).length for global regexes?

Yeah, personally I think staying consistent with @@match and @@replace behavior, both of which use AdvanceStringIndex, makes sense.

@ljharb
Copy link
Member

ljharb commented Oct 12, 2017

I think that is both an improvement (smoothing over edge cases I hadn't previously thought of) and also more closely matches the goals of the proposal - and I think it can be achieved with a different spec algorithm.

Thanks for the help; I'll keep this open until I've finished that change.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants