Pattern /[^\D\P{Nd}]/utf matches to \x{1d7cf} #497
Comments
This is a very interesting case! There's something related here - which is the behaviour of They specifically changed the behaviour of They discussed this change here: tc39/proposal-regexp-v-flag#30 (Note - there's a lot of confusion and irrelevant chatter on that thread. Most of it just irrelevant.) How character classes work
With the
Things like |
Which \p{} is affected by case folding? I had assumed that a class always contains the other cases of all of its characters. Most properties are scripts or control characters. I suspect some extended script is affected.
Well |
True. In PCRE2, /i does not affect properties. This way we don't need to generate that many databases. |
Note that is no longer the case since #432 |
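As a rough illustration of the question above (which properties are not closed under case changes), here is a minimal Python sketch. It is not PCRE2 code: the helper name `closed_under_case` is hypothetical, and it uses Python's `str.lower`/`str.upper` and `unicodedata` as a stand-in for PCRE2's own case tables, so treat the result as indicative only.

```python
import unicodedata


def closed_under_case(category, chars):
    """Return True if every single-character case variant of each
    character in `category` also belongs to `category`.

    Hypothetical helper for illustration; it approximates case folding
    with str.lower()/str.upper().
    """
    for ch in chars:
        if unicodedata.category(ch) != category:
            continue
        for variant in (ch.lower(), ch.upper()):
            # Skip multi-character mappings; only compare 1:1 variants.
            if len(variant) == 1 and unicodedata.category(variant) != category:
                return False
    return True


ascii_chars = [chr(c) for c in range(128)]

# Lu (uppercase letters) is not closed: "A" lowercases to "a", which is Ll.
lu_closed = closed_under_case("Lu", ascii_chars)

# Nd (decimal digits) is closed: digits have no case variants.
nd_closed = closed_under_case("Nd", ascii_chars)
```

Under these assumptions, a general category such as Lu is the kind of property where case-insensitive matching can change the result, while Nd is unaffected.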
Perl does this:
Which I think is right - PCRE2 is currently wrong. As the character is greater than 255 and UCP is not set, the bit map set up by \D is not relevant and only \P should count. However, if the first pattern is just [\P{Nd}] there is no match. So there is indeed a bug in PCRE2 and I think it is /[\D\P{Nd}]/utf that is incorrect. |
I'm not sure in #497 which one you think is wrong and have fixed... |
\D matches to anything that is not |
Ah yes, of course. I was forgetting that. OK, let's merge your patch. |
I made a random digression above, about the behaviour of PCRE2's behaviour matches Perl currently, and Python's So our behaviour is all good, but I think we're maybe missing tests for (negated, case-insensitive |
PCRE2's behaviour matches Perl currently, and Python's regex module (the standard re module doesn't have \P{...}). However, JavaScript behaviour changed recently (when they added the new 'v' flag; the old 'u' flag's behaviour is unchanged), so this case is worth additional testing. Raised in issue PCRE2Project#497.
I think this can be closed now? |
This test is in testinput5:
Currently this pattern matches to \x{1d7cf}
Since \D is in ASCII mode (UCP is not enabled),
/[\D]/utf
matches anything that is not 0-9. That should include \x{1d7cf}, and this appears to hold.
Note: /[\P{Nd}]/utf does not match \x{1d7cf}.
Summary: both
/[^\D\P{Nd}]/utf
and
/[\D\P{Nd}]/utf
match \x{1d7cf}. Obviously a character class and its negated form cannot both match the same character, and I think the first one is incorrect. My newest code changes this pattern to no-match, and I wanted to discuss it.
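The reasoning above can be modelled outside PCRE2 as plain set logic. The sketch below is Python, not PCRE2: it assumes that Python's `re.ASCII` flag is a fair stand-in for \D without UCP, and it emulates \P{Nd} with `unicodedata.category`, because the standard `re` module has no \P{...}. All function names are hypothetical.

```python
import re
import unicodedata

# U+1D7CF MATHEMATICAL BOLD DIGIT ONE, general category Nd.
CH = "\U0001D7CF"


def ascii_not_digit(ch):
    # Models \D without UCP: anything other than ASCII 0-9.
    return re.fullmatch(r"\D", ch, re.ASCII) is not None


def not_nd(ch):
    # Models \P{Nd}: anything whose general category is not Nd.
    return unicodedata.category(ch) != "Nd"


def in_class(ch):
    # Models [\D\P{Nd}]: a class is the union of its items.
    return ascii_not_digit(ch) or not_nd(ch)


def in_negated_class(ch):
    # Models [^\D\P{Nd}]: the complement of the union.
    return not in_class(ch)


print(ascii_not_digit(CH))   # \D alone matches the character
print(not_nd(CH))            # \P{Nd} alone does not
print(in_class(CH))          # the union matches via \D
print(in_negated_class(CH))  # so the negated class must not match
```

In this model the only characters accepted by the negated class are those that are both ASCII digits and Nd, that is 0-9, which is consistent with changing /[^\D\P{Nd}]/utf to no-match for \x{1d7cf}.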