-
Notifications
You must be signed in to change notification settings - Fork 29.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
util: simplify util.toUSVString #39891
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,10 +17,10 @@ const { | |
Promise, | ||
ReflectApply, | ||
ReflectConstruct, | ||
RegExpPrototypeExec, | ||
RegExpPrototypeTest, | ||
SafeMap, | ||
SafeSet, | ||
StringPrototypeSearch, | ||
StringPrototypeReplace, | ||
StringPrototypeToLowerCase, | ||
StringPrototypeToUpperCase, | ||
|
@@ -57,16 +57,13 @@ const experimentalWarnings = new SafeSet(); | |
|
||
const colorRegExp = /\u001b\[\d\d?m/g; // eslint-disable-line no-control-regex | ||
|
||
const unpairedSurrogateRe = | ||
/(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])/; | ||
const unpairedSurrogateRe = /\p{Surrogate}/u; | ||
function toUSVString(val) { | ||
const str = `${val}`; | ||
// As of V8 5.5, `str.search()` (and `unpairedSurrogateRe[@@search]()`) are | ||
// slower than `unpairedSurrogateRe.exec()`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is an example of the meta-discussion we’re having. The problem with hardcoding micro-performance optimizations is that V8 keeps optimizing for idiomatic coding patterns over time, to the point where hand-written micro-optimizations often end up being slower than the idiomatic patterns. Unless there is some mechanism to track these hardcoded assumptions over time and adjust the code if necessary, short-term micro-optimizations end up hurting performance over the longer term. This comment about
(My colleague Peter Marshall once gave a nice presentation about this. Relevant segment: https://www.youtube.com/watch?v=1bZxs5J5n3M&t=708s) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree but in this specific case we are trading 50% performance for this change /(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])/ ↓ /\p{Surrogate}/u and using I understand that in a future V8 version I think we should strive for what is better now, not what might be better in future, otherwise things like this #39778 do no make any sense.
This is the key and the reason why I'm not blocking this (because there is no such mechanism). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
How so? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Imagine changes like this that are good in newer V8 versions but bad in older ones. Should they be backported to old but still suppported release lines? Should "dont-land-on" labels be used? Or should we simply not care? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand how that applies here — we shipped There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think I did not make my point clear but I think you already answered my questions with "It doesn't matter if performance is worse in older V8 versions. All that matters is that the feature is supported". There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about simply not backporting it? 🤔 |
||
const match = RegExpPrototypeExec(unpairedSurrogateRe, str); | ||
if (!match) | ||
const index = StringPrototypeSearch(str, unpairedSurrogateRe); | ||
if (index === -1) | ||
return str; | ||
return _toUSVString(str, match.index); | ||
return _toUSVString(str, index); | ||
} | ||
|
||
let uvBinding; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is slower, see #39814 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily. It depends on the string you use it with, just like the current solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summarizing that discussion: the performance characteristics are different before/after this change. The trade-off is as follows:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be pedandic:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I'm good with either approach but I'm happy to keep things as they are if folks think that
{Surrogate}
way is too slow.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you stick to the current approach, then consider renaming
unpairedSurrogateRe
— the current name is misleading, since it doesn’t just match unpaired surrogates. E.g. in the case of'\u{10000}\uDC00'
(i.e. a surrogate pair followed by a lone surrogate), it matches'\uDC00\uDC00'
(i.e. the trail surrogate of the pair + the lone surrogate) as opposed to just the lone surrogate'\uDC00'
— which is fine for this specific use case, but could become problematic if someone were to re-use this pattern elsewhere. For example, in a user-land implementation: