Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
util: expose toUSVString #39814
util: expose toUSVString #39814
Changes from all commits
e04a10b
f2ca87d
918491e
62c026f
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Patch: #39891
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried both this and negative lookbehind but they are slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s currently slower for the one specific scenario you’re testing, yes. Change the test string to something like:
…and now
re3
is the second fastest:Change it to
…and now
re3
wins the race:It’s possible to construct a benchmark that shows any three options as the “fastest”, but I don’t think it’s very meaningful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by a tiny margin and I think because the surrogate is at the beginning of the string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, we should optimize for the case where there are no surrogates. We could even have false positives if that helps with performance since this is just an optimization to avoid calling into native code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A benchmark where
re3
is the fastest by 100%.I agree, no one will notice this in a real world application but switching to a slower option for a minor readability improvement seems a bit silly to me. The original regex is not that complex. Also, fwiw, I think
re2
is the most readable / easier to grok.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ronag I tend to agree — that’s probably the common case. All three solutions perform similarly in this case:
Do we all agree that all other things equal,
\p{Surrogate}
is the simplest, most readable pattern? Because in that case, this discussion becomes a matter of choosing which specific use case we optimize for: the one from @lpinca’s benchmark (lone lead surrogate at the end of a short string) or the one from mine (lone trail surrogate at the start of a short string). I think both are edge cases, and I doubt one is more common than the other.@lpinca So you’d reject a readability improvement unless it also happens to double performance? Woah, that’s a high bar :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a 2x performance drop does not justify a minor readability improvement in my opinion. Anyway I'm not blocking #39891. Just wanted to report this because I tried with the negative lookbehind regex (
re2
) some time ago and decided to not open a PR because it was slower.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I would be happy with this readability improvement if performance was more or less the same for all inputs. This seems to be the case only for some inputs. Some inputs show a 100% performance drop, in those cases I would like to see a 100% improvement. I don't want it to be always 100% faster. If performance is the same ± 10% for all inputs, then that's great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we’d test lone lead surrogates, lone trail surrogates, and also surrogate pairs (to ensure those are unaffected). Patch: #39891