Speed up \uXXXX parsing and improve WTF-8 handling #1175
This counterintuitively speeds up War and Peace 275 -> 290 MB/s (+5%) by enabling inlining of encode_utf8 and extend_from_slice.
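For context, here is a minimal sketch of the pattern this commit message refers to: encoding a decoded code point into a small stack buffer with `char::encode_utf8` and appending it with `Vec::extend_from_slice`. The helper name `push_char` is hypothetical and is not taken from the serde_json source; the actual code in this PR may be structured differently.

```rust
/// Hypothetical helper (not from serde_json): append the UTF-8
/// encoding of `c` to the scratch buffer.
#[inline]
fn push_char(c: char, scratch: &mut Vec<u8>) {
    // A `char` needs at most 4 bytes in UTF-8.
    let mut buf = [0u8; 4];
    let encoded: &mut str = c.encode_utf8(&mut buf);
    // Both calls are small enough that the compiler can inline them.
    scratch.extend_from_slice(encoded.as_bytes());
}
```

Calling `push_char('é', &mut scratch)` on an empty `Vec<u8>` leaves the two UTF-8 bytes of `é` in `scratch`.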
This speeds up War and Peace 290 MB/s -> 330 MB/s (+15%).
This does not affect performance.
This does not affect performance.
Closes serde-rs#877. This is a good time to make ByteBuf parsing more consistent as I'm rewriting it anyway. This commit integrates the changes from serde-rs#877 and also handles a leading surrogate followed by a surrogate pair correctly. This does not affect performance significantly.
Co-authored-by: Luca Casonato <hello@lcas.dev>
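To illustrate the surrogate case mentioned above, here is a rough sketch of how two consecutive `\uXXXX` values combine into one code point. The function name `combine_surrogates` is hypothetical and this is not the code from this PR, only the standard UTF-16 arithmetic.

```rust
/// Hypothetical helper (not from serde_json): combine the values of two
/// consecutive `\uXXXX` escapes into a single code point.
/// Returns `None` unless `n1` is a leading surrogate (D800..=DBFF)
/// and `n2` is a trailing surrogate (DC00..=DFFF).
fn combine_surrogates(n1: u16, n2: u16) -> Option<char> {
    if !(0xD800..=0xDBFF).contains(&n1) || !(0xDC00..=0xDFFF).contains(&n2) {
        return None;
    }
    let hi = (n1 - 0xD800) as u32; // upper 10 bits
    let lo = (n2 - 0xDC00) as u32; // lower 10 bits
    char::from_u32(0x1_0000 + ((hi << 10) | lo))
}
```

For an input such as `\uD83D\uD83D\uDE00`, the first two values do not form a pair, so `combine_surrogates(0xD83D, 0xD83D)` returns `None`. Under this sketch, a parser would treat the first `\uD83D` as a lone leading surrogate and then retry with the second `\uD83D` as the start of a new pair, which combines with `\uDE00` into U+1F600.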
// XXX: This is actually a trailing surrogate.
return error(read, ErrorCode::LoneLeadingSurrogateInHexEscape);
Should I do anything about this? This typo was present before the PR.
I would accept a followup PR to change the ErrorCode enum and fix the error message.
Thank you!
Altogether, this speeds up parsing of \u-encoded War and Peace by 20%. Performance on json-benchmark is slightly affected: there are some 5% improvements and one 1% regression, but I'm willing to write that off as noise from an imperfect benchmark setup.
This PR should be easier to read commit by commit, with whitespace changes hidden. In addition to the above, it includes a variation on #877, since it's easier to implement with this design.