Speed up \uXXXX parsing and improve WTF-8 handling #1175

Merged: 5 commits merged on Aug 15, 2024

purplesyringa (Contributor) commented:

Altogether, this speeds up parsing of \u-encoded War and Peace by 20%. Performance on json-benchmark is slightly affected: some cases improve by about 5% and one regresses by 1%, but I'm willing to write that off as noise from an imperfect benchmark setup.

This PR should be easier to review commit by commit with whitespace changes hidden. In addition to the above, it includes a variation on #877, since it's easier to implement with this design.

purplesyringa and others added 5 commits August 12, 2024 21:10
This counterintuitively speeds up War and Peace 275 -> 290 MB/s (+5%) by
enabling inlining of encode_utf8 and extend_from_slice.

This speeds up War and Peace 290 MB/s -> 330 MB/s (+15%).

This does not affect performance.

This does not affect performance.

Closes serde-rs#877.

This is a good time to make ByteBuf parsing more consistent as I'm
rewriting it anyway. This commit integrates the changes from serde-rs#877 and
also handles a leading surrogate followed by a surrogate pair correctly.

This does not affect performance significantly.

Co-authored-by: Luca Casonato <hello@lcas.dev>
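
The commit messages above mention pushing characters with encode_utf8 and extend_from_slice, and handling a leading surrogate that is followed by a full surrogate pair. The following is a minimal sketch of that idea, not the code from this PR: the function names (push_char, push_lone_surrogate, decode_units) are hypothetical, the input is assumed to be already-parsed \uXXXX code units, and it assumes that unpaired surrogates are kept in a byte buffer using the 3-byte WTF-8 encoding.

```rust
/// Append a Unicode scalar value to a byte buffer as UTF-8.
/// Uses the standard `char::encode_utf8` and `Vec::extend_from_slice`.
fn push_char(buf: &mut Vec<u8>, c: char) {
    let mut tmp = [0u8; 4];
    buf.extend_from_slice(c.encode_utf8(&mut tmp).as_bytes());
}

/// Append an unpaired surrogate (0xD800..=0xDFFF) with the generalized
/// 3-byte UTF-8 pattern, which is how WTF-8 represents lone surrogates.
/// (Hypothetical helper, assumed for this sketch.)
fn push_lone_surrogate(buf: &mut Vec<u8>, n: u16) {
    buf.extend_from_slice(&[
        0xE0 | (n >> 12) as u8,
        0x80 | ((n >> 6) & 0x3F) as u8,
        0x80 | (n & 0x3F) as u8,
    ]);
}

/// Decode a sequence of \uXXXX code units into bytes.
/// A leading surrogate is combined with the next unit only when that unit
/// is a trailing surrogate; otherwise it is emitted on its own and the
/// next unit is examined afresh.
fn decode_units(units: &[u16]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < units.len() {
        let n = units[i];
        match n {
            0xD800..=0xDBFF => {
                if let Some(&n2) = units.get(i + 1) {
                    if (0xDC00..=0xDFFF).contains(&n2) {
                        // Valid pair: result is always in 0x10000..=0x10FFFF.
                        let cp = 0x10000
                            + (((n as u32 - 0xD800) << 10) | (n2 as u32 - 0xDC00));
                        push_char(&mut out, char::from_u32(cp).unwrap());
                        i += 2;
                        continue;
                    }
                }
                // Leading surrogate with no trailing surrogate after it.
                push_lone_surrogate(&mut out, n);
                i += 1;
            }
            0xDC00..=0xDFFF => {
                // Trailing surrogate with no leading surrogate before it.
                push_lone_surrogate(&mut out, n);
                i += 1;
            }
            _ => {
                // Non-surrogate BMP code unit: always a valid scalar value.
                push_char(&mut out, char::from_u32(n as u32).unwrap());
                i += 1;
            }
        }
    }
    out
}
```

With this sketch, the units D83D, D83D, DE00 decode to a lone leading surrogate followed by U+1F600, rather than dropping or mispairing the second D83D.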
Comment on lines +908 to +909
// XXX: This is actually a trailing surrogate.
return error(read, ErrorCode::LoneLeadingSurrogateInHexEscape);
purplesyringa (Contributor, Author) commented on Aug 12, 2024:

Should I do anything about this? This typo was present before the PR.

Member:

I would accept a followup PR to change the ErrorCode enum and fix the error message.
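
For context on the mislabeled error discussed in this thread, here is a small sketch of how the two surrogate ranges differ and how a strict parser could report them separately. It is illustrative only: SurrogateKind, EscapeError, and check_escape are hypothetical names, and the LoneTrailingSurrogate variant is an assumption, not an existing serde_json ErrorCode.

```rust
#[derive(Debug, PartialEq)]
enum SurrogateKind {
    Leading,      // 0xD800..=0xDBFF, must come first in a pair
    Trailing,     // 0xDC00..=0xDFFF, must come second in a pair
    NotSurrogate, // any other 16-bit code unit
}

fn classify(n: u16) -> SurrogateKind {
    match n {
        0xD800..=0xDBFF => SurrogateKind::Leading,
        0xDC00..=0xDFFF => SurrogateKind::Trailing,
        _ => SurrogateKind::NotSurrogate,
    }
}

/// Hypothetical error type that names each case accurately.
#[derive(Debug, PartialEq)]
enum EscapeError {
    LoneLeadingSurrogate,
    LoneTrailingSurrogate,
}

/// Validate one \uXXXX unit, given the unit that follows it (if any),
/// in a strict mode where unpaired surrogates are rejected.
fn check_escape(n: u16, next: Option<u16>) -> Result<(), EscapeError> {
    match classify(n) {
        SurrogateKind::NotSurrogate => Ok(()),
        // A trailing surrogate seen first has no leading partner.
        SurrogateKind::Trailing => Err(EscapeError::LoneTrailingSurrogate),
        SurrogateKind::Leading => match next.map(classify) {
            Some(SurrogateKind::Trailing) => Ok(()),
            _ => Err(EscapeError::LoneLeadingSurrogate),
        },
    }
}
```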

@purplesyringa changed the title from "Speed up \uXXXX parsing and other improvements" to "Speed up \uXXXX parsing and improve WTF-8 handling" on Aug 12, 2024
dtolnay (Member) left a comment:

Thank you!

@dtolnay merged commit 0f942e5 into serde-rs:master on Aug 15, 2024
13 checks passed
@purplesyringa deleted the faster-backslash-u branch on August 18, 2024 20:55