Speed up \uXXXX parsing and improve WTF-8 handling #1175

purplesyringa · 2024-08-12T18:26:05Z

Altogether, this speeds up parsing of \u-encoded War and Peace by 20%. Performance on json-benchmark is slightly affected: some cases improve by 5% and one regresses by 1%, but I'm willing to write that off as noise from an imperfect benchmark setup.

This PR should be easier to review commit by commit, with whitespace changes hidden. In addition to the above, it includes a variation on #877, since that is easier to implement with this design.

Closes serde-rs#877.

This is a good time to make ByteBuf parsing more consistent as I'm rewriting it anyway. This commit integrates the changes from serde-rs#877 and also handles a leading surrogate followed by a surrogate pair correctly.

This does not affect performance significantly.

Co-authored-by: Luca Casonato <hello@lcas.dev>
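To illustrate the "leading surrogate followed by a surrogate pair" case, here is a minimal sketch of how a run of \u code units might be paired up. It is not the code from this PR: `Decoded`, `is_leading`, `is_trailing`, and `decode_units` are hypothetical names, and the sketch assumes a lenient, byte-oriented target where a lone surrogate is kept rather than rejected.

```rust
/// Result of decoding one or two `\uXXXX` code units (hypothetical type).
#[derive(Debug, PartialEq)]
enum Decoded {
    /// A complete Unicode scalar value (a BMP code point or a combined pair).
    Scalar(char),
    /// A surrogate without a partner; a lenient parser could emit it as WTF-8.
    LoneSurrogate(u16),
}

fn is_leading(u: u16) -> bool {
    (0xD800..=0xDBFF).contains(&u)
}

fn is_trailing(u: u16) -> bool {
    (0xDC00..=0xDFFF).contains(&u)
}

/// Pairs surrogates from left to right. A leading surrogate is combined only
/// with the code unit immediately after it, so in `lead, lead, trail` the
/// first `lead` stays lone and the remaining two form a pair.
fn decode_units(units: &[u16]) -> Vec<Decoded> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < units.len() {
        let u = units[i];
        if is_leading(u) && i + 1 < units.len() && is_trailing(units[i + 1]) {
            let hi = (u as u32) - 0xD800;
            let lo = (units[i + 1] as u32) - 0xDC00;
            let c = 0x10000 + (hi << 10) + lo;
            // A combined pair always lies in 0x10000..=0x10FFFF, so this is valid.
            out.push(Decoded::Scalar(char::from_u32(c).unwrap()));
            i += 2;
        } else if is_leading(u) || is_trailing(u) {
            out.push(Decoded::LoneSurrogate(u));
            i += 1;
        } else {
            // Any non-surrogate u16 is a valid scalar value.
            out.push(Decoded::Scalar(char::from_u32(u as u32).unwrap()));
            i += 1;
        }
    }
    out
}
```

For example, the units `0xD83D, 0xD83D, 0xDE00` decode to a lone `0xD83D` followed by U+1F600, rather than discarding the valid pair at the end.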

purplesyringa · 2024-08-12T18:26:30Z

src/read.rs

+        // XXX: This is actually a trailing surrogate.
+        return error(read, ErrorCode::LoneLeadingSurrogateInHexEscape);


Should I do anything about this? This typo was present before the PR.

I would accept a followup PR to change the ErrorCode enum and fix the error message.

dtolnay

Thank you!

purplesyringa and others added 5 commits August 12, 2024 21:10

Mark \u parsing as cold

a38dbf3

This counterintuitively speeds up War and Peace 275 -> 290 MB/s (+5%) by enabling inlining of encode_utf8 and extend_from_slice.
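As a rough illustration of the idea (not the actual change in src/read.rs), the sketch below keeps escape handling in a separate function marked `#[cold]`, so the optimizer treats calls to it as unlikely and the copy loop stays small. The names `push_unicode_escape` and `unescape` are hypothetical, only `\uXXXX` escapes are handled, and surrogates are simply rejected to keep it short. Whether this yields a speedup depends on the compiler and the workload.

```rust
/// Slow path (hypothetical): decodes the four hex digits after `\u` and
/// appends the code point as UTF-8. `#[cold]` hints that calls are rare.
#[cold]
fn push_unicode_escape(hex: &[u8], out: &mut Vec<u8>) -> Option<()> {
    let mut n = 0u32;
    for &b in hex {
        // `to_digit(16)` returns None for anything that is not a hex digit.
        n = n * 16 + (b as char).to_digit(16)?;
    }
    // Surrogates are rejected here; real WTF-8 handling is out of scope.
    let c = char::from_u32(n)?;
    let mut buf = [0u8; 4];
    out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
    Some(())
}

/// Hot loop: copies ordinary bytes and defers `\u` escapes to the cold helper.
fn unescape(input: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i] == b'\\' {
            // Only `\uXXXX` is supported in this sketch.
            if input.get(i + 1) != Some(&b'u') {
                return None;
            }
            let hex = input.get(i + 2..i + 6)?;
            push_unicode_escape(hex, &mut out)?;
            i += 6;
        } else {
            out.push(input[i]);
            i += 1;
        }
    }
    Some(out)
}
```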

Format UTF-8 strings manually

0e90b61

This speeds up War and Peace 290 MB/s -> 330 MB/s (+15%).
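For readers unfamiliar with what formatting UTF-8 "manually" means, here is a minimal sketch of writing the bytes directly from a code point's bits. It is an assumption about the general technique, not a copy of the PR's code, and `push_code_point` is a hypothetical name. Because it never goes through `char`, the same routine also accepts surrogates, in which case the output is WTF-8 rather than valid UTF-8.

```rust
/// Appends code point `n` (assumed to be at most 0x10FFFF) to `out` using the
/// UTF-8 bit layout. Surrogates (0xD800..=0xDFFF) are not rejected, so for
/// those the bytes written are WTF-8, not valid UTF-8.
fn push_code_point(n: u32, out: &mut Vec<u8>) {
    if n < 0x80 {
        // 1 byte: 0xxxxxxx
        out.push(n as u8);
    } else if n < 0x800 {
        // 2 bytes: 110xxxxx 10xxxxxx
        out.extend_from_slice(&[0xC0 | (n >> 6) as u8, 0x80 | (n & 0x3F) as u8]);
    } else if n < 0x1_0000 {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.extend_from_slice(&[
            0xE0 | (n >> 12) as u8,
            0x80 | ((n >> 6) & 0x3F) as u8,
            0x80 | (n & 0x3F) as u8,
        ]);
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.extend_from_slice(&[
            0xF0 | (n >> 18) as u8,
            0x80 | ((n >> 12) & 0x3F) as u8,
            0x80 | ((n >> 6) & 0x3F) as u8,
            0x80 | (n & 0x3F) as u8,
        ]);
    }
}
```

For any valid scalar value the result matches `char::encode_utf8`; for a surrogate such as 0xD800 it produces the three-byte sequence `ED A0 80`.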

Use the same UTF-8/WTF-8 impl for surrogates

2f28d10

This does not affect performance.

Simplify unicode escape handling

236cc82

This does not affect performance.

purplesyringa changed the title ~~Speed up \uXXXX parsing and other improvements~~ Speed up \uXXXX parsing and improve WTF-8 handling Aug 12, 2024

dtolnay approved these changes Aug 15, 2024

dtolnay merged commit 0f942e5 into serde-rs:master Aug 15, 2024
13 checks passed

purplesyringa deleted the faster-backslash-u branch August 18, 2024 20:55
