Parse \uXXXX escapes faster #1172
Is changing error precision (see the changed test) okay? I couldn't sidestep that without sacrificing performance.
The code previously pointed to a specific one of the expected hex digits in an input such as "\u0000\u00#0\u0000". Preserving that is not important to me.
The new code always points to the 6th byte after the backslash.
Would it be costly to point to the current escape's backslash instead? This would make more sense and hopefully be as simple as offsetting the index by -6 in the error codepath.
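To make the "-6 offset" suggestion concrete, here is a minimal sketch in Rust. It is not serde_json's code: `parse_unicode_escape` and its signature are hypothetical, and it assumes the caller has already checked that `input[backslash..]` starts with `\u`.

```rust
/// Hypothetical helper (not serde_json's API): decodes the four hex digits of
/// a `\uXXXX` escape whose backslash sits at index `backslash`.
/// On failure, returns the index of the escape's backslash.
fn parse_unicode_escape(input: &[u8], backslash: usize) -> Result<u16, usize> {
    // After consuming `\uXXXX` the reader index is 6 bytes past the backslash.
    let index = backslash + 6;
    // Truncated input: report the backslash by offsetting the index by -6.
    let digits = input.get(backslash + 2..index).ok_or(index - 6)?;
    let mut value: u16 = 0;
    for &b in digits {
        // Invalid hex digit: same -6 offset, regardless of which digit failed.
        let nibble = (b as char).to_digit(16).ok_or(index - 6)?;
        value = (value << 4) | nibble as u16;
    }
    Ok(value)
}
```

With the input `\u0000\u00#0\u0000` (18 bytes, no surrounding quotes), the escape starting at index 6 fails and the sketch reports index 6, while the escapes at indices 0 and 12 decode to 0.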
I can do that easily in the |
Hmm. Now that I think about it, I'm not sure if I understand how error location works. So e.g. here serde-json reports the error immediately after consuming malformed UTF-8: Lines 1080 to 1083 in cf771a0
And here we say "line 2" after consuming the erroneous Lines 1108 to 1111 in cf771a0
I think reporting the error at the position just after an invalid escape (i.e. the current behavior) would be more consistent.
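The "line 2" observation can be illustrated with a small sketch of how a byte offset maps to a line and column when the position is computed after the offending byte has been consumed. The `Position` type and `position_after` function are hypothetical names for illustration, and the choice of a raw newline as the offending byte is an assumption, not something the comment above states.

```rust
/// Hypothetical position type for illustration; lines start at 1, columns at 0.
#[derive(Debug, PartialEq)]
struct Position {
    line: usize,
    column: usize,
}

/// Computes the position reached after consuming the first `consumed` bytes.
fn position_after(input: &[u8], consumed: usize) -> Position {
    let mut pos = Position { line: 1, column: 0 };
    for &b in input.iter().take(consumed) {
        if b == b'\n' {
            // A consumed newline moves the reported position to the next line.
            pos.line += 1;
            pos.column = 0;
        } else {
            pos.column += 1;
        }
    }
    pos
}
```

For the five-byte input `"a`, newline, `b"`, the position before the newline is consumed is line 1, column 2; once it has been consumed the same error is attributed to line 2, column 0.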
When ignoring *War and Peace* (in Russian), this increases performance from 640 MB/s to 1080 MB/s (+70%). When parsing into String, the savings are moderate but still significant: 275 MB/s to 320 MB/s (+15%).
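The text here does not show how the speedup was obtained. One common way to decode four hex digits quickly is a lookup table with a single validity check at the end, sketched below. This is an assumption about the general technique, not a reproduction of the PR's code; `build_hex_table`, `HEX`, and `decode_four_hex_digits` are hypothetical names.

```rust
/// Builds a table mapping each byte to its hex value, or -1 for non-hex bytes.
const fn build_hex_table() -> [i8; 256] {
    let mut table = [-1i8; 256];
    let mut i = 0;
    while i < 256 {
        let b = i as u8;
        table[i] = match b {
            b'0'..=b'9' => (b - b'0') as i8,
            b'a'..=b'f' => (b - b'a' + 10) as i8,
            b'A'..=b'F' => (b - b'A' + 10) as i8,
            _ => -1,
        };
        i += 1;
    }
    table
}

/// Table computed at compile time (hypothetical name).
static HEX: [i8; 256] = build_hex_table();

/// Decodes four hex digits, checking validity once instead of per digit.
fn decode_four_hex_digits(digits: [u8; 4]) -> Option<u16> {
    let a = HEX[digits[0] as usize] as i32;
    let b = HEX[digits[1] as usize] as i32;
    let c = HEX[digits[2] as usize] as i32;
    let d = HEX[digits[3] as usize] as i32;
    // A -1 entry sign-extends to all ones, so the OR stays negative
    // if any of the four bytes was not a hex digit.
    let combined = (a << 12) | (b << 8) | (c << 4) | d;
    if combined >= 0 {
        Some(combined as u16)
    } else {
        None
    }
}
```

The trade-off in this sketch is the one raised earlier in the conversation: because validity is checked once for all four digits, the decoder no longer knows which digit was invalid, so error positions become less precise.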
e10a88a to 86d0e11
Ideally the position returned by Sometimes there are codepaths where the byte which caused an error might have come from either Lines 39 to 57 in b4bc643
That would be fine. Something approximate would also be fine, like
This looks good!
I am interested in error position improvements (not just for \u but the other ones you called out too), but that can happen separately if you are interested in looking into it.
Thanks!