-
Notifications
You must be signed in to change notification settings - Fork 5.6k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
folly: utf8ToCodePoint: enforce max valid code point is U+10FFFF - re…
…turn U+FFFD / throw for well-formed UTF-8 encoded values that are larger than the max code point Summary: UTF-8 can encode large numbers, but Unicode code points are only defined up to `U+10FFFF`. For example: - the 4B UTF-8 encoding `"\xF6\x8D\x9B\ xBC"` (bits: `11110110 10001101 10011011 10111100`) is a valid UTF-8 encoding - but the encoded value is `U+18D6 (https://github.com/facebook/folly/commit/d40182262d41679cab28f6be7366cc5ff901683b)FC` which is larger than `U+10FFFF` With `opts.skip_invalid_utf8 = true;` `json::serialize` should have returned `"\ufffd"` since it the input is invalid, but due to a bug in `utf8ToCodePoint` it returned the incorrect `"\"\xF6\x8D\x9B\xBC\""`. Update `utf8ToCodePoint` to also reject 4 byte UTF-8 encoded values larger than the max Unicode code point (`U+10FFFF`). Reviewed By: luciang Differential Revision: D25275722 fbshipit-source-id: e7daeea834a0c5323923a5451a2565ceff5e4734
- Loading branch information
1 parent
67f20f2
commit ae03ef8
Showing
3 changed files
with
55 additions
and
12 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters