This repository has been archived by the owner on Feb 15, 2024. It is now read-only.

UTF-8 not handled while writing #16

Closed
laanwj opened this issue Nov 5, 2015 · 1 comment · Fixed by #22

Comments

@laanwj
Contributor

laanwj commented Nov 5, 2015

UTF-8 is not handled while writing strings in json_escape. As a result, escape sequences like '\u1234' are expanded to their UTF-8 byte equivalent on a round-trip and then re-encoded as multiple escape sequences \u00XX\u00XX... (one per UTF-8 byte), which is not correct.

This could be handled as follows:

If the input/output is UTF-8 encoded, UTF-8 sequences in strings can be passed through on output without any processing (according to RFC 4627 "All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).")

From RFC 4627:

JSON text SHALL be encoded in Unicode.  The default encoding is
UTF-8.
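The pass-through approach described above can be sketched as follows. This is a hypothetical illustration, not the actual univalue json_escape implementation; the function name `json_escape_utf8` is made up for this sketch. It escapes only what RFC 4627 requires (quotation mark, reverse solidus, and control characters U+0000 through U+001F) and copies all other bytes, including multi-byte UTF-8 sequences, through unchanged:

```cpp
#include <cstdio>
#include <string>

// Hypothetical sketch: escape a UTF-8 string for JSON output, passing
// multi-byte UTF-8 sequences through untouched per RFC 4627, and escaping
// only '"', '\\', and the control characters U+0000..U+001F.
std::string json_escape_utf8(const std::string& in)
{
    std::string out;
    for (unsigned char ch : in) {
        switch (ch) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\b': out += "\\b";  break;
        case '\f': out += "\\f";  break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:
            if (ch < 0x20) {
                // Remaining control characters must use \u00XX form.
                char buf[8];
                std::snprintf(buf, sizeof(buf), "\\u%04x", ch);
                out += buf;
            } else {
                // Printable ASCII and UTF-8 continuation/lead bytes
                // pass through without re-encoding.
                out += static_cast<char>(ch);
            }
        }
    }
    return out;
}
```

With this scheme the three-byte UTF-8 encoding of U+1234 (E1 88 B4) survives a round-trip byte-for-byte instead of being split into per-byte \u00XX escapes.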
@apoelstra

To be consistent with https://github.com/apoelstra/strason the behaviour should be:

  • For the characters with short escapes (\n, \r, \b, \f, \", \\), use those. (It's allowable to escape / as well, but I don't.)
  • For isprint ASCII characters, pass through.
  • For everything else, encode as UTF-16BE and use \uXXXX.
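The three rules above can be sketched like this. This is a hypothetical C++ rendering of the strason policy (strason itself is Rust, so this is not its actual code, and the names `escape_strason`/`append_u16` are made up); it assumes the input is valid UTF-8:

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Append one \uXXXX escape for a UTF-16 code unit.
static void append_u16(std::string& out, unsigned cu)
{
    char buf[8];
    std::snprintf(buf, sizeof(buf), "\\u%04X", cu);
    out += buf;
}

// Hypothetical sketch of the strason-style policy: short escapes for
// \n \r \b \f \" \\, printable ASCII passed through, everything else
// encoded as UTF-16 \uXXXX (surrogate pairs above U+FFFF).
std::string escape_strason(const std::string& in)
{
    std::string out;
    size_t i = 0;
    while (i < in.size()) {
        unsigned char c = in[i];
        if      (c == '\n') { out += "\\n";  ++i; }
        else if (c == '\r') { out += "\\r";  ++i; }
        else if (c == '\b') { out += "\\b";  ++i; }
        else if (c == '\f') { out += "\\f";  ++i; }
        else if (c == '"')  { out += "\\\""; ++i; }
        else if (c == '\\') { out += "\\\\"; ++i; }
        else if (c < 0x80 && std::isprint(c)) { out += (char)c; ++i; }
        else {
            // Decode one UTF-8 sequence to a code point (valid input assumed).
            unsigned cp; int len;
            if      (c < 0x80) { cp = c;        len = 1; } // non-printable ASCII
            else if (c < 0xE0) { cp = c & 0x1F; len = 2; }
            else if (c < 0xF0) { cp = c & 0x0F; len = 3; }
            else               { cp = c & 0x07; len = 4; }
            for (int k = 1; k < len; ++k)
                cp = (cp << 6) | (in[i + k] & 0x3F);
            i += len;
            if (cp <= 0xFFFF) {
                append_u16(out, cp);
            } else {
                // Code points above the BMP need a UTF-16 surrogate pair.
                cp -= 0x10000;
                append_u16(out, 0xD800 + (cp >> 10));
                append_u16(out, 0xDC00 + (cp & 0x3FF));
            }
        }
    }
    return out;
}
```

Under this policy U+1234 round-trips as the literal escape \u1234 rather than as raw UTF-8 bytes, which keeps the serialized form pure ASCII.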
