Poison 5.0 fails to decode Unicode surrogate pairs, Poison 4.0.1 succeeds #217

Closed
adworse opened this issue Oct 2, 2023 · 5 comments

adworse commented Oct 2, 2023

Reproduction:

Poison.decode!("{\"description\":\"\\uD83D\\uDD32\\uD83D\\uDD34\\uD83D\\uDD33\"}")

wisq commented Oct 7, 2023

Yeah, running into this as well.

Poison 5.0:

iex(1)> ~S{"\ud83d\udc69\ud83c\udffd\u200d\ud83d\udcbb"} |> Poison.decode()
{:error,
 %Poison.ParseError{
   data: "\"\\ud83d\\udc69\\ud83c\\udffd\\u200d\\ud83d\\udcbb\"",
   skip: 32,
   value: "\\ud83d"
 }}

Poison 4.0.1:

iex(1)> ~S{"\ud83d\udc69\ud83c\udffd\u200d\ud83d\udcbb"} |> Poison.decode()
{:ok, "👩🏽‍💻"}
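
For reference, the escapes in this example can be combined by hand using standard UTF-16 surrogate arithmetic. The sketch below does not call Poison at all; `SurrogateSketch` and `combine/2` are hypothetical names introduced only for illustration.

```elixir
defmodule SurrogateSketch do
  import Bitwise

  # Combine a UTF-16 high surrogate (0xD800..0xDBFF) and a low surrogate
  # (0xDC00..0xDFFF) into the single code point they encode.
  def combine(hi, lo) when hi in 0xD800..0xDBFF and lo in 0xDC00..0xDFFF do
    0x10000 + ((hi - 0xD800) <<< 10) + (lo - 0xDC00)
  end
end

# The four units of "\ud83d\udc69\ud83c\udffd\u200d\ud83d\udcbb":
woman = SurrogateSketch.combine(0xD83D, 0xDC69)   # U+1F469
tone = SurrogateSketch.combine(0xD83C, 0xDFFD)    # U+1F3FD
zwj = 0x200D                                      # not a surrogate, used as-is
laptop = SurrogateSketch.combine(0xD83D, 0xDCBB)  # U+1F4BB

# Build the UTF-8 binary a decoder would be expected to produce.
expected = <<woman::utf8, tone::utf8, zwj::utf8, laptop::utf8>>
IO.puts(expected)
```

Every `\uD8xx` escape here is immediately followed by a `\uDCxx` escape, so the input contains three complete surrogate pairs plus one ordinary escape (`\u200d`).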

irisTa56 (Contributor) commented

It seems related to a zero-width joiner between two surrogate pairs.
Note that I could reproduce @wisq's example, but couldn't reproduce @adworse's example.

Shorter examples I tried on Poison 5.0.0:

# with a zero-width joiner
iex(1)> Poison.decode(~S("\uD83D\uDC68\u200D\uD83D\uDC76"))
{:error,
 %Poison.ParseError{
   data: "\"\\uD83D\\uDC68\\u200D\\uD83D\\uDC76\"",
   skip: 20,
   value: "\\uD83D"
 }}
# without a zero-width joiner
iex(2)> Poison.decode(~S("\uD83D\uDC68\uD83D\uDC76"))
{:ok, "👨👶"}
# with a zero-width joiner but the following character is not a surrogate pair
iex(3)> Poison.decode(~S("\uD83D\uDC6E\u200D\u2642"))
{:ok, "👮‍♂"}
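
To make the expected behaviour concrete, here is a minimal sketch of a `\uXXXX` decoder that handles a zero-width joiner between two surrogate pairs. It is not Poison's parser and not the fix that was eventually applied; `EscapeSketch` and its error atoms are hypothetical, it only understands `\u` escapes and plain UTF-8 characters, and it raises on malformed hex digits.

```elixir
defmodule EscapeSketch do
  import Bitwise

  # Decode the body of a JSON string (without the surrounding quotes).
  def decode(input), do: decode(input, [])

  defp decode(<<>>, acc), do: {:ok, IO.iodata_to_binary(Enum.reverse(acc))}

  defp decode(<<"\\u", hex::binary-size(4), rest::binary>>, acc) do
    case String.to_integer(hex, 16) do
      # A high surrogate must be completed by the escape that follows it.
      hi when hi in 0xD800..0xDBFF -> low_surrogate(hi, rest, acc)
      # A low surrogate with no preceding high surrogate is invalid.
      lo when lo in 0xDC00..0xDFFF -> {:error, :lone_low_surrogate}
      # Any other escape (for example \u200D) is a code point on its own.
      cp -> decode(rest, [<<cp::utf8>> | acc])
    end
  end

  # Plain characters are copied through unchanged.
  defp decode(<<char::utf8, rest::binary>>, acc) do
    decode(rest, [<<char::utf8>> | acc])
  end

  defp low_surrogate(hi, <<"\\u", hex::binary-size(4), rest::binary>>, acc) do
    case String.to_integer(hex, 16) do
      lo when lo in 0xDC00..0xDFFF ->
        cp = 0x10000 + ((hi - 0xD800) <<< 10) + (lo - 0xDC00)
        # After a completed pair, decoding resumes normally, so a second
        # pair after \u200D is treated exactly like the first one.
        decode(rest, [<<cp::utf8>> | acc])

      _ ->
        {:error, :unpaired_high_surrogate}
    end
  end

  defp low_surrogate(_hi, _rest, _acc), do: {:error, :unpaired_high_surrogate}
end

IO.inspect(EscapeSketch.decode(~S(\uD83D\uDC68\u200D\uD83D\uDC76)))
```

Under these assumptions all three inputs above decode to `{:ok, _}`, since each high surrogate is directly followed by its low surrogate and `\u200D` is an ordinary code point.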

devinus (Owner) commented Mar 5, 2024

All of these examples fail in the browser with JSON.parse, except @adworse's original example, which Poison 5.0 also parses correctly.

Poison 5.0 passes all spec tests, so I'm wary of allowing strings to parse that won't parse in a browser environment.


wisq commented Mar 5, 2024

Both Firefox and Chrome seem fine with my example.
[Two screenshots attached: 2024-03-05 at 18:30:19 and 18:30:50]

devinus (Owner) commented Mar 6, 2024

@wisq You're right, I must have somehow tested them wrong. Investigating this and @irisTa56's solution.

devinus added a commit that referenced this issue Jun 9, 2024
Features:

* Support Erlang 27 and Elixir 1.17
* Reintroduce `Poison.encode_to_iodata!/1` for Phoenix compatibility
* Make `:html_safe` encode option follow OWASP recommended HTML
  escaping
* Added `Date.Range` encoding
* Allow `:as` decode option to be a function
* Added a CHANGELOG

Bug Fixes:

* Stop double decoding structs
* Fix various typespecs
* Correctly encode some UTF-8 surrogate pairs

Performance Improvements:

* Significantly improved performance

Breaking Changes:

* Removed deprecated `HashSet` encoding
* Minimum supported versions are now Erlang 24 and Elixir 1.12

Closes #105, #172, #191, #194, #199, #206, #207, #214, #217, #222.
devinus closed this as completed Jun 9, 2024