Bugfix unescape entities #123

Merged
merged 5 commits into from
Sep 22, 2018

@sonnyp commented Sep 15, 2018

Thanks and credit to @mogsie.

Rebase of #119.

@sonnyp mentioned this pull request Sep 15, 2018
@sonnyp closed this Sep 20, 2018
@sonnyp reopened this Sep 20, 2018
@sonnyp force-pushed the bugfix-unescape-entities2 branch from 6bc5f05 to 2bdae01 on September 22, 2018 11:12
mogsie and others added 5 commits September 22, 2018 13:14
Character references are defined here:
  https://www.w3.org/TR/xml/#dt-charref

    CharRef   ::=     '&#' [0-9]+ ';'
                      | '&#x' [0-9a-fA-F]+ ';'
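
As a rough illustration of the grammar above, both CharRef forms can be recognised with one regular expression. This is a hypothetical sketch, not the project's actual unescapeXML; the name `parseCharRef` is made up for illustration.

```javascript
// Hypothetical helper: recognise a single CharRef as defined in XML 1.0
//   CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
// Returns the referenced code point as a number, or null if `ref`
// is not a syntactically valid character reference.
const CHAR_REF = /^&#(?:([0-9]+)|x([0-9a-fA-F]+));$/;

function parseCharRef(ref) {
  const match = CHAR_REF.exec(ref);
  if (match === null) return null;
  // Group 1 holds decimal digits, group 2 holds hex digits.
  return match[1] !== undefined
    ? parseInt(match[1], 10)
    : parseInt(match[2], 16);
}

console.log(parseCharRef("&#64;"));     // 64
console.log(parseCharRef("&#x1F40D;")); // 128013
console.log(parseCharRef("&#X40;"));    // null: the grammar only allows a lowercase 'x'
console.log(parseCharRef("&#;"));       // null: at least one digit is required
```

Anchoring the pattern with ^ and $ keeps it from accepting a valid reference that is embedded in a longer string.
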

This test uses the snake emoji U+1F40D, which is encoded as two
UTF-16 code units (a surrogate pair) in JavaScript strings.
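
For reference, the "two characters" are a UTF-16 surrogate pair. A small sketch of the standard UTF-16 encoding arithmetic for the snake emoji from the test (not code from this PR; `toSurrogatePair` is an illustrative name):

```javascript
// Illustration: how U+1F40D (SNAKE) is stored in a JavaScript string.
// Code points above U+FFFF are split into a UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;    // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const snake = "\u{1F40D}";
const [high, low] = toSurrogatePair(0x1f40d);

console.log(high.toString(16), low.toString(16)); // d83d dc0d
console.log(snake.length);                        // 2 (code units, not characters)
console.log(snake.charCodeAt(0) === high);        // true
console.log(snake.charCodeAt(1) === low);         // true
console.log(snake.codePointAt(0).toString(16));   // 1f40d
```

This is why string length and charCodeAt see two units where a reader sees one character.
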

This allows unescapeXML to correctly parse strings like &#64; (@) and
complex sequences like &#x1F40D; (U+1F40D, Snake).
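
A minimal sketch of the decoding step this commit describes, assuming numeric references are replaced with a regular expression. It is hypothetical (`decodeNumericRefs` is not the project's function) and only shows why the code point must be converted with String.fromCodePoint rather than String.fromCharCode.

```javascript
// Hypothetical sketch of decoding numeric character references in text.
// Named entities such as &amp; are left untouched here.
function decodeNumericRefs(text) {
  return text.replace(/&#(?:([0-9]+)|x([0-9a-fA-F]+));/g, (_, dec, hex) => {
    const codePoint = dec !== undefined ? parseInt(dec, 10) : parseInt(hex, 16);
    // fromCodePoint emits a surrogate pair for code points above U+FFFF.
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericRefs("user&#64;example.org"));      // user@example.org
console.log(decodeNumericRefs("&#x1F40D;") === "\u{1F40D}"); // true

// String.fromCharCode truncates its argument to 16 bits, so it would
// produce U+F40D instead of the snake:
console.log(String.fromCharCode(0x1f40d) === "\uF40D"); // true
```

Note that this sketch performs no range checking: String.fromCodePoint happily returns NUL or a lone surrogate.
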

The normal numeric entity escaping now takes care of these characters.

When a character reference is outside the bounds defined by XML 1.0,
throw an error, since the referenced characters are not legal and the
document is not well-formed.

Not doing this would probably open up attack vectors allowing the
entry of binary data and control sequences into data structures.
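
One way to implement the bounds check described above is sketched below. The ranges come from the Char production in XML 1.0, which the "Legal Character" well-formedness constraint requires every CharRef to match; the names `isXmlChar` and `decodeCharRef` are hypothetical and not taken from this PR.

```javascript
// Sketch of the XML 1.0 "Legal Character" check for a decoded reference.
//   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
//          | [#x10000-#x10FFFF]
function isXmlChar(cp) {
  return (
    cp === 0x9 ||
    cp === 0xa ||
    cp === 0xd ||
    (cp >= 0x20 && cp <= 0xd7ff) ||
    (cp >= 0xe000 && cp <= 0xfffd) ||
    (cp >= 0x10000 && cp <= 0x10ffff)
  );
}

// Hypothetical decoder that rejects references to illegal characters
// (NUL, most C0 controls, lone surrogates, U+FFFE, U+FFFF, > U+10FFFF)
// instead of silently inserting them into the parsed data.
function decodeCharRef(digits, isHex) {
  const cp = parseInt(digits, isHex ? 16 : 10);
  if (!isXmlChar(cp)) {
    const prefix = isHex ? "&#x" : "&#";
    throw new Error(`Illegal XML character reference: ${prefix}${digits};`);
  }
  return String.fromCodePoint(cp);
}

console.log(decodeCharRef("64", false)); // @
try {
  decodeCharRef("0", false); // &#0; refers to NUL
} catch (err) {
  console.log(err.message); // Illegal XML character reference: &#0;
}
```

Throwing, rather than dropping or replacing the character, surfaces the malformed input to the caller instead of letting it reach application data.
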

@sonnyp merged commit d4844fa into master Sep 22, 2018
@sonnyp deleted the bugfix-unescape-entities2 branch September 22, 2018 11:22