Fix additional < followed by characters and EOF issues (#728) #740

Merged on Oct 25, 2024 (1 commit)


willkg (Member) commented Oct 25, 2024

This fixes these two cases:

  • "<some thing thing", where the repeated "thing" raises a parser error because the parser treats it as a duplicate attribute
  • "<some thing thing2 ", where the trailing space raises an expected-end-of-tag-but-got-eof parser error

In both of these cases, we want the data to be treated as character data--not a tag.

Python 3.10.14 (main, Aug 14 2024, 05:11:29) [Clang 18.1.8 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bleach
>>> bleach.clean("<test abc abc")
'&lt;test abc abc'
>>> bleach.clean("<test abc ")
'&lt;test abc '
>>> bleach.clean("asd<test abc ")
'asd&lt;test abc '
>>>
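
To illustrate the intended behavior without depending on bleach itself, here is a minimal standard-library sketch. The helper name `escape_unclosed_tag` is hypothetical and this is not how bleach implements the fix; it only shows the idea that a trailing `<...` that never reaches a closing `>` should come out as escaped character data rather than be parsed as a tag.

```python
import html


def escape_unclosed_tag(text: str) -> str:
    """Escape a trailing '<...' run that never reaches a closing '>'.

    Hypothetical helper for illustration only; it is not bleach's code.
    """
    last_open = text.rfind("<")
    # No '<' at all, or the last '<' is eventually closed by '>': leave as-is.
    if last_open == -1 or ">" in text[last_open:]:
        return text
    # The tail is an incomplete tag, so treat it as character data.
    return text[:last_open] + html.escape(text[last_open:], quote=False)


print(escape_unclosed_tag("<test abc abc"))
print(escape_unclosed_tag("asd<test abc "))
```

The sketch only looks at the tail of the input; a real sanitizer also has to handle everything before it, which is the tokenizer's job in bleach.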

Fixes #728.

willkg (Member, Author) commented Oct 25, 2024

self-r+

willkg merged commit 32efc26 into main on Oct 25, 2024
36 checks passed
Successfully merging this pull request may close these issues.

Open bracket '<' still cleaned up without closing bracket