Antisamy 1.7.5 version - <body> tag issue #453

jeetu22 · 2024-05-24T06:05:25Z

The Antisamy library versions above 1.7.2 require a <body> tag in the HTML page; otherwise, it causes the HTML to break. Here's an example of the input HTML:

<html lang="en">
<head>
</head> 
<table class="container" cellpadding="0" cellspacing="0" align="center">    
<SELECT NAME="Lang">
<OPTION VALUE="da">Dansk</OPTION>
<OPTION VALUE="en" selected=selected>English</OPTION>
</SELECT>
</table>
</html>

The output produced is:

<html lang="en">
<head>
    <table class="container" cellpadding="0" cellspacing="0" align="center"></table>
    **<select name="Lang"></select>**
    <option value="da">Dansk</option>
    <option value="en" selected="selected">English</option>
</head>
</html>

As you can see, the <select> tag closes on the same line, causing the dropdown to malfunction and breaking the HTML page. This issue does not occur in Antisamy version 1.7.2 and earlier but appears in versions after 1.7.2. We are upgrading Antisamy in our project to version 1.7.5, but this issue is causing the complete HTML page to become distorted.
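For anyone stuck on 1.7.5, one possible stopgap is to make sure the page has a <body> element before it reaches the sanitizer. The sketch below is only an illustration and is not taken from AntiSamy or neko: `BodyTagWorkaround`, `hasBodyTag` and `ensureBody` are hypothetical names, and the regex matching assumes the simple `</head>` ... `</html>` layout of the example input above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of AntiSamy or neko): wraps everything between
// </head> and </html> in <body>...</body> when the page has no <body> tag.
final class BodyTagWorkaround {
    private static final Pattern BODY_OPEN =
            Pattern.compile("<body[\\s>]", Pattern.CASE_INSENSITIVE);
    private static final Pattern HEAD_CLOSE =
            Pattern.compile("</head\\s*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern HTML_CLOSE =
            Pattern.compile("</html\\s*>", Pattern.CASE_INSENSITIVE);

    static boolean hasBodyTag(String html) {
        return BODY_OPEN.matcher(html).find();
    }

    static String ensureBody(String html) {
        if (hasBodyTag(html)) {
            return html; // nothing to do
        }
        Matcher headClose = HEAD_CLOSE.matcher(html);
        Matcher htmlClose = HTML_CLOSE.matcher(html);
        // Only touch input that has both </head> and a later </html>;
        // snippets and unusual layouts are returned unchanged.
        if (!headClose.find() || !htmlClose.find(headClose.end())) {
            return html;
        }
        return html.substring(0, headClose.end())
                + "<body>"
                + html.substring(headClose.end(), htmlClose.start())
                + "</body>"
                + html.substring(htmlClose.start());
    }

    public static void main(String[] args) {
        System.out.println(ensureBody("<html><head></head><table></table></html>"));
    }
}
```

Regex matching is not a real HTML parser, so this is best treated as a temporary measure for well-behaved templates. Input without `</head>` or `</html>` passes through untouched.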

rbri · 2024-05-24T07:01:38Z

Will have a look...

rbri · 2024-05-24T07:36:07Z

Had a quick look and have added a test case to neko. From this first look it seems like neko works ok. Maybe the tag is closed by some cleanup?

jeetu22 · 2024-05-24T07:50:07Z

Can AntiSamy 1.7.5 add the <body> tag itself if it is missing?

rbri · 2024-05-24T08:13:37Z

@jeetu22 I don't know much about the inner workings of AntiSamy, but I'm responsible for the neko-htmlunit parser (https://github.com/HtmlUnit/htmlunit-neko) that AntiSamy uses to parse the HTML file and convert it into a DOM tree. During this process some cleanup is done to form a valid DOM (or emit valid SAX events).

And yes, missing body (start) elements are added to form valid DOM trees. Proving this for your case was exactly the reason to write the additional test case for the parser.

jeetu22 · 2024-05-24T08:33:22Z

Thank you very much! Since AntiSamy uses neko internally, can anyone from AntiSamy guide us in this scenario? I suspect HTMLScanner.java is modifying the DOM.

rbri · 2024-05-24T08:44:56Z

Maybe the org.owasp.validator.html.scan.MagicSAXFilter is the one - but only a guess.

jeetu22 · 2024-05-24T12:05:28Z

@rbri

public void selectInsideEmptyTable() throws Exception {
       final String html = "<html><head></head><body>\n"
               + "<table><select name='Lang'><option value='da'>Dansk</option></select></table>\n"
               + "<script>\n"
               + LOG_TITLE_FUNCTION
               + "log(document.body.childNodes.length);\n"
               + "log(document.body.children.length);\n"
               + "log(document.body.children[0]);\n"
               + "log(document.body.children[1]);\n"
               + "log(document.body.children[2]);\n"
               + "</script>\n"
               + "</body></html>";

       expandExpectedAlertsVariables(URL_FIRST);

       loadPageVerifyTitle2(html);
   }

Can you please remove the body tag from this JUnit test case and assert that the output HTML contains a <body> tag?

davewichers · 2024-05-24T14:13:02Z

@spassarop - Can you look into this with @rbri?

rbri · 2024-05-26T09:47:08Z

i guess i found the reason - will analyze this a bit more

rbri · 2024-05-26T10:31:47Z

Ok, antisamy is using the fragment parser instead of the document parser; with the fragment parser i can reproduce the problem.
Will require some time to fix that.

spassarop · 2024-05-26T14:30:04Z

Thanks @rbri for being so proactive with this.

@jeetu22, even though @rbri seems to have reproduced the problem, it would be useful if you could describe how you are calling AntiSamy and which policy you are using. These factors determine whether AntiSamy uses the DOM or SAX parser, and which tags it preserves.

jeetu22 · 2024-05-27T07:02:53Z

we are using SAX parser.
parser.setFeature("http://cyberneko.org/html/features/balance-tags/document-fragment", true);

rbri · 2024-05-27T07:08:19Z

@jeetu22, setting this feature for the parser changes the behavior in some ways. One of the effects is the one you are facing - the tag balancer no longer adds missing body tags. But there are some others also.

As promised I will have a look at all that. At the moment I'm wondering why AntiSamy should use the fragment way of parsing at all. Because I'm working on all this in my spare time and have some other private things on my to-do list, please be a bit patient and don't expect a fix in the next few hours ;-)

jeetu22 · 2024-05-27T07:29:42Z

Thank you for the update! I appreciate you looking into the issue. Given your busy schedule, I completely understand that a fix might take some time.

Please take the time you need, and I look forward to your findings.

Thanks again for your efforts!

spassarop · 2024-05-27T11:17:08Z

I don’t know too much about the SAX parser, so I have no idea about the difference nor why AntiSamy uses fragment parser. It could be changed and see how the tests react.

rbri · 2024-05-27T12:13:27Z

After some thinking:

- I have an idea why the fragment parser is used: at least from the tests it looks like AntiSamy can also clean HTML snippets, not only complete HTML pages.
- I have improved the fragment parser so that it now takes an existing html tag into account: if one was seen, the automatic generation of head and body tags is also enabled in fragment mode.
- I have added the issue as a test case.
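The change in behaviour can be summarised as a small decision rule. This is a toy model of what is described in this thread, not neko's actual code; the class and method names are made up for illustration.

```java
// Toy model only: mirrors the behaviour described in this thread,
// not the real tag balancer implementation in neko.
final class BodySynthesisModel {

    // Before the fix: with the document-fragment feature enabled,
    // missing <body> tags were never added.
    static boolean addsBodyBeforeFix(boolean fragmentMode, boolean sawHtmlTag) {
        return !fragmentMode;
    }

    // After the fix: fragment mode still leaves plain snippets alone,
    // but an <html> tag in the input re-enables head/body generation.
    static boolean addsBodyAfterFix(boolean fragmentMode, boolean sawHtmlTag) {
        return !fragmentMode || sawHtmlTag;
    }

    public static void main(String[] args) {
        // The reported input: fragment mode on, <html> present, no <body>.
        System.out.println("before fix: " + addsBodyBeforeFix(true, true));
        System.out.println("after fix:  " + addsBodyAfterFix(true, true));
    }
}
```

In this model a snippet without an html tag gets no body element in fragment mode either way, so only inputs like the one reported here are affected.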

kwwall · 2024-05-27T14:59:02Z

@rbri wrote:

i have an idea why the fragment parser is used - at least from the tests it looks like antisamy also can clean html snippets, not only complete html pages

That is exactly why! In fact, I think that is the most common use case for HTML sanitizers in general. There's generally some user input that you capture that allows only some specific markup (and which markup may vary from one use to another), and you want to sanitize it to make it safe to use in the broader context of an application-generated page. I think it's rare that AntiSamy or the OWASP HTML Sanitizer project would get a complete HTML page to sanitize. That's certainly a valid use case too, but just not one that is as common. If AntiSamy ditched the fragment parser, then I think that ESAPI would have to ditch AntiSamy because dealing with HTML fragments is what Validator.getValidSafeHTML is generally expecting.

spassarop · 2024-05-27T15:24:34Z

Oh right, of course. I didn’t know what fragment parser meant initially.

rbri · 2024-06-01T09:11:06Z

@davewichers @spassarop fix is ready in PR #454

davewichers · 2024-06-01T18:59:44Z

@spassarop - can you create test case for this situation that fails, and then verify that it now passes with his snapshot version?

rbri · 2024-06-01T22:07:26Z

My PR includes such a test case...

spassarop · 2024-06-01T22:21:56Z

Hahaha yeah, our man here is one step ahead ;)

rbri · 2024-06-04T17:27:30Z

neko 4.2.0 released

jeetu22 · 2024-06-05T06:45:44Z

@rbri , can you confirm if the Neko 4.2.0 release resolves the above issue?

rbri · 2024-06-05T07:54:29Z

@jeetu22 that's the goal - but I guess there will be a new release of antisamy itself soon (see #454 for more details)

jeetu22 · 2024-06-05T07:57:31Z

I tried with AntiSamy 1.7.5 and neko 4.2.0; many test cases are failing in AntisamyTest.java.
one such example:

rbri · 2024-06-05T08:40:34Z

@jeetu22 strange - have done this right now

checkout the current code of the antisamy project
change the version of neko in the pom
run 'mvn clean test'

rbri · 2024-06-05T08:45:07Z

for me this looks like you still have an old version of neko somewhere in your class path... can you please provide the whole stack trace...
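One way to check for a stale jar is to ask the JVM where it loaded the parser class from. A minimal sketch using only the JDK: `WhichJar` is a hypothetical helper, and the default class name `org.htmlunit.cyberneko.parsers.SAXParser` is an assumption about the neko package layout, so pass the class you actually use as an argument.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: prints the jar or directory a class was loaded from.
final class WhichJar {

    static String locationOf(Class<?> type) {
        // Classes loaded by the bootstrap loader have no code source.
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null ? "(bootstrap or unknown)" : String.valueOf(source.getLocation());
    }

    static String locationOf(String className) {
        try {
            return locationOf(Class.forName(className));
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Assumed default; replace with the parser class your build really uses.
        String name = args.length > 0 ? args[0] : "org.htmlunit.cyberneko.parsers.SAXParser";
        System.out.println(name + " -> " + locationOf(name));
    }
}
```

Run it with the same classpath as the failing tests. If the printed location is not the neko jar you expect, an older copy is probably being picked up first.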

jeetu22 · 2024-06-13T06:52:45Z

checking , will update you @rbri .

jeetu22 · 2024-07-03T12:21:08Z

Hi @rbri,

We've thoroughly tested Antisamy 1.7.6-SNAPSHOT and found that both the workflow and UI are working fine. It would be great if you could provide a tentative release date for the non-snapshot version.

Thank you!

davewichers · 2024-07-03T14:26:13Z

@rbri - we are working on that right now. We are trying to figure out how to address issue #456, which is turning out a bit more complicated than we thought. As soon as we get this addressed, we'll do a release. Hopefully in the next week or so.

rbri · 2024-07-03T16:42:06Z

In the meantime neko 4.3.0 is out (https://github.com/HtmlUnit/htmlunit-neko/releases) - you should be able to safely switch to this one.

rbri · 2024-07-03T16:44:36Z

oh you are already at 4.3.0...

davewichers · 2024-07-07T15:41:25Z

This was fixed in release 1.7.6 which went out yesterday.

jeetu22 changed the title ~~Antisamy 1.7.5 version - <body> tag issue~~ Antisamy 1.7.5 version - issue May 24, 2024

jeetu22 changed the title ~~Antisamy 1.7.5 version - issue~~ Antisamy 1.7.5 version - <body> tag issue May 24, 2024

rbri added a commit to HtmlUnit/htmlunit-neko that referenced this issue May 24, 2024

test case for nahsra/antisamy#453

56f386f

rbri added a commit to HtmlUnit/htmlunit-neko that referenced this issue May 24, 2024

test case for nahsra/antisamy#453

b82a101

rbri added a commit to HtmlUnit/htmlunit that referenced this issue May 24, 2024

test case for nahsra/antisamy#453

ab02e8d

rbri added a commit to rbri/antisamy that referenced this issue May 27, 2024

add test for issue nahsra#453

2eee0b3

use 4.2.0-SNAPSHOT

kwwall mentioned this issue Jun 3, 2024

add test for issue #453 - fix is done in neko #454

Merged

davewichers pushed a commit that referenced this issue Jun 6, 2024

add test for issue #453 - fix is done in neko (#454)

e847d50

* add test for issue #453 use 4.2.0-SNAPSHOT * code style * add neko-htmlunit snapshot repo * use neko 4.2.0 release * neko-htmlunit version 4.2.1 * remove property

davewichers closed this as completed Jul 7, 2024
