Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent various XSS attacks [rebase and update of #276] #495

Merged
merged 20 commits into from
Feb 28, 2018
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
115 changes: 101 additions & 14 deletions Parsedown.php
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,26 @@ function setUrlsLinked($urlsLinked)

protected $urlsLinked = true;

function setSafeLinksEnabled($safeLinksEnabled)
{
$this->safeLinksEnabled = $safeLinksEnabled;

return $this;
}

protected $safeLinksEnabled = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this shouldn't be enabled by default, it breaks BC. Enabling this without setMarkupEscaped(true) makes no sense anyway - it prevents absolutely nothing without escaped markup. Never confuse people by letting them do pointless things.

How about something like this (disallow enabling safeLinksEnabled without markupEscaped and always enable/disable safeLinksEnabled and markupEscaped at the same time; however, disabling safeLinksEnabled when markupEscaped is enabled is still possible - escaping markup doesn't necessarily imply a "safe mode", but in contrast, a "safe mode" isn't possible without escaping markup)?

    function setMarkupEscaped($markupEscaped)
    {
        $this->markupEscaped = (bool) $markupEscaped;
        $this->safeLinksEnabled = (bool) $markupEscaped;
        return $this;
   }

    function setSafeLinksEnabled($safeLinksEnabled)
    {
        // or throw a exception instead
        $this->safeLinksEnabled = $this->markupEscaped && $safeLinksEnabled;
        return $this;
    }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so some of this is spoken to by decisions made in the original PR.

I think the default was @erusev's suggestion as far as I can tell though, see #276 (comment)

If it was up to me I'd make markupEscaped default to on too 😉

I'm happy to go with the flow here though, so I'll await some more feedback

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was up to me I'd make markupEscaped default to on too 😉

My 👎 for this, BC is IMO very important 😝

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe, I mean as an initial decision (of course too late now! ;p)


protected $safeLinksWhitelist = array(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to be an instance member? Seems like it can be moved to filterUnsafeUrlInAttribute

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory it would let an extension define additional protocols without having to worry about implementation of how to whitelist them

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. you could fix a bug in the filter implementation that wouldn't have to be reflected in the extension if they're just adding data.

'http://',
'https://',
'ftp://',
'ftps://',
'mailto:',
'data:image/png;base64,',
'data:image/gif;base64,',
'data:image/jpeg;base64,',
);

#
# Lines
#
Expand Down Expand Up @@ -342,8 +362,6 @@ protected function blockCodeComplete($Block)
{
$text = $Block['element']['text']['text'];

$text = htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8');

$Block['element']['text']['text'] = $text;

return $Block;
Expand Down Expand Up @@ -457,8 +475,6 @@ protected function blockFencedCodeComplete($Block)
{
$text = $Block['element']['text']['text'];

$text = htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8');

$Block['element']['text']['text'] = $text;

return $Block;
Expand Down Expand Up @@ -515,10 +531,10 @@ protected function blockList($Line)
),
);

if($name === 'ol')
if($name === 'ol')
{
$listStart = stristr($matches[0], '.', true);

if($listStart !== '1')
{
$Block['element']['attributes'] = array('start' => $listStart);
Expand Down Expand Up @@ -1074,7 +1090,6 @@ protected function inlineCode($Excerpt)
if (preg_match('/^('.$marker.'+)[ ]*(.+?)[ ]*(?<!'.$marker.')\1(?!'.$marker.')/s', $Excerpt['text'], $matches))
{
$text = $matches[2];
$text = htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8');
$text = preg_replace("/[ ]*\n/", ' ', $text);

return array(
Expand Down Expand Up @@ -1253,8 +1268,6 @@ protected function inlineLink($Excerpt)
$Element['attributes']['title'] = $Definition['title'];
}

$Element['attributes']['href'] = str_replace(array('&', '<'), array('&amp;', '&lt;'), $Element['attributes']['href']);

return array(
'extent' => $extent,
'element' => $Element,
Expand Down Expand Up @@ -1343,14 +1356,16 @@ protected function inlineUrl($Excerpt)

if (preg_match('/\bhttps?:[\/]{2}[^\s<]+\b\/*/ui', $Excerpt['context'], $matches, PREG_OFFSET_CAPTURE))
{
$url = $matches[0][0];

$Inline = array(
'extent' => strlen($matches[0][0]),
'position' => $matches[0][1],
'element' => array(
'name' => 'a',
'text' => $matches[0][0],
'text' => $url,
'attributes' => array(
'href' => $matches[0][0],
'href' => $url,
),
),
);
Expand All @@ -1363,7 +1378,7 @@ protected function inlineUrlTag($Excerpt)
{
if (strpos($Excerpt['text'], '>') !== false and preg_match('/^<(\w+:\/{2}[^ >]+)>/i', $Excerpt['text'], $matches))
{
$url = str_replace(array('&', '<'), array('&amp;', '&lt;'), $matches[1]);
$url = $matches[1];

return array(
'extent' => strlen($matches[0]),
Expand Down Expand Up @@ -1401,6 +1416,8 @@ protected function unmarkedText($text)

protected function element(array $Element)
{
$Element = $this->sanitiseElement($Element);

$markup = '<'.$Element['name'];

if (isset($Element['attributes']))
Expand All @@ -1412,7 +1429,7 @@ protected function element(array $Element)
continue;
}

$markup .= ' '.$name.'="'.$value.'"';
$markup .= ' '.$name.'="'.self::escape($value).'"';
}
}

Expand All @@ -1426,7 +1443,7 @@ protected function element(array $Element)
}
else
{
$markup .= $Element['text'];
$markup .= self::escape($Element['text'], true);
}

$markup .= '</'.$Element['name'].'>';
Expand Down Expand Up @@ -1485,10 +1502,80 @@ function parse($text)
return $markup;
}

protected function sanitiseElement(array $Element)
{
static $goodAttribute = '/^[a-zA-Z0-9][a-zA-Z0-9-_]*+$/';
static $safeUrlNameToAtt = array(
'a' => 'href',
'img' => 'src',
);

if (isset($safeUrlNameToAtt[$Element['name']]))
{
$Element = $this->filterUnsafeUrlInAttribute($Element, $safeUrlNameToAtt[$Element['name']]);
}

if ( ! empty($Element['attributes']))
{
foreach ($Element['attributes'] as $att => $val)
{
# filter out badly parsed attribute
if ( ! preg_match($goodAttribute, $att))
{
unset($Element['attributes'][$att]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was needed when I used array_flip in place of where array_keys is now, can remove that now though :)

}
# dump onevent attribute
elseif (preg_match('/^on/i', $att))
{
unset($Element['attributes'][$att]);
}
}
}

return $Element;
}

protected function filterUnsafeUrlInAttribute(array $Element, $attribute)
{
if ($this->safeLinksEnabled)
{
$safe = false;

foreach ($this->safeLinksWhitelist as $scheme)
{
if (stripos($Element['attributes'][$attribute], $scheme) === 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't test this, but I think it treats relative links like [some link](relative/path/to/page) as unsafe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore: Don't use stripos() here, it forces PHP to walk through the whole string even when the first character already didn't match. Use substr() instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, it'll treat that as unsafe. Problem with allowing relative/path/to/page will be that you'll additionally have to check that no colons are present (also note that &colon; is equivalent to a colon in a HTML attribute, and contains no colons).

Also see https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet#Embedded_newline_to_break_up_XSS for some examples of getting XSS in a HTML attribute via some obscenely permissive browser parsing :(

e.g. javas\0cript: and jav\tascript: are valid :(

Copy link
Collaborator Author

@aidantwoods aidantwoods May 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an alternate method for what's done here, one idea I've had is selectively partially url encoding attributes that can't be guaranteed safe by the whitelist.

I've written an implementation here. One thing I wasn't sure about was how permissive to be with which characters to allow though. I think holding back on encoding/#?&=% (so URLs will still function) (but definitely encoding ;, : and all other non-alphanumerics) should work. I'll have to have a think about it a bit more though as to whether you could bypass that and specify a protocol.

Copy link
Contributor

@PhrozenByte PhrozenByte May 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevertheless are relative paths an important use case. It might be a good idea to look into the specs to find out how URI schemes must look like to be valid. Just guessing here, but I think the : as scheme "indicator" works only until a / is encountered. As an alternative: If safe url parsing isn't enabled by default, it might be sufficient to document this behavior and to tell people that they should use ./ for relative paths (what afaik isn't whitelisted yet).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not entirely convinced that this is best solution in every detail, but you've convinced me that the approach is right. There might still be some open detail questions, I unfortunately don't have time to look through everything in detail at the moment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we used your approach https://github.com/erusev/parsedown/pull/495/files/aee3963e6b97186b1e5526c118bf5d2d872cd8ee#r114438464 to make safeLinks be off by default, and strongly coupled safeLinks and setMarkupEscaped to be either both on, or both off?

That way custom protocols are supported for authors (who trust their own content), but will be off for user generated content (that they do not trust).

Perhaps we could ease the complication of having to enable two options in sync by introducing a new option that does both behind the scenes: preventXss or setContentUntrusted – that way we'd be giving a more explicit representation to the user about what this option actually does, and would afford us the assumption that while in this mode we should treat content as untrusted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one has nothing to do with the other, never ever introduce BC breaking changes without incrementing the first digit of the version (i.e. when this feature is enabled by default, it must be released as Parsedown 2.0). You'll break all dependent applications otherwise - and Parsedown is no end user project, it's a library, thus you'll break hundreds of applications with thousands of users. See http://semver.org. The only exception is behavior that never was expected (like some weird edge cases when parsing markup), neither officially, nor how it was practically used by others.

Copy link
Collaborator Author

@aidantwoods aidantwoods May 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never ever introduce BC breaking changes without incrementing the first digit of the version (i.e. when this feature is enabled by default, it must be released as Parsedown 2.0)

I'd tend to agree with that. In-fact even if we change no public default behaviour, IMO it would be valuable to consider how much of a version jump changes to the protected API would warrant (see: #495 (comment))

What do you think about introducing a new option though – in theory if we introduced something like preventXss or setContentUntrusted then there would be no BC breaking changes to the public API (we then wouldn't need to strongly couple safeLinks and setMarkupEscaped into a both or none scenario – perhaps we wouldn't even need to expose safeLinks publically, but just the new option).

Having this off by default would probably ease concerns about custom protocols (and as you said, having safeLinks enabled without setMarkupEscaped or vice versa doesn't make much sense anyway). So IMO either both should be off by default, or both should be on – both off seems most friendly to expected behaviour (even though my preference is that things are secure by default 😉)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is disabled by default - sure, why not.

However, as I've said before (#495 (comment)), please keep in mind that people might want to escape markup without having security/user-generated content in mind. Thus there might still be users who do want to escape markup, but don't want to have safeLinks enabled (i.e. what Parsedown currently does when markupEscaped is enabled). Just the other way round (safeLinks enabled without markupEscaped) makes no sense.

Nevertheless, this is no crucial question for me. Important is that this isn't enabled by default, as this would break BC in an absolute catastrophic way.

{
$safe = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply return immediately (return $Element;), there's nothing more to do here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, good catch!
226f636


break;
}
}

if ( ! $safe)
{
$Element['attributes'][$attribute] = preg_replace_callback(
Copy link
Owner

@erusev erusev May 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we find a simpler alternative to preg_replace_callback? Something that doesn't require passing a function definition as a parameter - perhaps preg_match.

Copy link
Collaborator Author

@aidantwoods aidantwoods May 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current way it's set out is to have a whitelist of characters we don't want to encode (and encode everything else).

I could achieve the same result with strpbrk and encoding done on ranges of the string, not sure that would be faster though. Unless you have another idea?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, didn't see your edit.

Could do it with preg_match_all with match positions returned and applying encoding to those ranges, again, not sure it'll be faster?

P.S. the regex will match as many chars as it can in one go before making the function call at present

Copy link
Collaborator Author

@aidantwoods aidantwoods May 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erusev this would be an alternative using preg_match_all

if (
    preg_match_all(
        '/[^\/#?&=%]++/',
        $Element['attributes'][$attribute],
        $matches,
        PREG_OFFSET_CAPTURE
    )
) {
    $offset = 0;

    foreach ($matches[0] as $match)
    {
        $len = strlen($match[0]);
        $encoded = urlencode($match[0]);

        $Element['attributes'][$attribute] = substr_replace(
            $Element['attributes'][$attribute],
            $encoded,
            $offset + $match[1],
            $len
        );

        $offset += strlen($encoded) - $len;
    }
}

More likely to introduce bugs with this though IMO, and I'm not sure it'll save that much on performance just looking at it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure preg_replace_callback() is the easiest and fastest solution here 😉

'/[^\/#?&=%]++/',
function (array $match)
{
return urlencode($match[0]);
},
$Element['attributes'][$attribute]
);
}
}

return $Element;
}

#
# Static Methods
#

protected static function escape($text, $allowQuotes = false)
{
return htmlspecialchars($text, $allowQuotes ? ENT_NOQUOTES : ENT_QUOTES, 'UTF-8');
}

static function instance($name = 'default')
{
if (isset(self::$instances[$name]))
Expand Down
2 changes: 2 additions & 0 deletions test/ParsedownTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,8 @@ function test_($test, $dir)
$expectedMarkup = str_replace("\r\n", "\n", $expectedMarkup);
$expectedMarkup = str_replace("\r", "\n", $expectedMarkup);

$this->Parsedown->setMarkupEscaped($test === 'xss_text_encoding');

$actualMarkup = $this->Parsedown->text($markdown);

$this->assertEquals($expectedMarkup, $actualMarkup);
Expand Down
2 changes: 1 addition & 1 deletion test/data/inline_link.html
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
<p><a href="http://example.com">link</a></p>
<p><a href="/url-(parentheses)">link</a> with parentheses in URL </p>
<p><a href="/url-%28parentheses%29">link</a> with parentheses in URL </p>
<p>(<a href="/index.php">link</a>) in parentheses</p>
<p><a href="http://example.com"><code>link</code></a></p>
<p><a href="http://example.com"><img src="http://parsedown.org/md.png" alt="MD Logo" /></a></p>
Expand Down
6 changes: 6 additions & 0 deletions test/data/xss_attribute_encoding.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
<p><a href="https://www.example.com&quot;">xss</a></p>
<p><img src="https://www.example.com&quot;" alt="xss" /></p>
<p><a href="https://www.example.com&#039;">xss</a></p>
<p><img src="https://www.example.com&#039;" alt="xss" /></p>
<p><img src="https://www.example.com" alt="xss&quot;" /></p>
<p><img src="https://www.example.com" alt="xss&#039;" /></p>
11 changes: 11 additions & 0 deletions test/data/xss_attribute_encoding.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
[xss](https://www.example.com")

![xss](https://www.example.com")

[xss](https://www.example.com')

![xss](https://www.example.com')

![xss"](https://www.example.com)

![xss'](https://www.example.com)
16 changes: 16 additions & 0 deletions test/data/xss_bad_url.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
<p><a href="javascript%3Aalert%281%29">xss</a></p>
<p><a href="javascript%3Aalert%281%29">xss</a></p>
<p><a href="javascript%3A//alert%281%29">xss</a></p>
<p><a href="javascript&amp;colon%3Balert%281%29">xss</a></p>
<p><img src="javascript%3Aalert%281%29" alt="xss" /></p>
<p><img src="javascript%3Aalert%281%29" alt="xss" /></p>
<p><img src="javascript%3A//alert%281%29" alt="xss" /></p>
<p><img src="javascript&amp;colon%3Balert%281%29" alt="xss" /></p>
<p><a href="data%3Atext/html%3Bbase64%2CPHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==">xss</a></p>
<p><a href="data%3Atext/html%3Bbase64%2CPHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==">xss</a></p>
<p><a href="data%3A//text/html%3Bbase64%2CPHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==">xss</a></p>
<p><a href="data&amp;colon%3Btext/html%3Bbase64%2CPHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==">xss</a></p>
<p><img src="data%3Atext/html%3Bbase64%2CPHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==" alt="xss" /></p>
<p><img src="data%3Atext/html%3Bbase64%2CPHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==" alt="xss" /></p>
<p><img src="data%3A//text/html%3Bbase64%2CPHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==" alt="xss" /></p>
<p><img src="data&amp;colon%3Btext/html%3Bbase64%2CPHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==" alt="xss" /></p>
31 changes: 31 additions & 0 deletions test/data/xss_bad_url.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
[xss](javascript:alert(1))

[xss]( javascript:alert(1))

[xss](javascript://alert(1))

[xss](javascript&colon;alert(1))

![xss](javascript:alert(1))

![xss]( javascript:alert(1))

![xss](javascript://alert(1))

![xss](javascript&colon;alert(1))

[xss](data:text/html;base64,PHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==)

[xss]( data:text/html;base64,PHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==)

[xss](data://text/html;base64,PHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==)

[xss](data&colon;text/html;base64,PHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==)

![xss](data:text/html;base64,PHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==)

![xss]( data:text/html;base64,PHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==)

![xss](data://text/html;base64,PHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==)

![xss](data&colon;text/html;base64,PHNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==)
7 changes: 7 additions & 0 deletions test/data/xss_text_encoding.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
<p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
<p>&lt;script&gt;</p>
<p>alert(1)</p>
<p>&lt;/script&gt;</p>
<p>&lt;script&gt;
alert(1)
&lt;/script&gt;</p>
12 changes: 12 additions & 0 deletions test/data/xss_text_encoding.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
<script>alert(1)</script>

<script>

alert(1)

</script>


<script>
alert(1)
</script>