security: finish fixing unsafe heading regex #1226

davisjam · 2018-04-16T22:50:03Z

Apply similar patch to a similar heading regex.
Follow-on to f052a2c (#1224).

Test: Add a test case to demonstrate the slower blow-up.

UziTech · 2018-04-17T04:50:37Z

Looks like the tests are failing because the new test takes more than a second to pass. Is the test supposed to be that slow?

styfle · 2018-04-17T13:04:34Z

> Test: Add a test case to demonstrate the slower blow-up.

You successfully demonstrated the blow-up 😃
What was the time to run the test before the change?

davisjam · 2018-04-17T13:24:28Z

It was "worse" before the patch. I didn't measure carefully.

I know I say this a lot, but I can't see a good solution for these without (1) weakening the spec, or (2) writing a parser.

UziTech · 2018-04-17T13:46:13Z

We can do some parsing in Lexer.prototype.token

marked/lib/marked.js

Lines 235 to 244 in 4711f6b

    // heading
    if (cap = this.rules.heading.exec(src)) {
      src = src.substring(cap[0].length);
      this.tokens.push({
        type: 'heading',
        depth: cap[1].length,
        text: cap[2]
      });
      continue;
    }

davisjam · 2018-04-17T13:47:48Z

@UziTech Acknowledged, in the style of #1227?

If so please clarify the pedantic thing I asked about in the comments on #1227.

UziTech · 2018-04-17T14:04:40Z

Ya similar to #1227.

Changing ' *([^\n]+?) *' to '(.+?)' and trimming in the token with text: cap[2].trim() seems to speed it up considerably.

davisjam · 2018-04-17T17:24:21Z

OK. I will give this a try. Thanks for the pointer.

davisjam · 2018-04-18T02:23:20Z

901b222 makes the regex more generous and applies parsing during tokenizing.

The test case still fails but due to another regex now. Progress!

UziTech · 2018-04-18T02:28:05Z

lib/marked.js

@@ -16,7 +16,8 @@ var block = {
  code: /^( {4}[^\n]+\n*)+/,
  fences: noop,
  hr: /^ {0,3}((?:- *){3,}|(?:_ *){3,}|(?:\* *){3,})(?:\n+|$)/,
-  heading: /^ *(#{1,6}) *([^\n]+?) *(?:#+ *)?(?:\n+|$)/,
+  // cap[2] might be ' HEADING # ' and must be trimmed appropriately.
+  heading: /^ *(#{1,6})([^\n]*)(?:\n+|$)/,


could we replace [^\n] with .?

Worth noting that this regex is substituted into block.paragraph and the follow-up parsing will not be performed there. I think we're OK spec-wise but might want to double-check.
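The approach discussed here, a generous regex followed by plain string handling of cap[2], might look roughly like the sketch below. The regex is the one from the diff above; `stripTrailing` and `headingToken` are hypothetical names rather than marked's actual code, and the closing-sequence handling is a simplified reading of the CommonMark ATX heading rules.

```javascript
// Generous heading regex from the diff: cap[2] may be ' HEADING # '.
const heading = /^ *(#{1,6})([^\n]*)(?:\n+|$)/;

// Hypothetical helper: drop trailing copies of ch with a single backward scan.
function stripTrailing(str, ch) {
  let end = str.length;
  while (end > 0 && str[end - 1] === ch) end--;
  return str.slice(0, end);
}

// Hypothetical tokenizer step: match loosely, then clean up cap[2] by hand.
function headingToken(src) {
  const cap = heading.exec(src);
  if (!cap) return null;

  let text = stripTrailing(cap[2], ' ');
  // An optional closing run of '#' only counts if it is preceded by a space
  // (or if nothing else is left), so 'foo#' keeps its '#'.
  const withoutHashes = stripTrailing(text, '#');
  if (withoutHashes.length === 0 || withoutHashes.endsWith(' ')) {
    text = withoutHashes;
  }

  return {
    type: 'heading',
    depth: cap[1].length,
    text: text.trim(),
    raw: cap[0]
  };
}
```

As noted in the comment above, this cleanup only happens where the token is built; anywhere the regex source is reused (such as block.paragraph) sees only the looser pattern.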

davisjam · 2018-04-18T03:30:43Z

FWIW Here's a Node-only way to deal with all REDOS issues:

const vm = require('vm');

const marked = require('./lib/marked.js');

function safeMarked(md) {
  // Hand the input to the sandbox as a variable instead of interpolating it
  // into the script source, so quotes or newlines in md cannot break the
  // script or inject code.
  const sandbox = { marked, md };
  try {
    return vm.runInNewContext('marked(md)', sandbox, {
      timeout: 20 // ms
    });
  } catch (e) {
    console.error(e);
    return 'TIMEOUT';
  }
}

Any interest in offering an API along these lines (with WebWorker equivalent in browser)?
Or save for a separate module?

I can run some performance numbers if interested.
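For readers who want to try the timeout idea without a marked checkout, here is a self-contained sketch of the same technique. It swaps marked for a deliberately vulnerable regex; `slowMatch` and `runWithTimeout` are hypothetical names, and the 50 ms budget is an arbitrary choice.

```javascript
const vm = require('vm');

// Stand-in for a slow parser: nested quantifiers backtrack exponentially
// when the input is a run of 'a' followed by a character that cannot match.
function slowMatch(text) {
  return /^(a+)+$/.test(text);
}

// Run fn(input) inside a new context with a time budget in milliseconds.
// The input is passed as a sandbox variable, never spliced into the script.
function runWithTimeout(fn, input, ms) {
  const sandbox = { fn, input };
  try {
    const value = vm.runInNewContext('fn(input)', sandbox, { timeout: ms });
    return { ok: true, value };
  } catch (e) {
    return { ok: false, error: e };
  }
}

const benign = runWithTimeout(slowMatch, 'aaa', 50);
const hostile = runWithTimeout(slowMatch, 'a'.repeat(40) + '!', 50);
console.log(benign.ok, hostile.ok);
```

The call still blocks the calling thread for up to the budget, so this bounds the damage rather than removing it.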

styfle · 2018-04-18T11:16:05Z

@davisjam I would say a new module. In fact, this is small enough to just be in the docs as BEST_PRACTICES.md

joshbruce · 2018-04-18T14:33:50Z

I still think security should trump other visions for the product. We could bench against a standard use case for a solution. Then, for the security (mostly DOS) side of things, let it run longer because the chances are slim that someone is going to hit that.

This gives us an A-B test. A for the most likely case and B for the edge security-vulnerable case.

Would like to put this in 0.4.0 as it solves a security issue. We can iterate on it later as we focus on the other missions:

1. Security
2. Spec-compliance
3. Speed
4. Low-level
5. Lightweight

We've demonstrated multiple times that Marked isn't as fast as it could be and no longer faster than other compilers.

UziTech · 2018-04-19T02:26:34Z

@davisjam I submitted a PR to this PR that fixes all of the CM ATX header examples except one: davisjam#1

the example that still fails is because of whitespace so I'm not sure exactly how to fix that one yet:

CommonMark (ATX headings):
foo
    # bar

------

Expected:
<p>foo
# bar</p>

------

Marked:
<p>foo
    # bar</p>

davisjam · 2018-04-25T13:30:33Z

I'll take a look soon.

davisjam · 2018-04-26T00:16:41Z

@UziTech How much do you hate the rtrim I added in 0cfe39e?

davisjam · 2018-04-26T00:18:11Z

@UziTech In your PR you said one case was still failing. When I run npm run test all the cases pass. What am I missing?

Problem: replace(/X+$/, '') is vulnerable to REDOS.
Solution: Replace all instances I could find with a custom rtrim.

styfle · 2018-04-26T01:25:22Z

lib/marked.js

@@ -4,6 +4,38 @@
 * https://github.com/markedjs/marked
 */

+// Return str with all trailing {c | all but c} removed
+// allButC: Default false
+function rtrim(str, c, allButC) {


Should there be unit tests for this function?

We would need to export the function. I think that would constitute testing the implementation.

I agree with @UziTech.
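For context, a sketch of what an rtrim with the signature in the diff could look like. This is an illustration written from the two-line comment in the diff, not the code from 0cfe39e. The motivation is that replace(/X+$/, '') can go quadratic: on a long run of X followed by some other character, the engine retries the match from every X in the run and scans to the end each time.

```javascript
// Return str with all trailing c removed, or, if allButC is truthy,
// with all trailing characters other than c removed.
// One backward scan, so the cost is linear in the length of the suffix.
function rtrim(str, c, allButC) {
  let end = str.length;
  while (end > 0) {
    const isC = str.charAt(end - 1) === c;
    const shouldStrip = allButC ? !isC : isC;
    if (!shouldStrip) break;
    end--;
  }
  return str.slice(0, end);
}

// Same result as 'text   '.replace(/ +$/, ''), without the regex.
console.log(JSON.stringify(rtrim('text   ', ' ')));
```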

Spec: No leading whitespace within a paragraph.
Fix: I strip leading whitespace on each line while rendering paragraphs.
Unanticipated problem: Causes the (previously failing but hidden) CommonMark 318 to fail. I added that to the list of expected-to-fail tests.

This commit addresses a failing header test case noted in 943d995 whose root cause was paragraph rendering not up to spec.
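The fix described in this commit message can be pictured with the sketch below. `stripLineIndent` is a hypothetical name, and the real commit may hook into the renderer differently; the point is only that each line's leading spaces and tabs are dropped with a plain scan rather than a regex.

```javascript
// Remove leading spaces and tabs from every line of a paragraph's text.
function stripLineIndent(paragraphText) {
  return paragraphText
    .split('\n')
    .map((line) => {
      let start = 0;
      while (start < line.length && (line[start] === ' ' || line[start] === '\t')) {
        start++;
      }
      return line.slice(start);
    })
    .join('\n');
}

// The failing ATX example from earlier in the thread:
console.log('<p>' + stripLineIndent('foo\n    # bar') + '</p>');
```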

Sketch implementing text regex as a linear-time RegExp imitator.
- A few nits here and there
- I haven't tested all of the offsetOfX routines, so 'npm run test' hangs on some bug

I'll review when this is no longer a WIP

davisjam · 2018-04-30T14:06:44Z

Looking for feedback on the approach in 24d4a5e. It's a WIP (an off-by-one error somewhere) but I want to know whether you guys like the direction.
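To make the "RegExp imitator" idea concrete, here is a much-reduced sketch: an object exposing exec() that returns a RegExp-style match array but finds the match with one forward scan. It is not the code in 24d4a5e. The stop set in `SPECIAL` is an assumption that covers only part of what marked's inline text rule looks for, and `linearText` is a hypothetical name.

```javascript
// Characters at which this simplified text rule stops (assumed subset).
const SPECIAL = '\\<![`*';

const linearText = {
  // Mimics /^[\s\S]+?(?=[\\<!\[`*]|$)/.exec(src): consume at least one
  // character, then stop before the next special character or at the end.
  exec(src) {
    if (src.length === 0) return null;
    let end = 1;
    while (end < src.length && !SPECIAL.includes(src[end])) end++;
    const match = [src.slice(0, end)];
    match.index = 0;
    match.input = src;
    return match;
  }
};

// A lexer that only calls rule.exec(src) and reads cap[0] cannot tell the difference.
const cap = linearText.exec('hello *world*');
console.log(JSON.stringify(cap[0]), cap[0].length);
```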

UziTech · 2018-04-30T15:34:22Z

One of the benefits of having the regexes do most of the work is that it makes marked easy to extend.

If I want to use marked but change the way text is parsed I can with:

marked.InlineLexer.rules.text = /some other regex/;

If we are going to start moving away from just regexes we need to figure out how to keep marked easily extendable.

UziTech · 2018-04-30T15:57:16Z

On a security note, I also think we need to distinguish ReDoS attacks between "Can be triggered by a malicious actor" and "Can be triggered accidentally in a normal context".

I don't think it is our job to prevent a malicious actor from making marked take a long time to parse the input. We should educate our dependents on the safe way to deal with parsing user input (i.e. web worker/vm.runInNewContext).

If one of our dependents allows users to upload a 1 terabyte markdown file, there is nothing we can do to prevent marked from going slow.
If a user must add 1000 spaces after a character in order for marked to take 1 second to parse, is it really necessary to prevent that? (Unless there might be some reason a normal person would add 1000 spaces after that character.)

davisjam · 2018-04-30T16:22:13Z

@UziTech Solid points.

Extendability

Agreed. Since I'm retaining the RegExp.exec interface, the existing extendability mechanism would continue to work.

Security

Pursuing the direction you advocate would mean documenting a threat model for marked. This would tell users and security researchers what we consider a threat and what we consider "not our problem". But it's difficult to say what might make a "normal person" add 1000 spaces here or there, and it's also difficult to estimate the number of spaces necessary for a noticeable lag --- mobile vs. microservices vs. VMs vs. bare metal vs. ...

But personally I think saying "use module at your own risk, see threat model" is just another way of saying "don't use this module". The input will be untrusted in many cases, since many markdown use cases involve user-specified input being rendered on someone else's machine (e.g. GitHub and StackOverflow both let you enter Markdown in comment sections). It would make more sense to me to expose a safe API (perhaps with an unsafe faster version). Such an API would solve all of these REDOS problems in one shot and let us focus our energies elsewhere.

UziTech · 2018-04-30T16:46:11Z

> (perhaps with an unsafe faster version)

See, I think marked is that unsafe faster version.

There are many other (safer) markdown parsing libraries. marked's "claim to fame" is its speed and compliance with the original (pedantic) test suite.

davisjam · 2018-04-30T17:01:06Z

> marked's "claim to fame" is its speed

Do we know how true this is anymore?

There are many markdown libraries, but this one has 3K dependents on npm. I feel reasonable security guarantees trump performance for any software, especially software with a large userbase.

joshbruce · 2018-04-30T17:08:40Z

Leaning toward @davisjam on this one. Back to the priorities:

1. Security
2. Spec-compliance
3. Speed
4. Low-level
5. Lightweight

If I had to add extensibility it would be 6.

> But personally I think saying "use module at your own risk, see threat model" is just another way of saying "don't use this module".

Agreed.

(Maybe creating a threat model would be beneficial anyway. Give white hats a block of wood to whittle on - new challenges + new badges.)

It seems there are two major topics of concern keeping us from merging this: speed and extensibility, neither of which supersedes the priority of security. As long as it's not a "serious" performance hit I say security wins and it seems like @davisjam is doing his best to strike a balance; especially if the performance hit doesn't show up until someone puts in 500+ blank spaces.

Marked isn't the fastest kid on the block anymore (if it ever was) from the stats we've seen: #963. The primary reason @UziTech and I showed up was security, seems like we might be wavering a bit. (Marked appears to be twice as slow as Showdown in #963.)

Maybe we should put in a more proper and consistent benchmarking methodology and add it to Travis. Then actually capture the benchmark somewhere (right now we have the 1s, which doesn't seem accurate given the size of the tests). Give us the ability to say "we won't go slower than" - unless X, Y, Z. With the JSON we can even put a target ms per test.

Marked is going to change as we progress.

I even mentioned this to @chjj at the beginning of the transition just before getting the org...it may not resemble the code he wrote...its claim to fame when we showed up was that it was dead but no one had the heart to put it down humanely. Now it's maybe 60% spec compliant with CommonMark and GFM. Uses GFM by default (not sure how many people actually use Pedantic - with or without Marked - most of us didn't even know how to get our hands on the original Perl...think a couple of spike contributors didn't know about Gruber and Daring Fireball). It's trying to come back to life.

What if Marked's new claim to fame is that it is the safest, most spec-compliant, and fastest? (I've definitely appreciated our security solutions since @davisjam showed up.)

joshbruce · 2018-04-30T17:27:02Z

@davisjam: We are working on user analysis slowly but surely, see #1123

UziTech · 2018-04-30T18:27:42Z

@davisjam reasonable security is what I am getting at. Trying to prevent all potential malicious DoS vectors in my mind is way past reasonable security. Dependents must understand that, regardless of using regexes, there will always be potential for malicious actors to make parsing slow if it is on the main thread, and if that is a security issue for them, the best course of action is to move the parsing off the main thread not slow down parsing for normal use cases.

@joshbruce I agree with that list of priorities and security should come first. I'm just saying there is such a thing as too much security and we need to draw that line somewhere. It would be really nice to get some sort of benchmarking so we can tell if these types of changes do in fact affect performance rather than just guessing.

P.S. I'm not against the changes in this PR or in 24d4a5e I just think we need to watch where that line is.
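The A-B measurement suggested earlier in the thread (typical input versus the security edge case) could start from something as small as the sketch below. `timeMs` and `abBench` are hypothetical names, the run count is arbitrary, and it assumes a runtime with a global `performance` object (recent Node versions and browsers).

```javascript
// Best-of-N wall-clock time in milliseconds for fn(input).
function timeMs(fn, input, runs = 5) {
  let best = Infinity;
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn(input);
    best = Math.min(best, performance.now() - start);
  }
  return best;
}

// A: the most likely case. B: the adversarial edge case.
function abBench(fn, typicalInput, adversarialInput) {
  return {
    typicalMs: timeMs(fn, typicalInput),
    adversarialMs: timeMs(fn, adversarialInput)
  };
}

// Example: the /X+$/ idiom on a normal heading vs. a long run of spaces.
const trimWithRegex = (s) => s.replace(/ +$/, '');
console.log(abBench(trimWithRegex, '# heading', ' '.repeat(20000) + 'x'));
```

Recording both numbers per change would let a "we won't go slower than" target be checked instead of argued about.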

joshbruce · 2018-05-01T14:11:07Z

@UziTech: I think it will be easier to draw the lines once we have a way to measure impacts on performance; until then it's mainly conjecture. Therefore, I think drawing the line toward security is the better way to go.

To @davisjam's point, I think we can also build the publicly safe API during 0.x then scale back a bit through options once we have a way to better measure performance. We can do that work in #963 for something after 0.4.

We're starting to rub up against the 2-week target for security things. What, if anything, can I do to help?

davisjam · 2018-05-01T14:31:18Z

I agree that some kind of performance benchmark would go a long way to addressing these disputes.
@joshbruce We could pursue a general safe API as an alternative to all of the changes in this PR (except the ones that help with spec compliance). In particular 24d4a5e is somewhat character-changing, since it adds 100 LoC to a 1KLoC project for just one regex and there are surely others that would be similarly problematic. Though a safe API wouldn't resolve the issue unless we:
a. Told the reporters that we thought the exploit was out of scope, or
b. Replaced the existing API with the safe one and exposed the unsafe one as a separate API (thus protecting anyone who upgraded), or
c. Told everyone they had to refactor their marked code to use the safe API, or
d. ?

styfle · 2018-09-11T13:25:23Z

The benchmarks were updated in #1019
Care to run them before and after this PR?

UziTech · 2018-12-05T19:31:01Z

@davisjam is this PR still being worked on?

UziTech · 2019-12-11T18:19:17Z

I'm going to close this PR. The changes will need to be moved to the /src/ folder if you are still working on this.

UziTech mentioned this pull request Apr 17, 2018

show failing test when original tests takes > 1s #1228

Merged

6 tasks

UziTech reviewed Apr 18, 2018

davisjam and others added 4 commits April 25, 2018 21:03

security: finish fixing unsafe heading regex

0e07a9f

Apply similar patch to a similar heading regex. Follow-on to f052a2c. Test: Add a test case to demonstrate the slower blow-up.

security: corrected patch of unsafe heading regex

990b452

address review comment: equivalence of [^\n] and .

5736014

make header up to spec

943d995

davisjam force-pushed the SonatypeReport branch from 0ceca90 to 15ac630 Compare April 26, 2018 01:04

security: replace unsafe /X+$/ idiom with rtrim

0cfe39e

Problem: replace(/X+$/, '') is vulnerable to REDOS.
Solution: Replace all instances I could find with a custom rtrim.

davisjam force-pushed the SonatypeReport branch from 15ac630 to 0cfe39e Compare April 26, 2018 01:12

styfle reviewed Apr 26, 2018

styfle previously approved these changes Apr 26, 2018

davisjam force-pushed the SonatypeReport branch from b55871f to dd26af8 Compare April 27, 2018 01:49

WIP: safen the text regex via linear-time scans

24d4a5e

Sketch implementing text regex as a linear-time RegExp imitator.
- A few nits here and there
- I haven't tested all of the offsetOfX routines, so 'npm run test' hangs on some bug

davisjam mentioned this pull request May 1, 2018

Gfm tables #1245

Merged

6 tasks

This was referenced May 9, 2018

[DevOps]: Node Security #1158

Closed

security: use rtrim, not unsafe /X+$/ #1260

Merged

joshbruce mentioned this pull request May 14, 2018

New release tag? #1271

Closed

styfle added the category: headings label Dec 6, 2018

UziTech closed this Dec 11, 2019

CMaheshBL mentioned this pull request May 6, 2022

Cx3bab5572-419d @ Npm-marked-0.3.9 CMaheshBL/NodeGoat#168

Open

cxronen mentioned this pull request May 20, 2022

Cx3bab5572-419d @ Npm-marked-0.3.9 cxronen/AST_BookStore#227

Open

cxronen mentioned this pull request Nov 28, 2022

Cx3bab5572-419d @ Npm-marked-0.3.9 cxronen/BookStore#178

Open

cxronen mentioned this pull request Jan 26, 2023

Cx3bab5572-419d @ Npm-marked-0.3.9 cxronen/BookStore#410

Open

RobertMickleCx mentioned this pull request Mar 7, 2023

Cx3bab5572-419d @ Npm-marked-0.3.9 RobertMickleCx/NodeGoat#150

Open

cxronen mentioned this pull request Mar 2, 2023

Cx3bab5572-419d @ Npm-marked-0.3.9 cxronen/AST_BookStore#492

Open
