Fix ReDoS with autolink #73

Alcaro · 2019-09-25T21:25:45Z

Patterns like <<<<<<<<<<:/:/:/:/:/:/:/:/:/:/ currently exhibit O(n^3) complexity, allowing a 5KB document to take 7174ms to parse. With this change, it drops to O(n^2) and 73ms.

Tested with:

<script src="simple-markdown.min.js"></script>
<script>
// 2000 "<" followed by 1500 ":/" pairs: a 5000-byte input
var str = "<".repeat(2000)+":/".repeat(1500);
// Time a single inline parse of the whole string
var t1 = performance.now();
SimpleMarkdown.defaultInlineParse(str);
var t2 = performance.now();
document.write(str.length+" bytes, "+(t2-t1)+"ms");
</script>
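To run a similar comparison outside a browser, here is a minimal Node sketch (Node 16+ for the global `performance`) that times only the autolink regex rather than the whole parser. It assumes the pre-fix pattern was `/^<([^ >]+:\/[^ >]+)>/` and that the fix narrows the part before `:/` to `[^: >]+`; neither pattern is quoted in this PR, so treat both as assumptions. `scanAllPositions` and `timeScan` are hypothetical helpers that retry the regex at every offset, mimicking a parser that falls back to one character of plain text when a rule fails.

```javascript
// Assumed before/after autolink patterns (not quoted from the PR diff).
const OLD_AUTOLINK_R = /^<([^ >]+:\/[^ >]+)>/;
const NEW_AUTOLINK_R = /^<([^: >]+:\/[^ >]+)>/;

// Hypothetical helper: try `regex` on every suffix of `source`, the way an
// inline parser retries its rules after emitting one character of text.
function scanAllPositions(regex, source) {
  let matches = 0;
  for (let pos = 0; pos < source.length; pos++) {
    if (regex.exec(source.slice(pos)) !== null) {
      matches++;
    }
  }
  return matches;
}

// Wall-clock time for one full scan, in milliseconds.
function timeScan(regex, source) {
  const t1 = performance.now();
  const matches = scanAllPositions(regex, source);
  const t2 = performance.now();
  return { matches, ms: t2 - t1 };
}

// Scaled down from the 2000/1500 input above so the old pattern finishes fast.
const input = "<".repeat(200) + ":/".repeat(150);
console.log("old pattern:", timeScan(OLD_AUTOLINK_R, input));
console.log("new pattern:", timeScan(NEW_AUTOLINK_R, input));
```

If the cubic and quadratic estimates hold, doubling both repeat counts should roughly multiply the old pattern's time by eight and the new one's by four; actual numbers depend on the regex engine and on measurement noise.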

This PR will change parsing of weird things like <foo:bar:/baz>. If you'd prefer leaving behavior 100% unchanged, the regex can instead be changed to /^<(?=[^ >]+:\/)([^ >]+)>/, with no meaningful performance difference.
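To check exactly which inputs change, the sketch below runs a few strings through four patterns: assumed old and new autolink regexes (`/^<([^ >]+:\/[^ >]+)>/` and `/^<([^: >]+:\/[^ >]+)>/`, both assumptions since the diff isn't quoted here), the lookahead alternative suggested above, and a variant of it. One detail worth noting: as written, the lookahead alternative does not require a character after `:/`, so it accepts `<foo:/>`, which the assumed original pattern rejects; adding `[^ >]` at the end of the lookahead closes that gap. `target` is a hypothetical helper.

```javascript
// Assumed pre-fix and post-fix patterns; group 1 is the link target.
const OLD_AUTOLINK_R = /^<([^ >]+:\/[^ >]+)>/;
const NEW_AUTOLINK_R = /^<([^: >]+:\/[^ >]+)>/;
// The behavior-preserving alternative suggested in the comment.
const LOOKAHEAD_R = /^<(?=[^ >]+:\/)([^ >]+)>/;
// Variant that also requires at least one character after ":/".
const LOOKAHEAD_STRICT_R = /^<(?=[^ >]+:\/[^ >])([^ >]+)>/;

// Hypothetical helper: return the captured target, or null if no match.
function target(regex, source) {
  const match = regex.exec(source);
  return match === null ? null : match[1];
}

const patterns = { OLD_AUTOLINK_R, NEW_AUTOLINK_R, LOOKAHEAD_R, LOOKAHEAD_STRICT_R };
for (const source of ["<https://example.com>", "<foo:bar:/baz>", "<foo:/>"]) {
  for (const [name, regex] of Object.entries(patterns)) {
    console.log(source, name, target(regex, source));
  }
}
```

The lookahead form avoids the blowup because JavaScript lookaheads are atomic: once `(?=...)` succeeds, the engine never backtracks into it to try a different `:/`, so each starting `<` costs linear rather than quadratic work.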

What is the project policy for what counts as a DoS? Quadratic complexity is fine, anything higher is a bug? More than 1ms per byte on a 10KB document is a bug? Anything superlinear is a bug? If the latter, I can name a few other inputs with quadratic runtime.


ariabuckles · 2019-09-26T18:25:21Z

This looks great; thanks for contributing it! I think the parsing change is fine.

Yeah, I think anything superlinear is probably a problem. If you have examples, I'd be happy to look into and fix them, or happy to accept your fixes! (I haven't done much measuring of parse time against document size, but I expect that if everything were linear, most of those would be fine too.)

ariabuckles · 2019-09-26T18:36:33Z

Actually, I might have missed something: what's still quadratic about the fixed version here? The whole parse, or the individual autolink match?

I think the whole parse in general will be quadratic as it may re-run any regex for each character in the worst case (like your example), but the matchers themselves should ideally all be linear.
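The distinction between a linear matcher and a quadratic whole parse can be made concrete with a toy model. This is not simple-markdown's parser: `countScannedChars` and `angleRule` are hypothetical, and the rule is a hand-written linear scan for `<...>`. Because a failed match still reads to the end of the input, and the loop retries at the next character, n consecutive `<` characters cost n + (n-1) + ... + 1 = n(n+1)/2 character reads.

```javascript
// Toy model, not simple-markdown's parser: at each position, try one
// linear-time rule; if it fails, emit one character of text and move on.
function countScannedChars(source, rule) {
  let scanned = 0;
  let pos = 0;
  while (pos < source.length) {
    const { consumed, examined } = rule(source, pos);
    scanned += examined;
    pos += consumed > 0 ? consumed : 1; // fallback: one char of plain text
  }
  return scanned;
}

// Hypothetical linear matcher for "<...>". Each call is O(remaining input),
// but a failed match still reads everything up to the end of the string.
function angleRule(source, pos) {
  if (source[pos] !== "<") {
    return { consumed: 0, examined: 1 };
  }
  for (let i = pos + 1; i < source.length; i++) {
    if (source[i] === ">") {
      return { consumed: i - pos + 1, examined: i - pos + 1 };
    }
  }
  return { consumed: 0, examined: source.length - pos };
}

for (const n of [1000, 2000, 4000]) {
  console.log(n, countScannedChars("<".repeat(n), angleRule));
}
```

The counts come out to 500500, 2001000 and 8002000: doubling n roughly quadruples the work, even though every individual call is linear.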

Alcaro · 2019-09-26T19:05:07Z

I'm referring to the entire parse being quadratic (formerly cubic). Per my definition, all the other superlinear cases I'm aware of are quadratic over the whole document as well. I apologize for the confusion; I should've been clearer.

I wrote this analysis of the quadratic documents before you submitted your second post, so I'll post it even though it's not necessarily interesting.

<<<<<<<<<<<<<<<<<<<< - autolink tries to start at every one and fails because there's no :/ (and no >). This could be fixed by rejecting < in autolinks, but unless the other quadratic documents are fixed too, fixing this one alone doesn't really solve anything.
******************** - the outermost pair of asterisks becomes italics, which is parsed recursively. The inner parser also sees a long string of asterisks and turns its outermost pair into italics; this repeats until only one or two asterisks are left. This could be fixed by rejecting recursive italics, but that'd be more effort than it's worth, and the strong rule would most likely need something similar.
____________________ - same as the above. Still italics, even.
\``\``\``\``\``\``\``\``\``\`` - the first backtick is a perfectly normal escape. The second one tries to find a closing sequence of exactly one backtick, but all subsequent backticks come in pairs, so it scans to the end of the string, fails, and emits a single plaintext backtick. Next up is a backslash, which eats the third backtick, and the fourth one scans to the end again. I can't think of any way to fix this without altering how realistic documents are parsed.

There are most likely other ways to cause quadratic (or worse) document parsing; I didn't check all parsing rules. Some of them exceed my brain capacity.
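The four inputs listed above are easy to generate for a growth test. The sketch below is hypothetical: `pathologicalInputs` builds each string from its repeating unit, and `measureGrowth` times any parse function at sizes n and 2n. With `SimpleMarkdown.defaultInlineParse` from the test page at the top of this PR passed as `parse`, a ratio near 4 would be consistent with quadratic behavior and near 2 with linear; timings are noisy, so small n can give misleading ratios.

```javascript
// Hypothetical generators for the four quadratic inputs described above;
// n is the number of repetitions of each input's repeating unit.
function pathologicalInputs(n) {
  return {
    angleBrackets: "<".repeat(n),
    asterisks: "*".repeat(n),
    underscores: "_".repeat(n),
    escapedBackticks: "\\``".repeat(n), // backslash + two backticks per unit
  };
}

// Milliseconds spent in one call to parse(source).
function timeParse(parse, source) {
  const t1 = performance.now();
  parse(source);
  return performance.now() - t1;
}

// Time each input at size n and 2n; returns time(2n) / time(n) per input.
// Roughly 2 suggests linear growth, roughly 4 suggests quadratic.
function measureGrowth(parse, n) {
  const small = pathologicalInputs(n);
  const large = pathologicalInputs(2 * n);
  const ratios = {};
  for (const name of Object.keys(small)) {
    ratios[name] = timeParse(parse, large[name]) / timeParse(parse, small[name]);
  }
  return ratios;
}

// Usage in the browser test page (assumes SimpleMarkdown is loaded):
// console.log(measureGrowth(SimpleMarkdown.defaultInlineParse, 2000));
```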

ariabuckles · 2019-10-01T23:24:02Z

Thanks & sorry for the late reply! That was a thoughtful and helpful explanation!

I think you're right in that right now the quadratic document cases would be pretty challenging to address without a major re-thinking of how this works. I believe most parsing algorithms are quadratic in the worst case, so while we could probably do better for markdown specifically, that might be generally okay.

> There are most likely other ways to cause quadratic (or worse) document parsing; I didn't check all parsing rules. Some of them exceed my brain capacity.

I've generally tried to check these visually as well, but as in this example, I don't always see them. If you run into any more, I'd love to hear about them so we can fix them :).

Thanks again for your comments and PR here and for making this project better for everyone!

* Fix ariabuckles#68 escaping pipes in tables
* v0.5.0
* Fix license copyright
* v0.5.1: Fix .git folder in published archive
* Tests: Add exponential backtracking test for inline code

  Addresses ariabuckles#71

  Inline code has an exponential backtracking regex. This commit makes tests for those so that we can verify we fix them.

  Test plan:
  1. Run `make test`
     * Verify the three exponential backtracking avoidance tests fail
* inlineCode: Fix ReDoS & improve escape semantics

  Our inline code regex had overlapping parts in the case of spaces; spaces could be parsed as part of the `\s*`s or as part of the `[\S\s]*`, which leads to catastrophic backtracking in the case of a string with many spaces.

  The `\s*` parts were attempting to allow for escaping of backticks within the start/end of inline code blocks, so this commit removes the `\s*` parts of the regex, and then after parsing the code block, checks for the case where a single space is being used to escape a backtick, and removes that space.

  Test plan:
  1. Added a test for the semantics change.
  2. Added a test for the exponential backtracking to match the test case in issue ariabuckles#71
* Heading: Add test for exponential backtracking in headings
* Heading: Fix exponential backtracking

  Test plan:
  1. `make test`
     * Verify all tests now pass
* Del: Add test for del/strikethrough exponential backtracking
* Del: Fix del/strikethrough exponential backtracking
* Fence: Add test for code fence exponential backtracking
* Fence: fix fence exponential backtracks
* v0.5.2: Fix exponential backtracking / ReDoS regexes
* inlineCode: fix overzealous replacement

  $1 only refers to the parens in the first branch of the `` /^ ( *` *) $|^ ( *`)|(` *) $/g `` disjunction. If another branch matches, the replacement will be blank.
* Add a globalPrevCapture that persists into nested parses

  This adds an additional `globalPrevCapture` that is "global" state for the parses and persist into nested invocations of `nestedParse`.

  This allows rules to determine if they are truly at the beginning of a line. The existing `prevCapture` resets at each nesting, so it only allows you to determine if you're at the beginning of the nested parse, but not necessarily the line.
* Move `prevCapture` to the `state` and make it global for a given parse.
* Remove stray comma
* Bump mixin-deep from 1.3.1 to 1.3.2

  Bumps [mixin-deep](https://github.com/jonschlinkert/mixin-deep) from 1.3.1 to 1.3.2.
  - [Release notes](https://github.com/jonschlinkert/mixin-deep/releases)
  - [Commits](jonschlinkert/mixin-deep@1.3.1...1.3.2)

  Signed-off-by: dependabot[bot] <support@github.com>
* Make prevCapture comparisons `== null` for consistency

  Minor change; most of the other comparisons work this way, and this makes prevCapture harder to break
* v0.6.0

  Fix ariabuckles#72: backticks in code blocks bug
  Add ariabuckles#66: state.prevCapture that persists between parse calls
  ariabuckles#70: Bump devDependency versions for security
* Minify for v0.6.0
* Fix ReDoS with autolink

  Patterns like <<<<<<<<<<:/:/:/:/:/:/:/:/:/:/ currently exhibit O(n^3) complexity, allowing a 5KB document to take 7174ms to parse. With this change, it drops to O(n^2) and 73ms.
* Dev Dependencies: Run `npm audit fix` to fix vulnerabilities
* Create index.d.ts typings file
* Export everything

  ES6 import usage is:
  ```javascript
  import * as SimpleMarkdown from 'simple-markdown';
  ```
  so rather than export a default object, we should be exporting everything.
* v0.6.1: ariabuckles#73 Fix ReDoS with autolink
* Flow: Clean up some flow types to match typescript

  Just a small fixup of some flow types so that when adding the typescript types they're more of a straight port. Done before the typescript changes so this change is easier to verify on its own.

  Test plan: `make check`
* Error handling: Clean up error handling

  This cleanup happens in two parts:
  1. We remove some logic to disable certain error handling, which, as far as I can tell, was never actually used :/. Doing this makes both typescript and flow happier (and it's harder to work around in typescript than it was in flow).
     * In order to avoid introducing a potential perf regression for anyone who could have been using `state.disableErrorGuards`, we also remove the more expensive string comparison check, and instead only check for missing `^` if the result is a raw regex capture result (i.e., if it has a numeric index, we warn if that index is non-zero).
  2. We add an error check if the Array joiner rule isn't defined for a non-html/react output type.
     * This will make it easier to convince typescript that the function we call isn't undefined, and is a better api/documentation as well.

  Test plan: As... noted in the comment, these parts aren't really tested in the tests, but I did run the tests to verify that the normal path API didn't seem to break.
* React Keys: Fix sending null as a key

  See facebook/react#5519
* Typescript: Move index.d.ts to simple-markdown.d.ts
* Typescript: Configure typescript checking for build process

  Install typescript and configure it to check simple-markdown.js & simple-markdown.d.ts during `make check`

  Test plan: `make check`
* Typescript: Combine ts and flow types into up-to-date v0.6 ts types

  Updates the type definition file to be equivalent to the flow typings.

  Test plan: I didn't test this externally; will ask someone using typescript to check whether this is working for their usecase. The contents of the file are tested later with `make check` though.
* Typescript: Add types to simple-markdown.js source

  Add typescript types to our source, so that we can use typescript in addition to flow to check our source, and verify that our types are correct.

  Test plan: tested in later commit that enables tsc in `make check`
* Typescript: Add type checking of tests

  Adds types to simple-markdown-test.js and checks that file with typescript as well \o/

  Test plan: `make test`
* v0.7.0: Typescript types
* Allow one level of balanced parens in link urls
* Setup rollup
* Build system: Integrate rollup with the rest of the system

  Adds some integration for rollup/the rollup changes with some other parts of the build system:
  * `make` targets
  * fixes typescript config
  * uses es6 module exports to switch to fully using rollup's umd
  * integrates rollup with npm prepublish so I can't forget to build
* Remove unnecessary extra LIST_R.exec call

  As far as I can tell, this wasn't being used, and was creating an unused variable.

  Test plan: `make build test`
* Flow: Upgrade flow to 0.111.1 and fix flow errors

  Test plan: `npm ci && make test`
* DevDependencies: version bumps
* Flow: Build files after flow changes
* Tests: Remove test dependency on underscore

  Test plan: `make test`
* flow-typed: update flow types

  Test plan: `make test`
* v0.7.1
* Typescript dependencies: Remove @types/node dependency

  Now that we're bundling with rollup, we shouldn't need the @types/node dependency, which can make installation more challenging.

  Fixes ariabuckles#80
* DevDependencies: Update nyc/coverage for `npm audit`
* v0.7.2

Co-authored-by: Aria Buckles <aria@toole1.com>
Co-authored-by: Alcaro <floating@muncher.se>
Co-authored-by: Danny Weinberg <FuegoFro@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Danny Cochran <daniel@cochrans.org>
Co-authored-by: Sergey Slipchenko <faergeek@gmail.com>
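The "inlineCode: fix overzealous replacement" item turns on a JavaScript detail: in a string replacement, `$1` expands to the empty string when group 1 did not take part in the match. A minimal sketch with the regex quoted in that commit message shows the effect; `FIXED_R` is one hypothetical way to avoid it (a lookahead instead of a capture in the leading-space branch), not necessarily the fix that was committed, and `stripPadding` is a hypothetical helper.

```javascript
// Regex quoted in the commit message: three branches, three capture groups.
const BUGGY_R = /^ ( *` *) $|^ ( *`)|(` *) $/g;

// Hypothetical alternative: the leading branch uses a lookahead, so it
// consumes only the space, and the trailing branch keeps its backticks in $1.
const FIXED_R = /^ (?= *`)|(` *) $/g;

// Strip one space of padding around backticks at the start/end of inline code.
function stripPadding(regex, content) {
  return content.replace(regex, "$1");
}

for (const content of [" `a", "a` ", " ` "]) {
  console.log(
    JSON.stringify(content),
    JSON.stringify(stripPadding(BUGGY_R, content)),
    JSON.stringify(stripPadding(FIXED_R, content))
  );
}
```

With BUGGY_R, the inputs " `a" and "a` " both come back as "a": the second and third branches match, but "$1" is blank, so the backtick is lost along with the space. FIXED_R returns "`a" and "a`", and the fully padded " ` " becomes a single backtick under both.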

Fix ReDoS with autolink

3a857df


ariabuckles merged commit 015a719 into ariabuckles:master Sep 26, 2019

ariabuckles added a commit that referenced this pull request Oct 24, 2019

v0.6.1: #73 Fix ReDoS with autolink

a630853

ariabuckles mentioned this pull request May 27, 2020

Many HTML Tags after asterisk/underscore crashes the browser quantizor/markdown-to-jsx#311

Open
