Fix ReDoS with autolink #73

Merged
merged 1 commit into from
Sep 26, 2019

Alcaro
Contributor

@Alcaro Alcaro commented Sep 25, 2019

Patterns like <<<<<<<<<<:/:/:/:/:/:/:/:/:/:/ currently exhibit O(n^3) complexity, allowing a 5KB document to take 7174ms to parse. With this change, it drops to O(n^2) and 73ms.

Tested with

<script src="simple-markdown.min.js"></script>
<script>
var str = "<".repeat(2000)+":/".repeat(1500);
var t1 = performance.now();
SimpleMarkdown.defaultInlineParse(str);
var t2 = performance.now();
document.write(str.length+" bytes, "+(t2-t1)+"ms");
</script>

This PR will change parsing of weird things like <foo:bar:/baz>. If you'd prefer leaving behavior 100% unchanged, the regex can instead be changed to /^<(?=[^ >]+:\/)([^ >]+)>/, with no meaningful performance difference.
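To make the behavior difference concrete, here is a minimal sketch comparing the lookahead variant quoted above with a stricter pattern. The stricter pattern (`STRICT_AUTOLINK_R`) and the helper `autolinkTarget` are assumptions for illustration only: forbidding a colon before the first `:/` is one way to get the parsing change described, and it may not be character-for-character what this PR merged.

```javascript
// Lookahead variant quoted in the comment above: the lookahead only checks
// that some ":/" occurs before the first space or ">".
const LOOKAHEAD_AUTOLINK_R = /^<(?=[^ >]+:\/)([^ >]+)>/;

// Hypothetical stricter pattern (an assumption, not taken from the diff):
// no ":" is allowed before the first ":/".
const STRICT_AUTOLINK_R = /^<([^: >]+:\/[^ >]+)>/;

// Hypothetical helper: returns the captured link target, or null if no match.
function autolinkTarget(regex, source) {
  const capture = regex.exec(source);
  return capture ? capture[1] : null;
}

const ordinaryLink = "<http://example.com>";
const weirdLink = "<foo:bar:/baz>";

console.log(autolinkTarget(LOOKAHEAD_AUTOLINK_R, ordinaryLink)); // "http://example.com"
console.log(autolinkTarget(STRICT_AUTOLINK_R, ordinaryLink));    // "http://example.com"
console.log(autolinkTarget(LOOKAHEAD_AUTOLINK_R, weirdLink));    // "foo:bar:/baz"
console.log(autolinkTarget(STRICT_AUTOLINK_R, weirdLink));       // null
```

Ordinary links behave the same under both patterns; only inputs with an extra colon before the `:/` differ.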

What is the project policy for what counts as a DoS? Quadratic complexity is fine, anything higher is a bug? More than 1ms per byte on a 10KB document is a bug? Anything superlinear is a bug? If the latter, I can name a few other inputs with quadratic runtime.

@ariabuckles ariabuckles merged commit 015a719 into ariabuckles:master Sep 26, 2019
@ariabuckles
Owner

This looks great; thanks for contributing it! I think the parsing change is fine.

Yeah, I think anything superlinear is probably a problem. If you have examples I'd be happy to look into and fix them, or happy to accept your fixes! (I haven't done a lot of measuring times per document size, but I expect if we had everything linear most of those would be fine too.)

@ariabuckles
Owner

Actually, I might have missed something: what's still quadratic about the fixed version here? The whole parse, or the individual autolink match?

I think the whole parse in general will be quadratic as it may re-run any regex for each character in the worst case (like your example), but the matchers themselves should ideally all be linear.
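The point about a linear matcher still giving a quadratic parse can be sketched with a toy model. This is not simple-markdown's actual parse loop; `countScanSteps` is a hypothetical function that simply counts characters examined when a matcher is retried at every offset and each attempt scans to the end before failing.

```javascript
// Toy model (an assumption, not the library's code): try a matcher at every
// offset; each attempt scans forward for a closing character.
function countScanSteps(source, openChar, closeChar) {
  let steps = 0;
  for (let start = 0; start < source.length; start++) {
    if (source[start] !== openChar) continue;
    // The matcher itself is linear: one pass over the rest of the input.
    for (let i = start + 1; i < source.length; i++) {
      steps++;
      if (source[i] === closeChar) break;
    }
  }
  return steps;
}

// With no closing character anywhere, every attempt scans to the end.
console.log(countScanSteps("<".repeat(1000), "<", ">")); // 499500
console.log(countScanSteps("<".repeat(2000), "<", ">")); // 1999000
```

Doubling the input roughly quadruples the step count, n(n-1)/2 in this model, even though each individual match attempt is linear.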

@Alcaro
Contributor Author

Alcaro commented Sep 26, 2019

I mean quadratic (formerly cubic) over the entire parse. All the other superlinear cases I'm aware of are likewise quadratic over the whole document, by that definition. I apologize for the confusion; I should've been clearer.

I wrote this analysis of the quadratic documents before you submitted your second post, so I'll post it even though it's not necessarily interesting.

<<<<<<<<<<<<<<<<<<<< - autolink tries to start at every one, fails because no :/ (and no >). Could be fixed by rejecting < in autolinks, but unless the other quadratic documents are fixed, fixing this doesn't really solve anything.
******************** - the outermost pair of asterisks become italics, which is parsed recursively. The inner parser also sees a long string of asterisks, and turns the outermost into italics; repeat until there's only one or two asterisks left. Can be fixed by rejecting recursive italics, but that'd be more effort than it's worth. And the strong rule would most likely require something similar.
____________________ - same as the above. Still italics, even.
\``\``\``\``\``\``\``\``\``\`` - the first backtick is a perfectly normal escape. Second one tries to find a sequence of exactly 1 backtick so it can close, but all subsequent ones are doubles, so it scans to the end of the string and fails, emitting a single plaintext backtick. Next up is a backslash, which eats the third backtick, and the fourth one scans to the end again. I cannot think of any way to fix this without altering parsing of realistic documents.
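The four inputs above can be generated and timed with a small harness. This is a rough sketch: `worstCaseDocs` and `measureGrowth` are hypothetical names, the parse function is passed in rather than assumed, and the timings depend heavily on the machine and engine.

```javascript
// Builders for the four worst-case inputs described above.
const worstCaseDocs = {
  angleBrackets: (n) => "<".repeat(n),
  asterisks: (n) => "*".repeat(n),
  underscores: (n) => "_".repeat(n),
  // One backslash followed by two backticks, repeated n times.
  escapedBackticks: (n) => "\\``".repeat(n),
};

// Times `parse` on each document at sizes n and 2n (milliseconds).
function measureGrowth(parse, n) {
  const results = {};
  for (const [name, build] of Object.entries(worstCaseDocs)) {
    const timeFor = (size) => {
      const doc = build(size);
      const t1 = Date.now();
      parse(doc);
      return Date.now() - t1;
    };
    results[name] = { small: timeFor(n), large: timeFor(2 * n) };
  }
  return results;
}

// Usage, assuming simple-markdown is loaded as in the test page earlier:
// console.log(measureGrowth(SimpleMarkdown.defaultInlineParse, 1000));
```

If `large` is roughly four times `small` for a given input, that input is behaving quadratically. Keeping `n` modest is prudent, since the recursive italics case nests once per pair of delimiters.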

There are most likely other ways to cause quadratic (or worse) document parsing; I didn't check all parsing rules. Some of them exceed my brain capacity.

@ariabuckles
Owner

Thanks & sorry for the late reply! That was a thoughtful and helpful explanation!

I think you're right in that right now the quadratic document cases would be pretty challenging to address without a major re-thinking of how this works. I believe most parsing algorithms are quadratic in the worst case, so while we could probably do better for markdown specifically, that might be generally okay.

> There are most likely other ways to cause quadratic (or worse) document parsing; I didn't check all parsing rules. Some of them exceed my brain capacity.

I've generally tried to check these visually as well, but as in this example, I don't always see them. If you run into any more, I'd love to hear about them so we can fix them :).

Thanks again for your comments and PR here and for making this project better for everyone!

ariabuckles added a commit that referenced this pull request Oct 24, 2019
andangrd added a commit to andangrd/simple-markdown that referenced this pull request Jan 2, 2020
* Fix ariabuckles#68 escaping pipes in tables

* v0.5.0

* Fix license copyright

* v0.5.1: Fix .git folder in published archive

* Tests: Add exponential backtracking test for inline code

Addresses ariabuckles#71

Inline code has an exponential backtracking regex.
This commit makes tests for those so that we can verify we fix them.

Test plan:

1. Run `make test`
    * Verify the three exponential backtracking avoidance tests fail

* inlineCode: Fix ReDoS & improve escape semantics

Our inline code regex had overlapping parts in the case of spaces;
spaces could be parsed as part of the `\s*`s or as part of the
`[\S\s]*`, which leads to catastrophic backtracking in the case of
a string with many spaces.

The `\s*` parts were attempting to allow for escaping of backticks
within the start/end of inline code blocks, so this commit removes
the `\s*` parts of the regex, and then after parsing the code block,
checks for the case where a single space is being used to escape
a backtick, and removes that space.

Test plan:

1. Added a test for the semantics change.
2. Added a test for the exponential backtracking to match the test
   case in issue ariabuckles#71

* Heading: Add test for exponential backtracking in headings

* Heading: Fix exponential backtracking

Test plan:

1. `make test`
    * Verify all tests now pass

* Del: Add test for del/strikethrough exponential backtracking

* Del: Fix del/strikethrough exponential backtracking

* Fence: Add test for code fence exponential backtracking

* Fence: fix fence exponential backtracks

* v0.5.2: Fix exponential backtracking / ReDoS regexes

* inlineCode: fix overzealous replacement

$1 only refers to the parens in the first branch of the `` /^ ( *` *) $|^ ( *`)|(` *) $/g `` disjunction. If another branch matches, the replacement will be blank.

* Add a globalPrevCapture that persists into nested parses

This adds an additional `globalPrevCapture` that is "global" state for the parses and persist into nested invocations of `nestedParse`. This allows rules to determine if they are truly at the beginning of a line. The existing `prevCapture` resets at each nesting, so it only allows you to determine if you're at the beginning of the nested parse, but not necessarily the line.

* Move `prevCapture` to the `state` and make it global for a given parse.

* Remove stray comma

* Bump mixin-deep from 1.3.1 to 1.3.2

Bumps [mixin-deep](https://github.com/jonschlinkert/mixin-deep) from 1.3.1 to 1.3.2.
- [Release notes](https://github.com/jonschlinkert/mixin-deep/releases)
- [Commits](jonschlinkert/mixin-deep@1.3.1...1.3.2)

Signed-off-by: dependabot[bot] <support@github.com>

* Make prevCapture comparisons `== null` for consistency

Minor change; most of the other comparisons work this way, and this
makes prevCapture harder to break

* v0.6.0

Fix ariabuckles#72: backticks in code blocks bug
Add ariabuckles#66: state.prevCapture that persists between parse calls
ariabuckles#70: Bump devDependency versions for security

* Minify for v0.6.0

* Fix ReDoS with autolink

Patterns like <<<<<<<<<<:/:/:/:/:/:/:/:/:/:/ currently exhibit O(n^3) complexity, allowing a 5KB document to take 7174ms to parse. With this change, it drops to O(n^2) and 73ms.

* Dev Dependencies: Run `npm audit fix` to fix vulnerabilities

* Create index.d.ts typings file

* Export everything

ES6 import usage is:

```javascript
import * as SimpleMarkdown from 'simple-markdown';
```

so rather than export a default object, we should be exporting everything.

* v0.6.1: ariabuckles#73 Fix ReDoS with autolink

* Flow: Clean up some flow types to match typescript

Just a small fixup of some flow types so that when adding the typescript
types they're more of a straight port.

Done before the typescript changes so this change is easier to verify on
its own.

Test plan: `make check`

* Error handling: Clean up error handling

This cleanup happens in two parts:

1. We remove some logic to disable certain error handling, which, as
   far as I can tell, was never actually used :/. Doing this makes
   both typescript and flow happier (and it's harder to work around
   in typescript than it was in flow).

    * In order to avoid introducing a potential perf regression for
      anyone who could have been using `state.disableErrorGuards`,
      we also remove the more expensive string comparison check, and
      instead only check for missing `^` if the result is a raw
      regex capture result (i.e., if it has a numeric index, we warn
      if that index is non-zero).

2. We add an error check if the Array joiner rule isn't defined for
   a non-html/react output type.

    * This will make it easier to convince typescript that the function
      we call isn't undefined, and is a better api/documentation as
      well.

Test plan:

As... noted in the comment, these parts aren't really tested in the
tests, but I did run the tests to verify that the normal path API
didn't seem to break.

* React Keys: Fix sending null as a key

See facebook/react#5519

* Typescript: Move index.d.ts to simple-markdown.d.ts

* Typescript: Configure typescript checking for build process

Install typescript and configure it to check simple-markdown.js &
simple-markdown.d.ts during `make check`

Test plan: `make check`

* Typescript: Combine ts and flow types into up-to-date v0.6 ts types

Updates the type definition file to be equivalent to the flow typings.

Test plan:

I didn't test this externally; will ask someone using typescript to check
whether this is working for their use case.

The contents of the file are tested later with `make check` though.

* Typescript: Add types to simple-markdown.js source

Add typescript types to our source, so that we can use typescript in
addition to flow to check our source, and verify that our types are
correct.

Test plan: tested in later commit that enables tsc in `make check`

* Typescript: Add type checking of tests

Adds types to simple-markdown-test.js and checks that file with
typescript as well \o/

Test plan: `make test`

* v0.7.0: Typescript types

* Allow one level of balanced parens in link urls

* Setup rollup

* Build system: Integrate rollup with the rest of the system

Adds some integration for rollup/the rollup changes with some other
parts of the build system:

 * `make` targets
 * fixes typescript config
 * uses es6 module exports to switch to fully using rollup's umd
 * integrates rollup with npm prepublish so I can't forget to build

* Remove unnecessary extra LIST_R.exec call

As far as I can tell, this wasn't being used, and was creating an unused
variable.

Test plan: `make build test`

* Flow: Upgrade flow to 0.111.1 and fix flow errors

Test plan: `npm ci && make test`

* DevDependencies: version bumps

* Flow: Build files after flow changes

* Tests: Remove test dependency on underscore

Test plan: `make test`

* flow-typed: update flow types

Test plan: `make test`

* v0.7.1

* Typescript dependencies: Remove @types/node dependency

Now that we're bundling with rollup, we shouldn't need the @types/node
dependency, which can make installation more challenging.

Fixes ariabuckles#80

* DevDependencies: Update nyc/coverage for `npm audit`

* v0.7.2

Co-authored-by: Aria Buckles <aria@toole1.com>
Co-authored-by: Alcaro <floating@muncher.se>
Co-authored-by: Danny Weinberg <FuegoFro@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Danny Cochran <daniel@cochrans.org>
Co-authored-by: Sergey Slipchenko <faergeek@gmail.com>