-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parsing ~~~~~~… takes quadratic time #1463
Comments
isn't that still considered linear time? Quadratic would be if it was n^2 right? |
Linear time, O(n), doubles each time the input size doubles. Quadratic time, O(n^2), quadruples each time the input size doubles. This is quadratic. |
You are right, I was thinking of exponential (2^n) |
Now that #1464 has fixed the recursion error on $ time node -e 'require("./lib/marked")(">".repeat(25000))'
0.62user 0.01system 0:00.63elapsed 101%CPU (0avgtext+0avgdata 39592maxresident)k
0inputs+0outputs (0major+4521minor)pagefaults 0swaps
$ time node -e 'require("./lib/marked")(">".repeat(50000))'
2.18user 0.02system 0:02.17elapsed 101%CPU (0avgtext+0avgdata 46456maxresident)k
0inputs+0outputs (0major+6252minor)pagefaults 0swaps
$ time node -e 'require("./lib/marked")(">".repeat(100000))'
8.31user 0.03system 0:08.28elapsed 100%CPU (0avgtext+0avgdata 64860maxresident)k
0inputs+0outputs (0major+10944minor)pagefaults 0swaps |
With GFM disabled, I still see the slowdown, so it doesn't seem to be related to strikeout syntax. Actually... it looks like the slowdown also occurs with backticks? Must be tied to code fences in some way. |
Closing this as it is unlikely to occur in regular markdown and if parsing untrusted markdown ReDoS can be solved by using a worker. |
If you put yourself in the shoes of someone developing an app using a “⚡ built for speed” Markdown library to parse user Markdown, how should they know that they need to build external countermeasures against an undocumented performance bug like this? |
ReDoS is a well documented vulnerability. Using workers to prevent it is also well documented. I don't think a user needs to know about this vulnerability to know the issues with parsing untrusted markdown. |
Further more it is possible to cause a sql injection vulnerability by passing untrusted input to sql, but that isn't a vulnerability in sql that is a vulnerability in the implementation. Same with this vulnerability. If the user parses untrusted markdown with marked or any other markdown parser it is not a vulnerability in the parser but in the implementation. |
ReDoS is a well documented vulnerability that applies to certain poorly-written regexps. Why should a user expect that a “⚡ built for speed” Markdown parsing library is internally built with regexps that are vulnerable to ReDoS? There are other Markdown parsing libraries that do not have this bug. In principle, it should not be fundamentally unsafe to parse untrusted Markdown. If your position is that it’s unsafe to parse untrusted Markdown with this library, then then you should document that. (You currently only document that it’s unsafe to use the output HTML.) |
I know of at least 10 ways to DoS markdown-it and commonmark,.js neither of which use regexps the way marked does. DoS is not only a vulnerability for regexps but any parser. If you would like to update the docs PRs are always welcome 😁👍 |
In the past I’ve reported 7 quadratic-time bugs in markdown-it and 1 in commonmark.js; they were all addressed very promptly with code fixes. If you know of further issues you should certainly report them. |
They have all been reported that is how I know about them. It doesn't take quadratic time to cause a DoS. So I think we can agree that any parser can have vulnerabilities and it is not ok to run untrusted code in a way that can cause DoS. Certainly there were vulnerabilities before they fixed the ones you reported. And there are certainly vulnerabilities before the next one is found and fixed. So the solution to this problem should be to educate users on how to mitigate that. I think we have done a good job illustrating that workaround in our docs. If you don't agree you are always allowed to create a PR to update the docs. |
Describe the bug
Marked takes quadratic time to parse
~~~~~~
….To Reproduce
Steps to reproduce the behavior:
Observe that the time quadruples each time the input size doubles. This could be exploited for denial of service in an application running Marked on the server side.
Expected behavior
A Markdown parser (especially one that claims “built for speed” as its primary selling point) should run in linear time on all inputs. This isn’t easy but it should be possible; several parsers including the reference C implementation seem to do a good job at this.
The text was updated successfully, but these errors were encountered: