Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[comments re] halt with CPU 100% on some html #263

Closed
puzrin opened this issue Jan 3, 2015 · 11 comments
Closed

[comments re] halt with CPU 100% on some html #263

puzrin opened this issue Jan 3, 2015 · 11 comments

Comments

@puzrin
Copy link
Contributor

puzrin commented Jan 3, 2015

Put this into dingus:

foo <!--- This is what the user will see during the configuration ->

Bug in tag detection regexp. Found by @fengmk2

Upd Verify that CDATA regexp does not have the same vulnerability

Upd2 The same problem in CDATA #267

@puzrin
Copy link
Contributor Author

puzrin commented Jan 3, 2015

Problem is at https://github.com/jgm/CommonMark/blob/master/js/lib/inlines.js#L36

var HTMLCOMMENT = "<!--([^-]+|[-][^-]+)*-->";

What are those complex conditions for? May be use one below?

var HTMLCOMMENT = "<!--[\s\S]*?-->";

It works.

@mildsunrise
Copy link
Contributor

From the spec:

An HTML comment consists of the string <!--, a string of characters not including the string --, and the string -->.

I think the ([^-]+|[-][^-]+)* aims to make sure comments don't contain --.

@puzrin
Copy link
Contributor Author

puzrin commented Jan 3, 2015

Ah, got it. Well:

  1. It does not match all conditions anyway http://www.w3.org/TR/html-markup/syntax.html#comments
  2. Browsers don't care additional conditions (no --, no >, no ending -)

I think it's more simple & safe to remove those from regexp. If anyone wish to get better safety, he will pas result via sanitizer. Or we could force escaping those. But those should not be considered as "not comment" block.

@puzrin
Copy link
Contributor Author

puzrin commented Jan 3, 2015

Upd: /<!--[\s\S]*?-->/ can fail on <!-->...<!--> & <!--->... <!-->, but we can do additional checks later, instead of unsafe regexp magic.

@mildsunrise
Copy link
Contributor

@puzrin I reported this just after posting my comment, see #264. They seem to have taken that from XML which is more permissive.

@puzrin
Copy link
Contributor Author

puzrin commented Jan 3, 2015

Memo. Verify that CDATA regexp does not have the same vulnerability, because it's built with the same principle.

Updated the first message.

@puzrin puzrin changed the title halt with CPU 100% on some html [comments re] halt with CPU 100% on some html Jan 3, 2015
@jgm
Copy link
Member

jgm commented Jan 3, 2015

These regexes work fine with cmark (and re2c, which creates a non-backtracking DFA from the regex). I guess the regex libraries in javascript allow pathological backtracking. Presumably there are ways to write equivalent regexes that won't be pathological? (In the case of the comment regex, it's probably worth matching HTML5 behavior, #264.)

@jgm
Copy link
Member

jgm commented Jan 3, 2015

How about this for a non-pathological regex that matches HTML5 behavior?

/<!--(?!<)(?!->)(?:-?[^-])+-->/

@puzrin
Copy link
Contributor Author

puzrin commented Jan 3, 2015

I've fixed comments in markdown-it with more simple regexp & postponed check:

markdown-it/markdown-it@792f386

Don't know if it's good for ref implementation.

@fengmk2
Copy link

fengmk2 commented Jan 4, 2015

@puzrin can you publish a new version markdown-it to npm?

@puzrin
Copy link
Contributor Author

puzrin commented Jan 4, 2015

@fengmk2 please, wait ~ 1 day max. We are about to finish 3.0.0 milestone.

Feel free to use markdown-it tracker for related questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants