fix: speed up parsing long lists #2302
Conversation
@@ -1 +0,0 @@
*foo __bar *baz bim__ bam*
Why was this test removed? Is it failing with this PR?
nvm I see this was an accidental addition.
That was a test I added while playing around trying to debug my program; I accidentally committed it to the PR.
module.exports = {
  markdown: '- a\n'.repeat(10000),
  html: `<ul>${'<li>a</li>'.repeat(10000)}</ul>`
};
I can confirm this speeds up parsing long lists.
This test takes about 2 seconds on my machine on master and about 100 ms with this PR.
Hooray!
I want to go over this code once more before it gets merged. I have a nagging sense that there's still some debris left behind I should clean up, but exactly where is eluding me at the moment.
@UziTech Cleaned up the logic the way I wanted. It passes the specs but is failing the SNYK security test here, and I'm not sure why. Otherwise, this is now ready to be merged. Edit: Ah, there was a merge conflict somewhere setting it off. All fixed now!
## [4.0.6](v4.0.5...v4.0.6) (2021-12-02)

### Bug Fixes

* speed up parsing long lists ([#2302](#2302)) ([e0005d8](e0005d8))
🎉 This PR is included in version 4.0.6 🎉

The release is available on:

Your semantic-release bot 📦🚀
Marked version: v3.0.0+
Description
The List tokenizer was using a regex to capture the potential next list item, then splitting that captured text line by line to determine whether it had proper indentation and whether each line should be part of the current list item.
The problem is that the captured text was literally the entire document, so for every potential list item we were capturing the entire document and then splitting it into lines. For longer documents, this meant spending the majority of the time just splitting the document into lines over and over.
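The quadratic behavior described above can be sketched in isolation. This is illustrative code only, not marked's actual tokenizer: `itemRule` is an invented, simplified bullet rule whose trailing `(?:\n[^\n]*)*` greedily consumes every remaining line, so each match re-captures (and re-splits) the whole tail of the document:

```javascript
// Simplified, hypothetical version of the problematic pattern.
// The trailing (?:\n[^\n]*)* swallows everything to end-of-input.
const itemRule = /^( {0,3}[-*+])([^\n]*)((?:\n[^\n]*)*)/;

const doc = '- a\n'.repeat(5); // 5 one-line list items, 20 chars total
let src = doc;
let capturedChars = 0;

while (src) {
  const cap = itemRule.exec(src);
  if (!cap) break;
  // cap[0] is the bullet line PLUS the entire rest of the document.
  capturedChars += cap[0].length;
  const lines = cap[0].split('\n'); // re-splits the whole tail every time
  // ...only the first line is actually consumed before the next iteration:
  src = src.slice(lines[0].length + 1);
}

console.log(capturedChars); // 60 — triple the 20-char input, and the gap grows quadratically
```

With 5 items the loop captures 20 + 16 + 12 + 8 + 4 = 60 characters for a 20-character input; with 10,000 items that sum is on the order of n²/2, which is where the time went.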
Here's the offending regex; notice that `(?:\\n[^\\n]*)*` matches everything to the end of the document:

marked/src/Tokenizer.js, line 193 in d098d55
This PR changes that so we only capture the first line (with its bullet point), and once we verify that it is a candidate for starting a new list item, we traverse the src one line at a time. No more mass line-splitting when we only need to look at one line at a time anyway.
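The one-line-at-a-time idea can be sketched as follows. This is a hypothetical simplification, not marked's actual code: `bulletRule`, `tokenizeList`, and the "indented line continues the item" rule are invented for illustration, and real list handling (nesting, loose lists, ordered lists) is far more involved:

```javascript
// Capture only the bullet line itself...
const bulletRule = /^( {0,3}[-*+])( .*)?(?:\n|$)/;

function tokenizeList(src) {
  const items = [];
  while (src) {
    const cap = bulletRule.exec(src);
    if (!cap) break;
    src = src.slice(cap[0].length);   // ...consume just that one line
    let text = (cap[2] || '').trim();

    // ...then walk forward one line at a time, deciding per line whether
    // it still belongs to the current item (here: any indented line does).
    let line;
    while ((line = /^[^\n]*(?:\n|$)/.exec(src)) && /^ {2,}\S/.test(line[0])) {
      text += ' ' + line[0].trim();
      src = src.slice(line[0].length);
    }
    items.push(text);
  }
  return items;
}

console.log(tokenizeList('- a\n- b\n  continued\n- c\n'));
// → [ 'a', 'b continued', 'c' ]
```

Each character of the input is now examined a bounded number of times instead of once per remaining list item, which is what turns the 2-second benchmark into roughly 100 ms.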
Offending line-splitting line where 90% of processing time was spent:
marked/src/Tokenizer.js
Line 205 in d098d55
Needs a bit of cleanup, but it's passing tests.