Add a globalPrevCapture that persists into nested parses #66

FuegoFro · 2019-07-10T19:03:52Z

This adds an additional globalPrevCapture that is "global" state for the parses and persist into nested invocations of nestedParse. This allows rules to determine if they are truly at the beginning of a line. The existing prevCapture resets at each nesting, so it only allows you to determine if you're at the beginning of the nested parse, but not necessarily the line.

FuegoFro · 2019-07-10T19:05:12Z

Note that this adds a new globalPrevCapture rather than changing the behavior of prevCapture to prevent changing the semantics of the existing APIs, though if you think it would be best to change prevCapture to be global I can change this to do that instead 🙂

ariabuckles · 2019-07-25T18:01:34Z

Hi @FuegoFro and thanks for sending this!

Work has been super busy so it'll take me a bit to get back to you more thoroughly, but this looks like a great addition.

This adds an additional `globalPrevCapture` that is "global" state for the parses and persist into nested invocations of `nestedParse`. This allows rules to determine if they are truly at the beginning of a line. The existing `prevCapture` resets at each nesting, so it only allows you to determine if you're at the beginning of the nested parse, but not necessarily the line.

ariabuckles · 2019-09-08T02:48:45Z

This looks good to me, but I think we could maybe simplify the whole prevCapture / globalPrevCapture thing by moving prevCapture to be on state, and then defining it to be global (which state is by default).

Then we'd get:

the more helpful semantics you propose
backwards compatibility with existing code that relies on prevCapture not being global (honestly, not a use case I'm particularly inclined to support unless someone shows me something relying on it)
the ability to deprecate prevCapture (since we won't need anyone to rely on it's position in the arguments list, which they would need to if they wanted to use globalPrevCapture here).
We can maybe clean up the prevCapture semantics a bit; it's a bit odd that it's named prevCapture but it's only capture[0]

Would that satisfy your use case @FuegoFro and @night ?

I think the change would mostly be:

         // ensure they don't match arbitrary '- ' or '* ' in inline
         // text (see the list rule for more information).
         var prevCapture = "";
+        state.prevCapture = null;
         while (source) {
             // store the best match, it's rule, and quality:
             var ruleType = null;

             }
 
             prevCapture = capture[0];
+            state.prevCapture = capture;
             source = source.substring(prevCapture.length);
         }
         return result;

     },
     list: {
         order: currOrder++,
-        match: function(source, state, prevCapture) {
+        match: function(source, state) {
             // We only want to break into a list if we are at the start of a
             // line. This is to avoid parsing "hi * there" with "* there"
             // becoming a part of a list.
            // You might wonder, "but that's inline, so of course it wouldn't
            // start a list?". You would be correct! Except that some of our
             // lists can be inline, because they might be inside another list,
             // in which case we can parse with inline scope, but need to allow
             // nested lists inside this inline scope.
-            var isStartOfLineCapture = LIST_LOOKBEHIND_R.exec(prevCapture);
+            var prevCaptureStr = state.prevCapture == null ? "" : state.prevCapture[0];
+            var isStartOfLineCapture = LIST_LOOKBEHIND_R.exec(prevCaptureStr);
             var isListBlock = state._list || !state.inline;
 
             if (isStartOfLineCapture && isListBlock) {

If that works for you, would you like to make that (happy to have you get credit for this!), or would it be easier for you if I did?

Thanks again for submitting this and very sorry for the delays!

FuegoFro · 2019-09-10T22:23:26Z

That makes sense, I'm happy to move it onto the state! I think we can't quite use the exact diffs you have here since that'll still reset state.prevCapture to null at the beginning of each new level of nesting, but I believe that effectively changing each of the places I have globalPrevCapture to state.prevCapture would do the trick. I've updated the PR to do that 🙂

I wasn't sure if I should fully remove the prevCapture argument to the MatchFunction or quality functions now, or if there's a process for deprecation and eventual removal.

FuegoFro · 2019-09-10T22:24:17Z

simple-markdown.js

@@ -357,13 +352,14 @@ var parserFor = function(rules /*: ParserRules */, defaultState /*: ?State */) {

            do {
                var currOrder = currRule.order;
-                var currCapture = currRule.match(source, state, prevCapture);
+                var prevCaptureStr = state.prevCapture === null ? "" : state.prevCapture[0];


If we wanted to maintain the legacy behavior here we could add an || i === 0 to the condition, which would reset it at the beginning of each nesting like it used to.

Makes sense. I think I'll do that so we don't break any legacy code, but will encourage state.prevCapture in the future.

ariabuckles

Thanks; this looks great!

I think we can't quite use the exact diffs you have here since that'll still reset state.prevCapture to null at the beginning of each new level of nesting

Oops! Thanks for catching and implementing that correctly!

I wasn't sure if I should fully remove the prevCapture argument to the MatchFunction or quality functions now, or if there's a process for deprecation and eventual removal.

I have some things deprecated, but given how removing any deprecation would technically be a breaking change, I have tried to preserve legacy semantics as much as possible. I think that's possible here too, so I'll eventually update the docs but keep supporting the legacy prevCaptureStr in the code.

Sorry about the delays again, I'll publish a new version with this tonight.

ariabuckles · 2019-09-19T01:27:56Z

simple-markdown.js

@@ -357,13 +352,14 @@ var parserFor = function(rules /*: ParserRules */, defaultState /*: ?State */) {

            do {
                var currOrder = currRule.order;
-                var currCapture = currRule.match(source, state, prevCapture);
+                var prevCaptureStr = state.prevCapture === null ? "" : state.prevCapture[0];


Makes sense. I think I'll do that so we don't break any legacy code, but will encourage state.prevCapture in the future.

Fix #72: backticks in code blocks bug Add #66: state.prevCapture that persists between parse calls #70: Bump devDependency versions for security

ariabuckles · 2019-09-19T01:44:10Z

Published v0.6.0 with this change <3!

FuegoFro · 2019-09-19T04:56:08Z

Yay, thank you! 😀

ariabuckles · 2019-09-19T19:12:26Z

Thank you!

* Fix ariabuckles#68 escaping pipes in tables * v0.5.0 * Fix license copyright * v0.5.1: Fix .git folder in published archive * Tests: Add exponential backtracking test for inline code Addresses ariabuckles#71 Inline code has an exponential backtracking regex. This commit makes tests for those so that we can verify we fix them. Test plan: 1. Run `make test` * Verify the three exponential backtracking avoidance tests fail * inlineCode: Fix ReDoS & improve escape semantics Our inline code regex had overlapping parts in the case of spaces; spaces could be parsed as part of the `\s*`s or as part of the `[\S\s]*`, which leads to catastrophic backtracking in the case of a string with many spaces. The `\s*` parts were attempting to allow for escaping of backticks within the start/end of inline code blocks, so this commit removes the `\s*` parts of the regex, and then after parsing the code block, checks for the case where a single space is being used to escape a backtick, and removes that space. Test plan: 1. Added a test for the semantics change. 2. Added a test for the exponential backtracking to match the test case in issue ariabuckles#71 * Heading: Add test for exponential backtracking in headings * Heading: Fix exponential backtracking Test plan: 1. `make test` * Verify all tests now pass * Del: Add test for del/strikethrough exponential backtracking * Del: Fix del/strikethrough exponential backtracking * Fence: Add test for code fence exponential backtracking * Fence: fix fence exponential backtracks * v0.5.2: Fix exponential backtracking / ReDoS regexes * inlineCode: fix overzealous replacement $1 only refers to the parens in the first branch of the `` /^ ( *` *) $|^ ( *`)|(` *) $/g `` disjunction. If another branch matches, the replacement will be blank. * Add a globalPrevCapture that persists into nested parses This adds an additional `globalPrevCapture` that is "global" state for the parses and persist into nested invocations of `nestedParse`. This allows rules to determine if they are truly at the beginning of a line. The existing `prevCapture` resets at each nesting, so it only allows you to determine if you're at the beginning of the nested parse, but not necessarily the line. * Move `prevCapture` to the `state` and make it global for a given parse. * Remove stray comma * Bump mixin-deep from 1.3.1 to 1.3.2 Bumps [mixin-deep](https://github.com/jonschlinkert/mixin-deep) from 1.3.1 to 1.3.2. - [Release notes](https://github.com/jonschlinkert/mixin-deep/releases) - [Commits](jonschlinkert/mixin-deep@1.3.1...1.3.2) Signed-off-by: dependabot[bot] <support@github.com> * Make prevCapture comparisons `== null` for consistency Minor change; most of the other comparisons work this way, and this makes prevCapture harder to break * v0.6.0 Fix ariabuckles#72: backticks in code blocks bug Add ariabuckles#66: state.prevCapture that persists between parse calls ariabuckles#70: Bump devDependency versions for security * Minify for v0.6.0 * Fix ReDoS with autolink Patterns like <<<<<<<<<<:/:/:/:/:/:/:/:/:/:/ currently exhibit O(n^3) complexity, allowing a 5KB document to take 7174ms to parse. With this change, it drops to O(n^2) and 73ms. * Dev Dependencies: Run `npm audit fix` to fix vulnerabilities * Create index.d.ts typings file * Export everything ES6 import usage is: ```javascript import * as SimpleMarkdown from 'simple-markdown'; ``` so rather than export a default object, we should be exporting everything. * v0.6.1: ariabuckles#73 Fix ReDoS with autolink * Flow: Clean up some flow types to match typescript Just a small fixup of some flow types so that when adding the typescript types they're more of a straight port. Done before the typescript changes so this change is easier to verify on its own. Test plan: `make check` * Error handling: Clean up error handling This cleanup happens in two parts: 1. We remove some logic to disable certain error handling, which, as far as I can tell, was never actually used :/. Doing this makes both typescript and flow happier (and it's harder to work around in typescript than it was in flow). * In order to avoid introducing a potential perf regression for anyone who could have been using `state.disableErrorGuards`, we also remove the more expensive string comparison check, and instead only check for missing `^` if the result is a raw regex capture result (i.e., if it has a numeric index, we warn if that index is non-zero). 2. We add an error check if the Array joiner rule isn't defined for a non-html/react output type. * This will make it easier to convince typescript that the function we call isn't undefined, and is a better api/documentation as well. Test plan: As... noted in the comment, these parts aren't really tested in the tests, but I did run the tests to verify that the normal path API didn't seem to break. * React Keys: Fix sending null as a key See facebook/react#5519 * Typescript: Move index.d.ts to simple-markdown.d.ts * Typescript: Configure typescript checking for build process Install typescript and configure it to check simple-markdown.js & simple-markdown.d.ts during `make check` Test plan: `make check` * Typescript: Combine ts and flow types into up-to-date v0.6 ts types Updates type type definition file to be equivalent to the flow typings. Test plan: I didn't test this externally; will ask someone using typescript to check whether this is working for their usecase. The contents of the file are tested later with `make check` though. * Typescript: Add types to simple-markdown.js source Add typescript types to our source, so that we can use typescript in addition to flow to check our source, and verify that our types are correct. Test plan: tested in later commit that enables tsc in `make check` * Typescript: Add type checking of tests Adds types to simple-markdown-test.js and checks that file with typescript as well \o/ Test plan: `make test` * v0.7.0: Typescript types * Allow one level of balanced parens in link urls * Setup rollup * Build system: Integrate rollup with the rest of the system Adds some integration for rollup/the rollup changes with some other parts of the build system: * `make` targets * fixes typescript config * uses es6 module exports to switch to fully using rollup's umd * integrates rollup with npm prepublish so I can't forget to build * Remove unnecessary extra LIST_R.exec call As far as I can tell, this wasn't being used, and was creating an unused variable. Test plan: `make build test` * Flow: Upgrade flow to 0.111.1 and fix flow errors Test plan: `npm ci && make test` * DevDependencies: version bumps * Flow: Build files after flow changes * Tests: Remove test dependency on underscore Test plan: `make test` * flow-typed: update flow types Test plan: `make test` * v0.7.1 * Typescript dependencies: Remove @types/node dependency Now that we're bundling with rollup, we shouldn't need the @types/node dependency, which can make installation more challenging. Fixes ariabuckles#80 * DevDependencies: Update nyc/coverage for `npm audit` * v0.7.2 Co-authored-by: Aria Buckles <aria@toole1.com> Co-authored-by: Alcaro <floating@muncher.se> Co-authored-by: Danny Weinberg <FuegoFro@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Danny Cochran <daniel@cochrans.org> Co-authored-by: Sergey Slipchenko <faergeek@gmail.com>

night mentioned this pull request Sep 1, 2019

ReDoS attack with inline code blocks #71

Closed

night force-pushed the preserve_previous_content_across_nested_parses branch from debbdab to 756b172 Compare September 1, 2019 01:42

Move prevCapture to the state and make it global for a given parse.

0959255

Remove stray comma

f1e1877

FuegoFro commented Sep 10, 2019

View reviewed changes

ariabuckles approved these changes Sep 19, 2019

View reviewed changes

ariabuckles merged commit 966a17a into ariabuckles:master Sep 19, 2019

ariabuckles added a commit that referenced this pull request Sep 19, 2019

v0.6.0

41ce8bb

Fix #72: backticks in code blocks bug Add #66: state.prevCapture that persists between parse calls #70: Bump devDependency versions for security

ariabuckles mentioned this pull request Sep 19, 2019

inlineCode: fix overzealous replacement #72

Merged

FuegoFro deleted the preserve_previous_content_across_nested_parses branch September 19, 2019 04:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a globalPrevCapture that persists into nested parses #66

Add a globalPrevCapture that persists into nested parses #66

FuegoFro commented Jul 10, 2019

FuegoFro commented Jul 10, 2019

ariabuckles commented Jul 25, 2019

ariabuckles commented Sep 8, 2019

FuegoFro commented Sep 10, 2019

FuegoFro Sep 10, 2019

ariabuckles Sep 19, 2019

ariabuckles left a comment

ariabuckles Sep 19, 2019

ariabuckles commented Sep 19, 2019

FuegoFro commented Sep 19, 2019

ariabuckles commented Sep 19, 2019

Add a globalPrevCapture that persists into nested parses #66

Add a globalPrevCapture that persists into nested parses #66

Conversation

FuegoFro commented Jul 10, 2019

FuegoFro commented Jul 10, 2019

ariabuckles commented Jul 25, 2019

ariabuckles commented Sep 8, 2019

FuegoFro commented Sep 10, 2019

FuegoFro Sep 10, 2019

Choose a reason for hiding this comment

ariabuckles Sep 19, 2019

Choose a reason for hiding this comment

ariabuckles left a comment

Choose a reason for hiding this comment

ariabuckles Sep 19, 2019

Choose a reason for hiding this comment

ariabuckles commented Sep 19, 2019

FuegoFro commented Sep 19, 2019

ariabuckles commented Sep 19, 2019