Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a globalPrevCapture that persists into nested parses #66

Conversation

FuegoFro
Copy link
Contributor

This adds an additional globalPrevCapture that is "global" state for the parses and persist into nested invocations of nestedParse. This allows rules to determine if they are truly at the beginning of a line. The existing prevCapture resets at each nesting, so it only allows you to determine if you're at the beginning of the nested parse, but not necessarily the line.

@FuegoFro
Copy link
Contributor Author

Note that this adds a new globalPrevCapture rather than changing the behavior of prevCapture to prevent changing the semantics of the existing APIs, though if you think it would be best to change prevCapture to be global I can change this to do that instead 🙂

@ariabuckles
Copy link
Owner

Hi @FuegoFro and thanks for sending this!

Work has been super busy so it'll take me a bit to get back to you more thoroughly, but this looks like a great addition.

This adds an additional `globalPrevCapture` that is "global" state for the parses and persist into nested invocations of `nestedParse`. This allows rules to determine if they are truly at the beginning of a line. The existing `prevCapture` resets at each nesting, so it only allows you to determine if you're at the beginning of the nested parse, but not necessarily the line.
@night night force-pushed the preserve_previous_content_across_nested_parses branch from debbdab to 756b172 Compare September 1, 2019 01:42
@ariabuckles
Copy link
Owner

This looks good to me, but I think we could maybe simplify the whole prevCapture / globalPrevCapture thing by moving prevCapture to be on state, and then defining it to be global (which state is by default).

Then we'd get:

  1. the more helpful semantics you propose
  2. backwards compatibility with existing code that relies on prevCapture not being global (honestly, not a use case I'm particularly inclined to support unless someone shows me something relying on it)
  3. the ability to deprecate prevCapture (since we won't need anyone to rely on it's position in the arguments list, which they would need to if they wanted to use globalPrevCapture here).
  4. We can maybe clean up the prevCapture semantics a bit; it's a bit odd that it's named prevCapture but it's only capture[0]

Would that satisfy your use case @FuegoFro and @night ?

I think the change would mostly be:

         // ensure they don't match arbitrary '- ' or '* ' in inline
         // text (see the list rule for more information).
         var prevCapture = "";
+        state.prevCapture = null;
         while (source) {
             // store the best match, it's rule, and quality:
             var ruleType = null;
             }
 
             prevCapture = capture[0];
+            state.prevCapture = capture;
             source = source.substring(prevCapture.length);
         }
         return result;
     },
     list: {
         order: currOrder++,
-        match: function(source, state, prevCapture) {
+        match: function(source, state) {
             // We only want to break into a list if we are at the start of a
             // line. This is to avoid parsing "hi * there" with "* there"
             // becoming a part of a list.
            // You might wonder, "but that's inline, so of course it wouldn't
            // start a list?". You would be correct! Except that some of our
             // lists can be inline, because they might be inside another list,
             // in which case we can parse with inline scope, but need to allow
             // nested lists inside this inline scope.
-            var isStartOfLineCapture = LIST_LOOKBEHIND_R.exec(prevCapture);
+            var prevCaptureStr = state.prevCapture == null ? "" : state.prevCapture[0];
+            var isStartOfLineCapture = LIST_LOOKBEHIND_R.exec(prevCaptureStr);
             var isListBlock = state._list || !state.inline;
 
             if (isStartOfLineCapture && isListBlock) {

If that works for you, would you like to make that (happy to have you get credit for this!), or would it be easier for you if I did?

Thanks again for submitting this and very sorry for the delays!

@FuegoFro
Copy link
Contributor Author

That makes sense, I'm happy to move it onto the state! I think we can't quite use the exact diffs you have here since that'll still reset state.prevCapture to null at the beginning of each new level of nesting, but I believe that effectively changing each of the places I have globalPrevCapture to state.prevCapture would do the trick. I've updated the PR to do that 🙂

I wasn't sure if I should fully remove the prevCapture argument to the MatchFunction or quality functions now, or if there's a process for deprecation and eventual removal.

@@ -357,13 +352,14 @@ var parserFor = function(rules /*: ParserRules */, defaultState /*: ?State */) {

do {
var currOrder = currRule.order;
var currCapture = currRule.match(source, state, prevCapture);
var prevCaptureStr = state.prevCapture === null ? "" : state.prevCapture[0];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted to maintain the legacy behavior here we could add an || i === 0 to the condition, which would reset it at the beginning of each nesting like it used to.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I think I'll do that so we don't break any legacy code, but will encourage state.prevCapture in the future.

Copy link
Owner

@ariabuckles ariabuckles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks; this looks great!

I think we can't quite use the exact diffs you have here since that'll still reset state.prevCapture to null at the beginning of each new level of nesting

Oops! Thanks for catching and implementing that correctly!

I wasn't sure if I should fully remove the prevCapture argument to the MatchFunction or quality functions now, or if there's a process for deprecation and eventual removal.

I have some things deprecated, but given how removing any deprecation would technically be a breaking change, I have tried to preserve legacy semantics as much as possible. I think that's possible here too, so I'll eventually update the docs but keep supporting the legacy prevCaptureStr in the code.

Sorry about the delays again, I'll publish a new version with this tonight.

@@ -357,13 +352,14 @@ var parserFor = function(rules /*: ParserRules */, defaultState /*: ?State */) {

do {
var currOrder = currRule.order;
var currCapture = currRule.match(source, state, prevCapture);
var prevCaptureStr = state.prevCapture === null ? "" : state.prevCapture[0];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I think I'll do that so we don't break any legacy code, but will encourage state.prevCapture in the future.

@ariabuckles ariabuckles merged commit 966a17a into ariabuckles:master Sep 19, 2019
ariabuckles added a commit that referenced this pull request Sep 19, 2019
Fix #72: backticks in code blocks bug
Add #66: state.prevCapture that persists between parse calls
#70: Bump devDependency versions for security
@ariabuckles
Copy link
Owner

Published v0.6.0 with this change <3!

@FuegoFro FuegoFro deleted the preserve_previous_content_across_nested_parses branch September 19, 2019 04:55
@FuegoFro
Copy link
Contributor Author

Yay, thank you! 😀

@ariabuckles
Copy link
Owner

Thank you!

andangrd added a commit to andangrd/simple-markdown that referenced this pull request Jan 2, 2020
* Fix ariabuckles#68 escaping pipes in tables

* v0.5.0

* Fix license copyright

* v0.5.1: Fix .git folder in published archive

* Tests: Add exponential backtracking test for inline code

Addresses ariabuckles#71

Inline code has an exponential backtracking regex.
This commit makes tests for those so that we can verify we fix them.

Test plan:

1. Run `make test`
    * Verify the three exponential backtracking avoidance tests fail

* inlineCode: Fix ReDoS & improve escape semantics

Our inline code regex had overlapping parts in the case of spaces;
spaces could be parsed as part of the `\s*`s or as part of the
`[\S\s]*`, which leads to catastrophic backtracking in the case of
a string with many spaces.

The `\s*` parts were attempting to allow for escaping of backticks
within the start/end of inline code blocks, so this commit removes
the `\s*` parts of the regex, and then after parsing the code block,
checks for the case where a single space is being used to escape
a backtick, and removes that space.

Test plan:

1. Added a test for the semantics change.
2. Added a test for the exponential backtracking to match the test
   case in issue ariabuckles#71

* Heading: Add test for exponential backtracking in headings

* Heading: Fix exponential backtracking

Test plan:

1. `make test`
    * Verify all tests now pass

* Del: Add test for del/strikethrough exponential backtracking

* Del: Fix del/strikethrough exponential backtracking

* Fence: Add test for code fence exponential backtracking

* Fence: fix fence exponential backtracks

* v0.5.2: Fix exponential backtracking / ReDoS regexes

* inlineCode: fix overzealous replacement

$1 only refers to the parens in the first branch of the `` /^ ( *` *) $|^ ( *`)|(` *) $/g `` disjunction. If another branch matches, the replacement will be blank.

* Add a globalPrevCapture that persists into nested parses

This adds an additional `globalPrevCapture` that is "global" state for the parses and persist into nested invocations of `nestedParse`. This allows rules to determine if they are truly at the beginning of a line. The existing `prevCapture` resets at each nesting, so it only allows you to determine if you're at the beginning of the nested parse, but not necessarily the line.

* Move `prevCapture` to the `state` and make it global for a given parse.

* Remove stray comma

* Bump mixin-deep from 1.3.1 to 1.3.2

Bumps [mixin-deep](https://github.com/jonschlinkert/mixin-deep) from 1.3.1 to 1.3.2.
- [Release notes](https://github.com/jonschlinkert/mixin-deep/releases)
- [Commits](jonschlinkert/mixin-deep@1.3.1...1.3.2)

Signed-off-by: dependabot[bot] <support@github.com>

* Make prevCapture comparisons `== null` for consistency

Minor change; most of the other comparisons work this way, and this
makes prevCapture harder to break

* v0.6.0

Fix ariabuckles#72: backticks in code blocks bug
Add ariabuckles#66: state.prevCapture that persists between parse calls
ariabuckles#70: Bump devDependency versions for security

* Minify for v0.6.0

* Fix ReDoS with autolink

Patterns like <<<<<<<<<<:/:/:/:/:/:/:/:/:/:/ currently exhibit O(n^3) complexity, allowing a 5KB document to take 7174ms to parse. With this change, it drops to O(n^2) and 73ms.

* Dev Dependencies: Run `npm audit fix` to fix vulnerabilities

* Create index.d.ts typings file

* Export everything

ES6 import usage is:

```javascript
import * as SimpleMarkdown from 'simple-markdown';
```

so rather than export a default object, we should be exporting everything.

* v0.6.1: ariabuckles#73 Fix ReDoS with autolink

* Flow: Clean up some flow types to match typescript

Just a small fixup of some flow types so that when adding the typescript
types they're more of a straight port.

Done before the typescript changes so this change is easier to verify on
its own.

Test plan: `make check`

* Error handling: Clean up error handling

This cleanup happens in two parts:

1. We remove some logic to disable certain error handling, which, as
   far as I can tell, was never actually used :/. Doing this makes
   both typescript and flow happier (and it's harder to work around
   in typescript than it was in flow).

    * In order to avoid introducing a potential perf regression for
      anyone who could have been using `state.disableErrorGuards`,
      we also remove the more expensive string comparison check, and
      instead only check for missing `^` if the result is a raw
      regex capture result (i.e., if it has a numeric index, we warn
      if that index is non-zero).

2. We add an error check if the Array joiner rule isn't defined for
   a non-html/react output type.

    * This will make it easier to convince typescript that the function
      we call isn't undefined, and is a better api/documentation as
      well.

Test plan:

As... noted in the comment, these parts aren't really tested in the
tests, but I did run the tests to verify that the normal path API
didn't seem to break.

* React Keys: Fix sending null as a key

See facebook/react#5519

* Typescript: Move index.d.ts to simple-markdown.d.ts

* Typescript: Configure typescript checking for build process

Install typescript and configure it to check simple-markdown.js &
simple-markdown.d.ts during `make check`

Test plan: `make check`

* Typescript: Combine ts and flow types into up-to-date v0.6 ts types

Updates type type definition file to be equivalent to the flow typings.

Test plan:

I didn't test this externally; will ask someone using typescript to check
whether this is working for their usecase.

The contents of the file are tested later with `make check` though.

* Typescript: Add types to simple-markdown.js source

Add typescript types to our source, so that we can use typescript in
addition to flow to check our source, and verify that our types are
correct.

Test plan: tested in later commit that enables tsc in `make check`

* Typescript: Add type checking of tests

Adds types to simple-markdown-test.js and checks that file with
typescript as well \o/

Test plan: `make test`

* v0.7.0: Typescript types

* Allow one level of balanced parens in link urls

* Setup rollup

* Build system: Integrate rollup with the rest of the system

Adds some integration for rollup/the rollup changes with some other
parts of the build system:

 * `make` targets
 * fixes typescript config
 * uses es6 module exports to switch to fully using rollup's umd
 * integrates rollup with npm prepublish so I can't forget to build

* Remove unnecessary extra LIST_R.exec call

As far as I can tell, this wasn't being used, and was creating an unused
variable.

Test plan: `make build test`

* Flow: Upgrade flow to 0.111.1 and fix flow errors

Test plan: `npm ci && make test`

* DevDependencies: version bumps

* Flow: Build files after flow changes

* Tests: Remove test dependency on underscore

Test plan: `make test`

* flow-typed: update flow types

Test plan: `make test`

* v0.7.1

* Typescript dependencies: Remove @types/node dependency

Now that we're bundling with rollup, we shouldn't need the @types/node
dependency, which can make installation more challenging.

Fixes ariabuckles#80

* DevDependencies: Update nyc/coverage for `npm audit`

* v0.7.2

Co-authored-by: Aria Buckles <aria@toole1.com>
Co-authored-by: Alcaro <floating@muncher.se>
Co-authored-by: Danny Weinberg <FuegoFro@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Danny Cochran <daniel@cochrans.org>
Co-authored-by: Sergey Slipchenko <faergeek@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants