Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(getStaticValue): string regex functions #82

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

RunDevelopment
Copy link

@RunDevelopment RunDevelopment commented Mar 19, 2023

This PR allows match, matchAll, replace, replaceAll, and split for strings. The limitation here is that we only allow those functions if we can prove that the search value (sting, regex, or similar) is safe. E.g. "foo".replace(/foo/, "bar") is safe, while "foo".replace(/(a|a)+b/, "backtracking") is not.

Checking whether a regex is safe:
What "safe" means for regexes in backtracking regex engine is quite hard to define, but I will define it as such: A regex r is safe if, for any input string, executiing r takes linear time with a reasonable small constant factor.

So all regexes with exponential (e.g. /(a|a)+b/, polynomial (e.g. a*a*b), and "move" (e.g. /a*b/) backtracking are unsafe. However, even regexes with no quantifier can be unsafe. Quantifiers are an easy way to create paths that the regex engine has to backtrack through, but we also create more paths with the alternation operator |. E.g. /a|a/ has 1 | and 2 paths, /(a|a)(a|a)/ has 2 | and 4 paths, /(a|a)(a|a)(a|a)/ has 3 | and 8 paths, and so on. So with k | operators, we can create a regex with 2^k paths.

So the check basically boils down to this:

  1. If there is any (unbound) quantifier *, +, {n,}, then we must conservatively assume that the regex is unsafe.
  2. If there are more than X possible paths, we also assume the regex to be unsafe.

Well, this is the check that I wanted to implement. Doing so would require to take on a dependency for a regex parser. So instead, I "approximate" this using a few string replacements that are going to break with the proposed v flag. So, should we take on a dependency, or should I refine my string replacements?

@codecov-commenter
Copy link

codecov-commenter commented Mar 19, 2023

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (ae9df9c) 96.95% compared to head (95f7531) 97.18%.
Report is 5 commits behind head on main.

Files Patch % Lines
src/safe-regex.mjs 97.53% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #82      +/-   ##
==========================================
+ Coverage   96.95%   97.18%   +0.23%     
==========================================
  Files          13       14       +1     
  Lines        2102     2419     +317     
  Branches      397      469      +72     
==========================================
+ Hits         2038     2351     +313     
- Misses         63       67       +4     
  Partials        1        1              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions github-actions bot added the Stale label May 18, 2023
@github-actions
Copy link

This PR has been automatically closed because we haven't received a response from the original author 🙈. This automation helps keep the issue tracker clean from PRs that aren't actionable. Please reach out if you have more information for us! 🙂

@github-actions github-actions bot closed this May 26, 2023
@RunDevelopment
Copy link
Author

RunDevelopment commented May 26, 2023

@MichaelDeBoey Could you please take a look at this? Simple regex execution is a very useful feature after all.

@MichaelDeBoey
Copy link
Member

@RunDevelopment Are you still interested in getting this one merged?

@RunDevelopment
Copy link
Author

Sure!

Resolving the merge conflict is obvious, but there is still an open question before we can merge this. Basically, the check for determining whether a regex is safe to execute is a dirty dirty hack and doesn't work for v-flag regexes. I need a regex parser to implement the check properly but that means taking a dependency.

@MichaelDeBoey
Copy link
Member

@RunDevelopment I don't mind adding a dependency if that makes this package more robust and we can delete some code here tbh

What package did you have in mind?
What are the benefits?
What are the drawbacks?

CC/ @ota-meshi

@RunDevelopment
Copy link
Author

What package did you have in mind?

regexpp. I use this regex parser everywhere, and it's part of the eslint-community org.

Since I have the OK to add a dependency, I'll do that and implement the full analysis soon.

@RunDevelopment RunDevelopment marked this pull request as ready for review October 15, 2023 17:12
@RunDevelopment
Copy link
Author

Done. The implementation of isSafeRegex turned out to be a bit more involved that I expected. Since my analysis essentially just counts the number of possible backtracking paths, it got more complex because of the v flag.

The v flag added strings in character classes/sets and each of those strings is one path the regex engine can backtrack on. E.g. [\w\q{foo|bar}] is exactly equivalent to foo|bar|\w, both have 3 backtracking paths. Set operations also aren't easy to implement, because we don't actually evaluate the set of strings and characters, so I had to use conservative approximations. As a result, the number of paths in v-flag character classes/sets might be overestimated but never underestimated. Fortunately, overestimating paths will result in false positives (we say that a regex is unsafe, even though it's actually safe), which is fine from a security point of view.

@github-actions github-actions bot added Stale and removed Stale labels Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants