Protection against ReDoS #6163

Merged: 4 commits, Nov 7, 2019


stsewd (Member) commented Sep 10, 2019

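The regex module is compatible with the re module (when the VERSION0 flag is used).
It is also much faster on patterns that cause catastrophic backtracking in re, and it supports a timeout: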

>>> import re
>>> import regex
>>> import timeit
>>> pattern = "(a+)+b"
>>> input = "a" * 25
>>> timeit.timeit(lambda: re.search(pattern, input), number=10)
32.332445038000515
>>> timeit.timeit(lambda: regex.search(pattern, input, flags=regex.VERSION0), number=10)
0.003861578001306043
>>> input = "a" * 10000
>>> regex.search(pattern, input, flags=regex.VERSION0, timeout=5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/stsewd/.pyenv/versions/readthedocs.org/lib/python3.6/site-packages/regex/regex.py", line 266, in search
    concurrent, partial, timeout)
TimeoutError: regex timed out

I put the timeout to 15, maybe we can drop it to 5?
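
As a rough illustration of the idea, the timeout could be wrapped in a small helper along these lines. This is only a sketch: it assumes the third-party regex package is installed, and `safe_search` is a hypothetical name, not the function this PR adds.

```python
import regex  # third-party package (pip install regex), not the stdlib re


def safe_search(pattern, text, timeout=1.0):
    """Search text for pattern, giving up after `timeout` seconds.

    Hypothetical helper: returns the match object, or None when there is
    no match or when the search exceeds the timeout.
    """
    try:
        # VERSION0 keeps behaviour compatible with the stdlib re module.
        return regex.search(pattern, text, flags=regex.VERSION0, timeout=timeout)
    except TimeoutError:
        # Treat a timed-out search as "no match" instead of blocking the caller.
        return None
```

Whether a timeout should count as "no match" or be reported to the user is a design choice; returning None here is just the simplest option.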

humitos (Member) commented Oct 2, 2019

This PR is related to #5996.

humitos added the "Needed: design decision" label (a core team decision is required) on Oct 30, 2019

humitos (Member) commented Nov 4, 2019

We decided to ship with regex (#4641 (comment)) so we should merge this PR before that PR gets merged, or merge this PR into the other first.

humitos added the "Accepted" label (accepted issue on our roadmap) and removed the "Needed: design decision" label on Nov 4, 2019

humitos (Member) commented Nov 4, 2019

> I put the timeout to 15, maybe we can drop it to 5?

Even less would be better. Parsing a regex shouldn't take more than 1s.

stsewd (Member, Author) commented Nov 6, 2019

Ok, I've decreased the timeout to 1 second. Another alternative is to use a regex engine based on finite state machines, but I wasn't able to find such a library for Python...
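
To illustrate why a finite-state-machine engine would not need a timeout, here is a toy sketch in plain Python. It is not what this PR does and it is not a general regex engine: the automaton is hand-built for the single pattern `(a+)+b` (which accepts the same strings as `a+b`), and all names are hypothetical.

```python
# Hand-built NFA for the language of "(a+)+b" (equivalent to "a+b").
# States: 0 = start, 1 = seen one or more "a", 2 = accepting.
TRANSITIONS = {
    (0, "a"): {1},
    (1, "a"): {1},
    (1, "b"): {2},
}
START, ACCEPT = 0, 2


def nfa_search(text):
    """Return True if any substring of text matches the pattern.

    Each character is visited once and the set of active states is
    bounded by the number of NFA states, so there is no backtracking.
    """
    active = set()
    for ch in text:
        active.add(START)  # a match may begin at any position
        next_active = set()
        for state in active:
            next_active |= TRANSITIONS.get((state, ch), set())
        if ACCEPT in next_active:
            return True
        active = next_active
    return False
```

With this approach the input `"a" * 10000` is rejected after a single linear pass, whereas a backtracking engine may try an exponential number of ways to split the run of `a` characters between the nested quantifiers.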

stsewd requested a review from a team on November 6, 2019 19:19
stsewd merged commit a8611aa into readthedocs:master on Nov 7, 2019
stsewd deleted the prevent-redos-attacks branch on November 7, 2019 14:38