Regarding use of regex #14

Closed
arhankundu99 opened this issue Sep 12, 2020 · 2 comments
Labels
question Further information is requested

Comments

@arhankundu99

Instead of generating distorted profane words from the list of profane words, can't we use regex?

@snguyenthanh
Owner

snguyenthanh commented Sep 14, 2020

Using regex, the runtime will increase exponentially with the length of the input text. The original profanity package actually uses regex to censor the profane words.
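To make the two approaches concrete, here is a minimal sketch, not the actual code of either package. The word list, the `CHAR_VARIANTS` table, and all function names are hypothetical and chosen only for illustration. The regex version makes one pass over the text per profane word, while the variant version pre-generates the distorted spellings once and then does a set lookup per token.

```python
import re
from itertools import product

# Hypothetical word list and substitution table, for illustration only.
PROFANE_WORDS = ["darn", "heck"]
CHAR_VARIANTS = {"a": "a4", "e": "e3"}  # each letter maps to its allowed spellings


def censor_with_regex(text):
    """Regex approach: one substitution pass over the text per profane word."""
    for word in PROFANE_WORDS:
        # Build a character class per letter, e.g. "darn" -> [d][a4][r][n]
        body = "".join("[" + re.escape(CHAR_VARIANTS.get(ch, ch)) + "]" for ch in word)
        text = re.sub(r"\b" + body + r"\b", "****", text, flags=re.IGNORECASE)
    return text


def generate_variants(word):
    """Pre-generate every distorted spelling of a word from CHAR_VARIANTS."""
    choices = [CHAR_VARIANTS.get(ch, ch) for ch in word]
    return {"".join(combo) for combo in product(*choices)}


# Built once, up front; lookups afterwards are constant time per token.
VARIANT_SET = set().union(*(generate_variants(w) for w in PROFANE_WORDS))


def censor_with_set(text):
    """Variant approach: split into tokens, then check each against the set."""
    parts = re.split(r"(\W+)", text)  # the capturing group keeps the separators
    return "".join("****" if p.lower() in VARIANT_SET else p for p in parts)


sample = "Well, d4rn it. HECK!"
print(censor_with_regex(sample))
print(censor_with_set(sample))
```

With a realistic word list of thousands of entries, the regex loop scans the full text once per word, whereas the set version scans it once in total. That is the trade-off this sketch is meant to show; it makes no claim about how either package is implemented internally.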

Here is a simple benchmark with a 1000-word text:

# Python 3.6.8
# CPU: Intel i7-9750H @ 2.60 GHz

from timeit import timeit

from profanity import profanity as pf
from better_profanity import profanity as bpf


# Let the 2 packages use the same list of profane words
pf.load_words(bpf.CENSOR_WORDSET)

def benchmark(func, text: str):
    return func(text)

if __name__ == "__main__":
    test_str = "<a 1000-word text here>"
    regex_runtime = timeit('benchmark(pf.censor, test_str)', globals=globals(), number=1)
    current_runtime = timeit('benchmark(bpf.censor, test_str)', globals=globals(), number=1)

    print(f"Regex: {regex_runtime}")
    print(f"Current: {current_runtime}")

which has the following output:

# in seconds
Regex: 26.9981138
Current: 0.0000009468
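A single run with `number=1` can be noisy, so as a hedged variation on the benchmark above, the timing could be repeated and the fastest run reported. This is only a sketch: `best_runtime` and `sample_text` are hypothetical names, and `str.upper` stands in for a censor function so the snippet runs without either package installed.

```python
from timeit import repeat


def best_runtime(func, text, runs=5):
    """Return the fastest of `runs` single-call timings, in seconds."""
    return min(repeat(lambda: func(text), number=1, repeat=runs))


# Placeholder 1000-word text; swap in real input when benchmarking.
sample_text = "lorem ipsum " * 500

# str.upper is a stand-in; pass pf.censor or bpf.censor here instead.
upper_runtime = best_runtime(str.upper, sample_text)
print(f"str.upper: {upper_runtime:.10f}")
```

Taking the minimum rather than the mean follows the usual advice for `timeit`, since slower runs mostly reflect interference from other processes.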

@snguyenthanh snguyenthanh added the question Further information is requested label Sep 14, 2020
@arhankundu99
Author

Thank you so much for the explanation!
