Regarding use of regex #14

arhankundu99 · 2020-09-12T19:53:02Z

Instead of generating distorted profane words from the list of profane words, can't we use regex?

snguyenthanh · 2020-09-14T13:04:27Z

With regex, the runtime grows very quickly as the input text gets longer. The original profanity package actually uses regex to censor profane words.

Here is a simple benchmark with a 1000-word text:

# Python 3.6.8
# CPU: Intel i7-9750H @ 2.60 GHz

from timeit import timeit  # import the function; calling the bare module raises TypeError

from profanity import profanity as pf
from better_profanity import profanity as bpf


# Let the 2 packages use the same list of profane words
pf.load_words(bpf.CENSOR_WORDSET)

def benchmark(func, text: str):
    return func(text)

if __name__ == "__main__":
    test_str = "<a 1000-word text here>"
    regex_runtime = timeit('benchmark(pf.censor, test_str)', globals=globals(), number=1)
    current_runtime = timeit('benchmark(bpf.censor, test_str)', globals=globals(), number=1)

    print(f"Regex: {regex_runtime}")
    print(f"Current: {current_runtime}")

which has the following output:

# in seconds
Regex: 26.9981138
Current: 0.0000009468
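
To make the trade-off concrete, here is a minimal sketch of the two strategies side by side. It is not the code of either package: the `SUBSTITUTIONS` table, the function names, and the `****` mask are all assumptions made up for illustration. The idea is that the variant approach pays the cost once, when the word list is expanded into a set, and then does a single hash lookup per token, whereas the regex approach scans the whole text once for every word in the list.

```python
import re
from itertools import product

# Hypothetical substitution table for illustration only;
# it is not the mapping used by better_profanity.
SUBSTITUTIONS = {"a": "a@4", "e": "e3", "i": "i1!", "o": "o0", "s": "s$5"}


def generate_variants(word):
    """Return every spelling of `word` reachable through SUBSTITUTIONS."""
    choices = [SUBSTITUTIONS.get(ch, ch) for ch in word.lower()]
    return {"".join(combo) for combo in product(*choices)}


def build_variant_set(words):
    """Expand the whole word list once, up front."""
    variants = set()
    for word in words:
        variants |= generate_variants(word)
    return variants


def censor_with_set(text, variants, mask="****"):
    """One set lookup per whitespace-separated token."""
    return " ".join(
        mask if token.lower() in variants else token for token in text.split()
    )


def censor_with_regex(text, words, mask="****"):
    """One full pass over the text for every word in the list."""
    for word in words:
        # Build a character class per letter, e.g. "heck" -> [h][e3][c][k]
        pattern = "".join(
            "[" + re.escape(SUBSTITUTIONS.get(ch, ch)) + "]" for ch in word.lower()
        )
        # Lookarounds restrict matches to whole whitespace-delimited tokens
        text = re.sub(
            r"(?<!\S)" + pattern + r"(?!\S)", mask, text, flags=re.IGNORECASE
        )
    return text


if __name__ == "__main__":
    words = ["darn", "heck"]  # placeholder word list
    variants = build_variant_set(words)
    sample = "What the h3ck is this d@rn thing"
    print(censor_with_set(sample, variants))
    print(censor_with_regex(sample, words))
```

Both functions censor the same tokens in this small sample. The sketch splits on whitespace only, so a token with trailing punctuation would slip past the set lookup; a real implementation would need proper tokenization.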

arhankundu99 · 2020-09-14T15:28:45Z

Thank you so much for the explanation!

snguyenthanh added the "question" (Further information is requested) label Sep 14, 2020

arhankundu99 closed this as completed Sep 14, 2020

snguyenthanh mentioned this issue Nov 27, 2020

profanity filtering doesn't work for combined words like "f*ckme" or "suckmyd*ck" #18

Open
