Add get censored words & censor middle only features #36
base: master
It would be great if you could write some tests for the functions you added as well.
better_profanity/better_profanity.py
Outdated
@@ -53,7 +54,7 @@ def __init__(self, words=None):

     ## PUBLIC ##

-    def censor(self, text, censor_char="*"):
+    def censor(self, text, censor_char="*", middle_only=False, get_censored_words=False):
I think it would be better to have a separate get_censored_words(text) function, as it is not obvious what censor(get_censored_words=True) returns. You could share some or most of the code with the censor function.
Oh I thought it would be better to not repeat the same code, but it's true that it would potentially cause confusion. I'll work on it.
That's great. You are right that it would be better to not repeat the same code. But we could also create a function that is used by both censor and get_censored_words to determine which words are profane.
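The shared-helper idea could look roughly like the sketch below. It is a simplified stand-in, not the library's implementation: the helper name `find_profane_spans`, the tiny `WORDS` set, and the regex-based word splitting are all assumptions made for illustration, and it leaves out the character-substitution and word-merging handling discussed elsewhere in this PR.

```python
import re

# Hypothetical word list, for illustration only.
WORDS = {"darn", "heck"}


def find_profane_spans(text, words=WORDS):
    """Return (start, end, word) for every word in text found in `words`."""
    spans = []
    for match in re.finditer(r"\w+", text):
        if match.group().lower() in words:
            spans.append((match.start(), match.end(), match.group()))
    return spans


def censor(text, censor_char="*"):
    """Replace each detected word with four censor characters."""
    pieces = []
    last = 0
    for start, end, _ in find_profane_spans(text):
        pieces.append(text[last:start])  # untouched text before the match
        pieces.append(censor_char * 4)   # fixed-length replacement
        last = end
    pieces.append(text[last:])           # untouched tail
    return "".join(pieces)


def get_censored_words(text):
    """Return the detected words in order of appearance."""
    return [word for _, _, word in find_profane_spans(text)]
```

Both public functions stay short because detection lives in one place, so any change to how profane words are found applies to both.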
better_profanity/utils.py
Outdated
@@ -18,8 +18,14 @@ def read_wordlist(filename: str):
             yield row


-def get_replacement_for_swear_word(censor_char):
-    return censor_char * 4
+def get_replacement_for_swear_word(censor_char, n=4):
It's just my personal preference: could you replace n with a more descriptive variable name? It's quite unclear what n is when we call get_replacement_for_swear_word("-", n=2).
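As an illustration of the naming suggestion, here is a minimal sketch. The name `length` is only one possible choice and is an assumption, not the name the PR ended up using.

```python
def get_replacement_for_swear_word(censor_char, length=4):
    """Return `censor_char` repeated `length` times (hypothetical parameter name)."""
    return censor_char * length
```

A call such as get_replacement_for_swear_word("-", length=2) then states at the call site what the number means.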
Sure.
Separated the functions as you said, but while writing unit tests and testing edge cases, I just realized that the current [...]

Example with get_censored_words:

bad_text = "Dude, I hate shit. Fuck bullshit."
profanity.get_censored_words(bad_text)
>>>['shit', 'bullshit']
# It completely ignored "Fuck" since they're merged

bad_text = "That wh0re gave m3 a very good H@nD j0b."
profanity.get_censored_words(bad_text)
>>>['wh0re', 'H@nD']
# It didn't include "j0b" since they're separated with space

Example with middle_only (same issues):

bad_text = "Dude, I hate shit. Fuck bullshit."
profanity.censor(bad_text, middle_only=True)
>>>"Dude, I hate s**t b******t."
# It completely ignored "Fuck" since they're merged

bad_text = "That wh0re gave m3 a very good H@nD j0b."
profanity.censor(bad_text, middle_only=True)
>>>"That w***e gave m3 a very good H**D."
# It didn't include "j0b" since they're separated with space

To solve that, I simply put a check before merging swear words (only merge if: [...])

Example with get_censored_words:

bad_text = "Dude, I hate shit. Fuck bullshit."
profanity.get_censored_words(bad_text)
>>>['shit', 'Fuck', 'bullshit']

bad_text = "That wh0re gave m3 a very good H@nD j0b."
profanity.get_censored_words(bad_text)
>>>['wh0re']
# It didn't include "H@nD j0b"

Example with middle_only (same issues):

bad_text = "Dude, I hate shit. Fuck bullshit."
profanity.censor(bad_text, middle_only=True)
>>>"Dude, I hate s**t. F**k b******t."

bad_text = "That wh0re gave m3 a very good H@nD j0b."
profanity.censor(bad_text, middle_only=True)
>>>"That w***e gave m3 a very good H@nD j0b."
# It didn't include "H@nD j0b"

Maybe we should just follow this method and warn users of these possible issues? I think it's a pretty mild edge case anyway, but it's up to you.
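For reference, the middle-only masking visible in the outputs above keeps the first and last character of each matched word and replaces everything in between. Below is a minimal sketch of that single step, assuming a hypothetical helper named `censor_middle` (not a function from this PR). How words of one or two characters should be handled is not stated in the thread; masking them entirely is an assumption.

```python
def censor_middle(word, censor_char="*"):
    """Mask every character of `word` except the first and the last."""
    if len(word) <= 2:
        # Assumption: a word this short has no middle, so mask all of it.
        return censor_char * len(word)
    return word[0] + censor_char * (len(word) - 2) + word[-1]
```

This covers only the replacement step. Detecting which words to mask, including the merged multi-word cases described above, is a separate concern.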
Provided a solution for the issue #34. Sorry I kind of messed up with branches so this commit is merged with the other PR I created (#35).
Again, it doesn't break anything and can only be used if get_censored_words is True.
It basically returns a tuple of (str, list), with the str being the original censored text and the list being the list of censored words.
Usage: