profanity-filter

Most of the words in profane_wordlist.txt are taken from the Bad Words list for Facebook.
The filter supports modified spellings such as D@mn and $h1t.
This library is significantly faster than profanity filters that rely on regex or string methods, because lookups go through a trie.

Reason to use trie: https://link.medium.com/tMuykUJZJ9
Reason to not use regex: snguyenthanh/better_profanity#14

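The filter also censors a word if any of its prefixes matches a profane word. For example, "D*mnn" is censored because its prefix "D*mn" is a modified spelling of a profane word.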

Working

import profanity_filter
filter = profanity_filter.ProfanityFilter()
clean_text = filter.censor("D*mnn you!")
print(clean_text) 
# ***** you!

Modified spellings of profane words that the character map can produce are detected, for example D*mn, D@mn, $h17, and 4r53.

Add your custom profane wordlist and custom whitelist

filter.load_profane_words(custom_profane_wordlist = {'damn', 'douche'}, whitelist = {'shit'})

Check if your text has any profane word

filter.isProfane('You piece of $h*t')
# returns True

How this profanity filter works for text words

        self.CHARS_MAPPING = {
            "a": ("a", "@", "*", "4"),
            "i": ("i", "*", "l", "1"),
            "o": ("o", "*", "0", "@"),
            "u": ("u", "*", "v"),
            "v": ("v", "*", "u"),
            "l": ("l", "1"),
            "e": ("e", "*", "3"),
            "s": ("s", "$", "5"),
            "t": ("t", "7")
        }

This map associates each character with a set of similar-looking characters. Using this map and a list of commonly used profane words, the library generates distorted (leetspeak) spellings of each word and inserts them into a trie.
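The generation step described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the standalone CHARS_MAPPING constant and the generate_variants function are hypothetical names, and the sketch assumes every mapped character may be substituted independently of the others.

```python
from itertools import product

# Copy of the character map shown above, as a module-level constant
# (hypothetical; in the library it lives on the filter instance).
CHARS_MAPPING = {
    "a": ("a", "@", "*", "4"),
    "i": ("i", "*", "l", "1"),
    "o": ("o", "*", "0", "@"),
    "u": ("u", "*", "v"),
    "v": ("v", "*", "u"),
    "l": ("l", "1"),
    "e": ("e", "*", "3"),
    "s": ("s", "$", "5"),
    "t": ("t", "7"),
}


def generate_variants(word):
    """Return every spelling of `word` reachable through CHARS_MAPPING.

    Hypothetical helper: characters without an entry in the map
    are kept unchanged.
    """
    choices = [CHARS_MAPPING.get(ch, (ch,)) for ch in word.lower()]
    # The Cartesian product picks one substitute per position.
    return {"".join(combo) for combo in product(*choices)}


print(sorted(generate_variants("damn")))
# ['d*mn', 'd4mn', 'd@mn', 'damn']
```

The number of variants is the product of the number of substitutes at each position, which is why a short base wordlist can expand into a much larger generated one.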

The generated wordlist contains approximately 40,000 words: the 130 words from the default profanity_wordlist.txt plus their modified spellings.

Checking whether a word is profane takes O(n) time, where n is the length of the word.
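To illustrate why the lookup cost depends only on the length of the word, here is a small trie sketch. It is not the library's trie.py: the TrieNode and Trie classes and the has_profane_prefix method are hypothetical names, and the sketch assumes the input has already been lowercased. The lookup walks one node per character and stops as soon as a stored word ends, which also gives the prefix-matching behaviour mentioned earlier.

```python
class TrieNode:
    """One node per character; is_end marks the end of a stored word."""

    def __init__(self):
        self.children = {}
        self.is_end = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Store `word`, creating nodes along the path as needed."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def has_profane_prefix(self, word):
        """Return True if `word`, or any prefix of it, was inserted.

        At most len(word) steps are taken, regardless of how many
        words the trie holds.
        """
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
            if node.is_end:
                return True
        return False


trie = Trie()
for spelling in ("damn", "d*mn", "d@mn"):
    trie.insert(spelling)

print(trie.has_profane_prefix("d*mnn"))  # True: "d*mn" is a stored prefix
print(trie.has_profane_prefix("dam"))    # False: no stored word ends here
```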

Add more profane words

filter.add_profane_words(['damn', 'shit'])

Add more whitelist words

filter.add_whitelist_words(['damn', 'shit'])

Censor profane urls

filter.censor_url(url)

Check whether your image is profane or not

r = filter.get_image_analysis(IMAGE_URL)
print(r.json())
# JSON output containing the image's profanity_score and other details

This is done with the help of the DeepAI API:
https://deepai.org/machine-learning-model/nsfw-detector

Censor your profane image

filter.censor_image(image_url)

This is done with the help of Pillow, a Python imaging library:
https://pypi.org/project/Pillow/
Censored images are stored in the images folder.

TO-DO

Implement a compressed trie instead of a normal trie to reduce memory use.
Censor words whose inner substrings match profane words while avoiding false positives.
Add support for loading a wordlist from a file.


arhankundu99/profanity-filter
