Skip to content

arhankundu99/profanity-filter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

59 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

profanity-filter

Most of the words which are in the profane_wordlist.txt are taken from Bad Words list for Facebook.
Supports modified spellings like D@mn, $h1t etc.
This library is significantly faster than other profanity filters which use regex or string methods.

Reason to use trie: https://link.medium.com/tMuykUJZJ9
Reason to not use regex: snguyenthanh/better_profanity#14

The filter also censors words if their prefixes match with any profane word.

Working

import profanity_filter
filter = profanity_filter.ProfanityFilter()
clean_text = filter.censor("D*mnn you!")
print(clean_text) 
# ***** you!

All modified spellings of profane words will be detected Example: D*mn, D@mn, $h17, 4r53 etc

Add your custom profane wordlist and custom whitelist

filter.load_profane_words(custom_profane_wordlist = {'damn', 'douche'}, whitelist = {'shit'})

Check if your text has any profane word

filter.isProfane('You piece of $h*t')
# returns true

How this profanity filter works for text words

        self.CHARS_MAPPING = {
            "a": ("a", "@", "*", "4"),
            "i": ("i", "*", "l", "1"),
            "o": ("o", "*", "0", "@"),
            "u": ("u", "*", "v"),
            "v": ("v", "*", "u"),
            "l": ("l", "1"),
            "e": ("e", "*", "3"),
            "s": ("s", "$", "5"),
            "t": ("t", "7")
        }

This map maps characters with set of similar looking alphabets. Using commonly used profane wordlist and this map, Distorted profane words (Leetspeak words) are generated and the generated words are inserted into a trie.

The wordlist generated contains a total of approximately 40000 words, including 130 words from the default profanity_wordlist.txt and their variants by modified spellings.

Time Complexity to check whether a word is profane is O(length of the word).

Add more profane words

filter.add_profane_words(['damn', 'shit'])

Add more whitelist words

filter.add_whitelist_words(['damn', 'shit'])

Censor profane urls

filter.censor_url(url)

Check whether your image is profane or not

r = filter.get_image_analysis(IMAGE_URL)
print(r.json())
# json output which contains profanity_score of the image and other details

This is done with the help of DeepAI Api
https://deepai.org/machine-learning-model/nsfw-detector

Censor your profane image

filter.censor_image(image_url)

This is done with the help of pillow library which is a Photo imaging library
https://pypi.org/project/Pillow/
The censored images are stored in the images folder.

TO-DO

  1. Implement Compressed trie instead of normal trie for space optimization.
  2. Censor words whose inner substrings match with profane words while avoiding false positives.
  3. Add support for adding wordlist as a file.

About

A fast profanity filter for text and images

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages