RedactNLP: Redact Your PDF!

RedactNLP automatically redacts sensitive information from PDF documents using natural language processing and computer vision techniques.

A hosted demo, RedactNLP Spaces, is available on Hugging Face Spaces.

How Redaction Works

PDF to Images: The PDF pages are converted into images.
Text Extraction: Using EasyOCR, text is extracted from the images.
Entity Identification: The "dslim/distilbert-NER" model classifies tokens in the extracted text to identify sensitive elements such as names, locations, or organizations.
Redaction: A non-recoverable mask is applied to every identified sensitive element, so the original content cannot be restored from the redacted document.
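
The hand-off between entity identification and redaction can be sketched as follows. This is a minimal illustration, not the code in app.py. It assumes OCR fragments shaped like the (bounding box, text, confidence) entries that EasyOCR's readtext returns, and entity dicts carrying "start" and "end" character offsets, as the Hugging Face token-classification pipeline produces. The helper names join_fragments, to_rect, and boxes_to_mask are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# One OCR fragment: (four corner points, recognized text, confidence).
Fragment = Tuple[Sequence[Sequence[float]], str, float]
Span = Tuple[int, int]             # (start, end) character offsets, end exclusive
Rect = Tuple[int, int, int, int]   # (left, top, right, bottom)


def join_fragments(fragments: List[Fragment]) -> Tuple[str, List[Span]]:
    """Join fragment texts with single spaces and record each fragment's span."""
    parts: List[str] = []
    spans: List[Span] = []
    cursor = 0
    for _, text, _ in fragments:
        spans.append((cursor, cursor + len(text)))
        parts.append(text)
        cursor += len(text) + 1  # +1 accounts for the joining space
    return " ".join(parts), spans


def to_rect(corners: Sequence[Sequence[float]]) -> Rect:
    """Convert four corner points into an axis-aligned rectangle."""
    xs = [point[0] for point in corners]
    ys = [point[1] for point in corners]
    return (int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys)))


def boxes_to_mask(
    fragments: List[Fragment], spans: List[Span], entities: List[Dict]
) -> List[Rect]:
    """Return the rectangle of every fragment that overlaps an entity span."""
    rects: List[Rect] = []
    for (frag_start, frag_end), (corners, _, _) in zip(spans, fragments):
        overlaps = any(
            ent["start"] < frag_end and ent["end"] > frag_start for ent in entities
        )
        if overlaps:
            rects.append(to_rect(corners))
    return rects


# Hand-written sample data standing in for real OCR and NER output.
fragments = [
    ([[10, 10], [120, 10], [120, 30], [10, 30]], "Invoice for", 0.98),
    ([[130, 10], [260, 10], [260, 30], [130, 30]], "Jane Doe", 0.95),
    ([[10, 40], [200, 40], [200, 60], [10, 60]], "Total due: 40", 0.97),
]
text, spans = join_fragments(fragments)
entities = [{"entity_group": "PER", "word": "Jane Doe", "start": 12, "end": 20}]
print(boxes_to_mask(fragments, spans, entities))  # [(130, 10, 260, 30)]
```

This sketch masks the whole OCR fragment that contains an entity, so it can hide more text than strictly necessary. Finer masking would need word-level or character-level boxes.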

Features

Automatic redaction of sensitive information from PDFs.
Uses OCR to handle scanned or image-based PDFs.
Uses a DistilBERT-based NER model ("dslim/distilbert-NER") for entity recognition.
Ensures irreversible redaction of confidential information.
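
To illustrate why a mask painted into the page image is irreversible, here is a small sketch. It assumes the mask overwrites pixel values directly. The README does not say how app.py implements the mask, so the apply_mask helper and the nested-list image representation are hypothetical. Two pages that differ only inside the masked region become identical after masking, which means the hidden content cannot be inferred from the output.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels, row-major, 0 = black


def apply_mask(image: Image, rect: Tuple[int, int, int, int], fill: int = 0) -> Image:
    """Return a copy of image with every pixel inside rect overwritten by fill.

    rect is (left, top, right, bottom); right and bottom are exclusive.
    """
    left, top, right, bottom = rect
    masked = [row[:] for row in image]  # copy so the input stays untouched
    for y in range(max(top, 0), min(bottom, len(masked))):
        for x in range(max(left, 0), min(right, len(masked[y]))):
            masked[y][x] = fill
    return masked


# Two tiny "pages" that differ only in the two middle pixels of the second row.
page_a = [[255, 255, 255, 255], [255, 10, 200, 255], [255, 255, 255, 255]]
page_b = [[255, 255, 255, 255], [255, 90, 30, 255], [255, 255, 255, 255]]
region = (1, 1, 3, 2)  # covers exactly the differing pixels

print(page_a == page_b)                                          # False
print(apply_mask(page_a, region) == apply_mask(page_b, region))  # True
```

By contrast, a rectangle drawn as a separate overlay on top of a PDF's text layer leaves the underlying text in the file, which is why overwriting the pixels matters.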

Installation

Clone the repository and install the required dependencies:

git clone https://github.com/mitanshu7/RedactNLP.git
cd RedactNLP
pip install -r requirements.txt

Usage

To redact a PDF, start the app:

python app.py

Then open http://localhost:7860 in your browser.

Contributing

Feel free to open issues or submit pull requests if you would like to contribute to this project.

License

This project is licensed under the MIT License. See the LICENSE file for details.
