A collection of Python scripts designed to streamline various tasks related to managing emails and PDF attachments.
Key Features:
- Email Text Extraction: Extract clean and formatted text from HTML email files.
- Email Classification: Classify emails as either automated or human-generated. The classifier is a Support Vector Machine (SVM) trained on TF-IDF features extracted from the email text.
- PDF Extraction: Extract text and other data from PDF attachments.
- PDF Form Auto-fill: Automatically populate PDF form fields with saved user profile data.
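As a rough illustration of the email text extraction feature, the sketch below uses BeautifulSoup (installed via the bs4 package) to strip markup from an HTML email. The function name `html_email_to_text` is hypothetical and is not taken from the project's scripts; the real implementation may clean the text differently.

```python
from bs4 import BeautifulSoup


def html_email_to_text(html: str) -> str:
    """Return the visible text of an HTML email, one non-empty line per text node."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove elements whose contents are never shown to the reader.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # A newline separator keeps paragraph boundaries visible in the output.
    lines = (line.strip() for line in soup.get_text(separator="\n").splitlines())
    return "\n".join(line for line in lines if line)


sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>Hello,</p><p>Your invoice is attached.</p></body></html>"
)
print(html_email_to_text(sample))
```

For the sample above, the style rule is dropped and the two paragraphs come out on separate lines.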
pip3 install -U scikit-learn
pip3 install flask
pip3 install bs4
pip3 install joblib
pip3 install gunicorn
- Put a `data` folder inside the root of the project with two subfolders, "1. automated" and "2. human", containing HTML email data.
- Run the train.py script to train and save the model as "email_classification_model.joblib":
python3 train.py
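For orientation, here is a minimal sketch of what a training script along these lines might look like. It is not the contents of train.py. The folder names and the output filename come from this README. The `*.html` file pattern, the label strings, the choice of `LinearSVC` as the SVM variant, and the helper names `load_emails` and `train` are all assumptions made for illustration.

```python
from pathlib import Path

import joblib
from bs4 import BeautifulSoup
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Folder names come from the README; the label strings are illustrative.
LABEL_DIRS = {"1. automated": "automated", "2. human": "human"}


def load_emails(data_dir):
    """Read every *.html file in the two label folders; return (texts, labels)."""
    texts, labels = [], []
    for folder, label in LABEL_DIRS.items():
        for path in sorted(Path(data_dir, folder).glob("*.html")):
            html = path.read_text(encoding="utf-8", errors="ignore")
            texts.append(BeautifulSoup(html, "html.parser").get_text(" ", strip=True))
            labels.append(label)
    return texts, labels


def train(texts, labels):
    """Fit a TF-IDF + linear SVM pipeline on the given texts."""
    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(texts, labels)
    return model


# Only train when a data folder is actually present.
if __name__ == "__main__" and Path("data").is_dir():
    texts, labels = load_emails("data")
    joblib.dump(train(texts, labels), "email_classification_model.joblib")
```

Wrapping the vectorizer and the classifier in one pipeline means the saved .joblib file carries the fitted vocabulary with it, so the serving code can call `predict` on raw text without rebuilding the TF-IDF step.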
- Run `flask run` in the project root to start the service at http://127.0.0.1:5000/
- Use the `[POST] /predict` endpoint to classify a new email.
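The request format for the endpoint is not documented here, so the following client is only a sketch under stated assumptions: it assumes the service accepts a JSON body and that the email HTML goes in a field named "html". Both are hypothetical and should be checked against the Flask app before use.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # address used by `flask run` above


def build_predict_request(email_html, base_url=BASE_URL):
    """Build a POST request for /predict. The "html" field name is hypothetical."""
    body = json.dumps({"html": email_html}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(email_html):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_predict_request(email_html), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

With the service running, `predict("<p>Your order has shipped.</p>")` would return whatever the endpoint responds with; the response shape is not assumed here.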
gunicorn --bind 127.0.0.1:8001 --workers 3 --bind unix:/home/ubuntu/email-ai-server/app.sock wsgi:app