This project uses large language models to automate sentiment analysis of Amazon customer reviews, built on an MLOps pipeline that handles data ingestion, training, and deployment in a scalable, efficient way. The system is continuously monitored and retrained when data drift is detected, giving e-commerce sellers actionable insights for data-driven decisions.
- Data Pipeline
  - Automated data ingestion, preprocessing, and transformation using Apache Airflow.
  - Data versioning with DVC and storage in Google Cloud Storage (GCS).
  - Comprehensive schema validation and logging for robust data management.
- Machine Learning Models
  - Fine-tuned DistilBERT, RoBERTa, and ALBERT models for sentiment classification.
  - Model selection based on performance metrics like accuracy, precision, recall, and F1-score.
- Model Deployment
  - Seamless deployment of the best model (DistilBERT) using Vertex AI.
  - CI/CD pipeline implemented with Google Cloud Build and Jenkins for automated training, validation, and deployment.
  - Monitoring and drift detection with Grafana, Google Cloud Monitoring, and Evidently AI.
- E-commerce Dashboard
  - Interactive Streamlit application hosted on Kubernetes.
  - Real-time sentiment analysis and visualization of product performance and customer sentiment trends.
- Data Pipeline: Apache Airflow, DVC, Google Cloud Storage.
- Machine Learning: Transformers (Hugging Face), TensorFlow, DistilBERT, RoBERTa, ALBERT, MLflow.
- Deployment: Vertex AI, Kubernetes, Docker, Google Artifact Registry.
- CI/CD: Jenkins.
- Monitoring: Grafana, Google Cloud Monitoring, Evidently AI.
- Frontend: Streamlit.
```
.
├── data_pipeline/       # Data pipeline files and Airflow DAGs
├── model/               # Model development and training scripts
│   ├── distilbert/
│   ├── roberta/
│   └── albert/
├── deployment/          # Deployment configurations and CI/CD pipelines
├── Jenkinsfile          # CI/CD pipeline definition
├── Dockerfile           # Docker configuration for deployment
├── requirements.txt     # Python dependencies
└── README.md            # Project documentation
```
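The `data_pipeline/` directory holds the Airflow DAG definitions. As a rough sketch of how the ingestion flow could be wired together (the DAG id, task names, and callables here are hypothetical, not the project's actual code):

```python
# Hypothetical sketch of an ingestion DAG; task names and callables are
# illustrative only and do not mirror the files in data_pipeline/.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_reviews():
    """Placeholder: pull the raw Amazon review split from Hugging Face datasets."""


def preprocess_reviews():
    """Placeholder: strip HTML tags/URLs, drop invalid rows, merge metadata."""


def label_sentiment():
    """Placeholder: apply the heuristic / VADER / TextBlob labeling step."""


with DAG(
    dag_id="amazon_review_pipeline",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest", python_callable=ingest_reviews)
    preprocess = PythonOperator(task_id="preprocess", python_callable=preprocess_reviews)
    label = PythonOperator(task_id="label", python_callable=label_sentiment)

    ingest >> preprocess >> label
```

In the actual pipeline, these tasks would also push versioned outputs to GCS through DVC, per the data versioning step above.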
- Data Pipeline
  - Ingests Amazon review data from Hugging Face datasets.
  - Performs preprocessing steps such as:
    - Removing HTML tags, URLs, and invalid entries.
    - Merging review and metadata datasets.
    - Applying feature engineering for sentiment labeling (heuristic, VADER, TextBlob), as sketched below.
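To make the labeling step concrete, here is a minimal sketch combining a star-rating heuristic with VADER and TextBlob fallbacks; the thresholds and fallback order are assumptions, not the pipeline's exact logic:

```python
# Sketch only: rule-based sentiment labeling with a rating heuristic and
# lexicon fallbacks. Thresholds are illustrative, not the project's values.
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

vader = SentimentIntensityAnalyzer()


def label_review(text: str, rating: float) -> str:
    # Heuristic: trust the star rating when it is clearly polarized.
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    # Otherwise fall back to lexicon-based scores (VADER, then TextBlob).
    compound = vader.polarity_scores(text)["compound"]
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0.1:
        return "positive"
    if polarity < -0.1:
        return "negative"
    return "neutral"


print(label_review("Arrived broken and the seller never replied.", 3))
```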
- Model Development
  - Fine-tunes DistilBERT, RoBERTa, and ALBERT on tokenized input data (see the sketch below).
  - Tracks experiments and metrics with MLflow.
  - Detects bias with Fairlearn to ensure fair performance across product categories.
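A condensed sketch of the fine-tuning pattern using the Transformers TensorFlow API with MLflow tracking; the public stand-in dataset, hyperparameters, and three-class label count are assumptions rather than the project's training configuration (see `model/distilbert/` for the real scripts):

```python
# Sketch only: fine-tune DistilBERT for sentiment classification and log the
# run to MLflow. The dataset below is a public stand-in, not the project data.
import mlflow
import tensorflow as tf
from datasets import load_dataset
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

MODEL_NAME = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

# Stand-in corpus with "text"/"label" columns; swap in the preprocessed reviews.
dataset = load_dataset("imdb", split="train[:2000]")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)
train_ds = model.prepare_tf_dataset(dataset, batch_size=16, shuffle=True, tokenizer=tokenizer)

# Transformers TF models compute their task loss internally when no loss is passed.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5))

with mlflow.start_run(run_name="distilbert-finetune"):
    mlflow.log_params({"model": MODEL_NAME, "learning_rate": 2e-5, "epochs": 2})
    history = model.fit(train_ds, epochs=2)
    mlflow.log_metric("train_loss", float(history.history["loss"][-1]))
```

The same pattern applies to the RoBERTa and ALBERT variants by swapping the checkpoint name; evaluation metrics (accuracy, precision, recall, F1) would be computed on a held-out split and logged the same way.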
- Model Deployment
  - Deploys the best model (DistilBERT) to Vertex AI for real-time inference, as sketched below.
  - Automates deployment with GCP's CI/CD pipelines and saves model images to Google Artifact Registry.
  - Monitors model performance and triggers retraining when data drift is detected.
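A minimal sketch of registering and deploying an exported model with the Vertex AI Python SDK; the project id, bucket path, serving image, machine type, and request payload are placeholders:

```python
# Sketch only: upload a saved model to Vertex AI and deploy it for online
# prediction. All names below are placeholders, not the project's real values.
from google.cloud import aiplatform

aiplatform.init(project="your-gcp-project", location="us-central1")

model = aiplatform.Model.upload(
    display_name="distilbert-sentiment",
    artifact_uri="gs://your-bucket/models/distilbert/",  # exported model artifacts
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest",
)

endpoint = model.deploy(machine_type="n1-standard-4")

# Online inference; the payload format depends on the exported serving signature.
response = endpoint.predict(instances=[{"text": "Great product, fast shipping!"}])
print(response.predictions)
```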
- Monitoring and Alerts
  - Grafana dashboards:
    - Sentiment metrics (accuracy, F1-score, AUC).
    - Sentiment trends and product performance.
  - Google Cloud Monitoring:
    - Tracks resource usage for Kubernetes, Cloud SQL, and Vertex AI endpoints.
  - Evidently AI:
    - Detects data drift and triggers retraining pipelines (see the sketch below).
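A minimal sketch of how Evidently (the 0.4.x-style `Report` API) could flag drift and gate a retraining trigger; the file paths, retraining hook, and exact result-dictionary layout are assumptions and may differ across Evidently versions:

```python
# Sketch only: compare a reference feature snapshot against current data and
# flag dataset-level drift. Paths and the retraining hook are placeholders.
import pandas as pd
from evidently.metric_preset import DataDriftPreset
from evidently.report import Report

reference = pd.read_csv("reference_features.csv")  # hypothetical snapshot
current = pd.read_csv("current_features.csv")      # hypothetical recent batch

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("drift_report.html")  # artifact that can be surfaced alongside Grafana

# The result layout below matches Evidently 0.4.x; adjust for other versions.
drift_detected = report.as_dict()["metrics"][0]["result"]["dataset_drift"]
if drift_detected:
    print("Data drift detected - trigger the retraining pipeline here")
```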
- Python 3.8+
- Docker and Docker Compose
- Google Cloud Platform (GCP) account
- PostgreSQL database setup
- Clone the repository:

  ```bash
  git clone https://github.com/madhurima-vanga/Amazon-Customer-Sentiment-Analyser.git
  cd Amazon-Customer-Sentiment-Analyser
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Configure environment variables in `.env`:

  ```
  DB_HOST=<your-database-host>
  DB_PORT=<your-database-port>
  DB_NAME=<your-database-name>
  DB_USER=<your-database-user>
  DB_PASSWORD=<your-database-password>
  GOOGLE_APPLICATION_CREDENTIALS=<path-to-service-account-key.json>
  ```
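For reference, one way the application code might read these values at runtime (assuming `python-dotenv`; not necessarily how the project loads them):

```python
# Sketch only: load .env values; variable names follow the template above.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

db_config = {
    "host": os.getenv("DB_HOST"),
    "port": os.getenv("DB_PORT"),
    "dbname": os.getenv("DB_NAME"),
    "user": os.getenv("DB_USER"),
    "password": os.getenv("DB_PASSWORD"),
}
# GOOGLE_APPLICATION_CREDENTIALS is picked up automatically by the GCP client
# libraries once it points at a valid service-account key file.
```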
- Run the Streamlit app:

  ```bash
  streamlit run deployment/app_deployment/app.py
  ```
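For orientation, a stripped-down sketch of the kind of page `deployment/app_deployment/app.py` serves, scoring a review against a deployed Vertex AI endpoint; the endpoint id and response parsing are placeholders:

```python
# Sketch only: minimal sentiment page; endpoint id and payload are placeholders.
import streamlit as st
from google.cloud import aiplatform

st.title("Amazon Customer Sentiment Analyser")

review = st.text_area("Paste a customer review")
if st.button("Analyse") and review:
    aiplatform.init(project="your-gcp-project", location="us-central1")
    endpoint = aiplatform.Endpoint("YOUR_ENDPOINT_ID")  # numeric id of the deployed endpoint
    response = endpoint.predict(instances=[{"text": review}])
    st.write("Predicted sentiment:", response.predictions[0])
```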
We welcome contributions! Fork the repository, create a feature branch, and submit a pull request.
This project is licensed under the MIT License.