
Grammarly

ABOUT THE PROJECT

The goal of the project is to understand and build an end-to-end MLOps lifecycle, from model building, monitoring, configuration, testing, and packaging to deployment and CI/CD.


Part 1: Deep Learning Project

The project I've implemented is a simple deep learning model that predicts whether a given sentence is grammatically correct or not.

  • Input data : 'glue'
  • Model : 'google/bert_uncased_L-2_H-128_A-2'
  • Output format : probabilities over [correct, incorrect]
  • Frameworks : 'PyTorch Lightning', 'Hugging Face Datasets', 'Hugging Face Models' (a minimal data/model sketch is shown below)

The tech stack spans PyTorch Lightning, Hugging Face Datasets/Transformers, Weights & Biases, Hydra, DVC, ONNX, Docker, GitHub Actions, and AWS (ECR, Lambda).
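As a rough, hedged sketch of the setup (assuming the CoLA subset of GLUE, since the task is grammatical acceptability; this is not the exact training code from the repo):

```python
# Minimal sketch: load the data and the tiny BERT classifier.
# Assumes the CoLA subset of GLUE; not the exact training code from this repo.
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "google/bert_uncased_L-2_H-128_A-2"

dataset = load_dataset("glue", "cola")                 # train / validation / test splits
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize one sentence and turn the logits into class probabilities
inputs = tokenizer("This sentence are wrong.", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
print(probs)  # shape (1, 2): probability of each class
```

In the project these pieces would typically be wrapped in a PyTorch Lightning DataModule/LightningModule and trained with `Trainer.fit`.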

Part 2: Model Monitoring - Weights and Biases

Weights & Biases alone can be used for many different MLOps tasks, such as model monitoring, hyperparameter tracking, and model/data versioning. Here I've used it only for tracking model training:

  • plotted model training curves
  • a confusion matrix
  • a table displaying incorrectly classified datapoints (a logging sketch follows this list)
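A hedged sketch of how the confusion matrix and the misclassification table can be logged to W&B (variable names and dummy data below are illustrative, not taken from this repo; with Lightning, `WandbLogger` handles the run initialization):

```python
# Sketch of W&B logging; the sentences/labels/preds below are stand-ins for
# real validation outputs, and the class order is an assumption.
import wandb

run = wandb.init(project="grammarly")

sentences = ["He go to school.", "She reads books."]   # stand-in validation data
labels = [0, 1]                                        # 0 = unacceptable, 1 = acceptable (assumed)
preds = [1, 1]                                         # model predictions

# Confusion matrix over the two classes
run.log({
    "confusion_matrix": wandb.plot.confusion_matrix(
        probs=None, y_true=labels, preds=preds,
        class_names=["unacceptable", "acceptable"],
    )
})

# Table of incorrectly classified datapoints
wrong = wandb.Table(columns=["sentence", "label", "prediction"])
for s, y, p in zip(sentences, labels, preds):
    if y != p:
        wrong.add_data(s, y, p)
run.log({"incorrect_predictions": wrong})
run.finish()
```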



Part 3: Configurations - Hydra

Configuration management is necessary for managing complex software systems; lacking it can cause serious problems with reliability, uptime, and the ability to scale a system. It helps us scale the project without much hassle and keeps all the tuning knobs at hand.

  • configured the model hyperparameters with Hydra (a minimal config sketch follows)
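A minimal sketch of how Hydra wires a YAML config into the training entry point (the file layout and hyperparameter names are illustrative, not necessarily the ones used in this repo):

```python
# Sketch of a Hydra entry point; config path and field names are illustrative.
#
# configs/config.yaml might look like:
#   model:
#     name: google/bert_uncased_L-2_H-128_A-2
#   training:
#     lr: 3e-5
#     batch_size: 64
#     max_epochs: 3
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="configs", config_name="config", version_base=None)
def train(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))   # full resolved config
    lr = cfg.training.lr            # hyperparameters come from the YAML file
    model_name = cfg.model.name
    # ... build the datamodule/model and call trainer.fit(...) here


if __name__ == "__main__":
    train()
```

Any value can then be overridden from the command line without touching the code, e.g. `python train.py training.lr=1e-4`.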



Part 4: Data Version Control - DVC

Versioning platforms like GitHub can't be used to version large files such as models and datasets. This is where DVC (Data Version Control) comes into the picture. With DVC we can easily track and version our model and dataset, which themselves can be stored in a different location such as an AWS S3 bucket or even Google Drive.
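On the command line the workflow is `dvc add` to track a file, `dvc remote add` to point at the storage, and `dvc push` / `dvc pull` to sync it. DVC also has a small Python API for reading tracked artifacts; a hedged sketch (the file path below is an assumption, not the actual layout of this repo):

```python
# Sketch of reading a DVC-tracked artifact via the Python API.
# The path "models/model.onnx" is an assumption for illustration.
import dvc.api

url = dvc.api.get_url(
    path="models/model.onnx",
    repo="https://github.com/o-Senpai-o/Grammarly",
)
print(url)  # resolves to the artifact's location on the DVC remote

# Stream the tracked file without cloning the whole cache
with dvc.api.open(
    path="models/model.onnx",
    repo="https://github.com/o-Senpai-o/Grammarly",
    mode="rb",
) as f:
    model_bytes = f.read()
```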


Part 5: Model Packaging - ONNX

Why do we need model packaging? Models can be built using any of the machine learning frameworks available out there (sklearn, TensorFlow, PyTorch, etc.). We might want to deploy a model in different environments (mobile, web, Raspberry Pi) or run it in a framework different from the one it was trained in (trained in PyTorch, inference in TensorFlow). A common file format that lets AI developers use models with a variety of frameworks, tools, runtimes, and compilers helps a lot.

This is achieved by the community project ONNX.
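A hedged sketch of exporting the classifier to ONNX with `torch.onnx.export` and running it with ONNX Runtime (the wrapper, input names, and dynamic axes are illustrative choices, not necessarily what this repo does):

```python
# Sketch: export the classifier to ONNX and run it with onnxruntime.
# The wrapper and input/output names are illustrative assumptions.
import torch
import onnxruntime as ort
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "google/bert_uncased_L-2_H-128_A-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2).eval()


class OnnxWrapper(torch.nn.Module):
    """Return plain logits so the exporter sees a single tensor output."""

    def __init__(self, m):
        super().__init__()
        self.m = m

    def forward(self, input_ids, attention_mask):
        return self.m(input_ids=input_ids, attention_mask=attention_mask).logits


inputs = tokenizer("This sentence are wrong.", return_tensors="pt")
torch.onnx.export(
    OnnxWrapper(model),
    (inputs["input_ids"], inputs["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
    },
)

# Inference with ONNX Runtime, no PyTorch needed at serving time
session = ort.InferenceSession("model.onnx")
logits = session.run(
    ["logits"],
    {
        "input_ids": inputs["input_ids"].numpy(),
        "attention_mask": inputs["attention_mask"].numpy(),
    },
)[0]
print(logits)  # (1, 2) array of class scores
```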



Part 6: Model Packaging - Docker

A common complaint from newer engineers is that the code runs on the code owner's machine but not on theirs, usually because of dependency or OS-related issues. Packaging the project lets us share it with others without any dependency problems. Docker comes in handy for every data scientist, because a data science project can require a lot of libraries to be installed before it will run.

So for others to run the application, they would have to set up the same environment it was run in on the host side, which means a lot of manual configuration and installation of components.

The solution to these limitations is a technology called Containers.

By containerizing/packaging the application, we can run it on any cloud platform and take advantage of managed services, autoscaling, reliability, and more.

The most prominent tool for packaging applications this way is Docker.
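Containerizing the project means writing a Dockerfile and running `docker build` / `docker run`. Purely as an illustration, the same two steps can also be driven from Python with the `docker` SDK; the image tag, port mapping, and the assumption that a Dockerfile sits at the repo root are all hypothetical:

```python
# Illustrative only: driving docker build/run from Python with the docker SDK.
# The tag, port mapping and Dockerfile location are hypothetical.
import docker

client = docker.from_env()

# Build an image from a Dockerfile assumed to be at the repo root
image, build_logs = client.images.build(path=".", tag="grammarly:latest")

# Run the container, exposing a hypothetical prediction API on port 8000
container = client.containers.run(
    "grammarly:latest",
    detach=True,
    ports={"8000/tcp": 8000},
)
print(container.status)
```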



Part 7: CI/CD - GitHub Actions

CI/CD is a method to frequently deliver apps to customers by introducing automation into the stages of app development.

Data science is an iterative process in which we keep updating our model and dataset. Whenever a new model is trained and it performs better than the previous version, the old one has to be replaced with the better-performing model. GitHub Actions helps automate many of these intermediate tasks, which if done manually would eat up a lot of a data scientist's time.



Part 8: Container Registry - AWS ECR

A container registry is a place to store container images. A container image is a file, composed of multiple layers, that can execute an application in a single instance. Hosting all the images in one central location lets users commit, identify, and pull images when needed.

Amazon Simple Storage Service (S3) is storage for the internet, designed for large-capacity, low-cost storage across multiple geographical regions. In this project, S3 is the kind of place the DVC-tracked data and model can live (see Part 4), while ECR stores the Docker images.
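In CI the image is usually pushed with the AWS CLI plus `docker push`; purely as a hedged sketch, the same registry plumbing can also be reached from Python with boto3 (the repository name and region are hypothetical):

```python
# Sketch: create an ECR repository and fetch a docker login token via boto3.
# The repository name "grammarly" and the region are hypothetical.
import base64
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")

# Create the repository that will hold the project's images
# (raises RepositoryAlreadyExistsException if it already exists)
repo = ecr.create_repository(repositoryName="grammarly")
print(repo["repository"]["repositoryUri"])  # <account>.dkr.ecr.<region>.amazonaws.com/grammarly

# Get a temporary auth token that `docker login` can use to push images
auth = ecr.get_authorization_token()["authorizationData"][0]
user, password = base64.b64decode(auth["authorizationToken"]).decode().split(":")
registry = auth["proxyEndpoint"]
print(f"docker login -u {user} -p <token> {registry}")
```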


Part 9: Serverless Deployment - AWS Lambda

----------------------- Coming Soon ----------------------------

Part 10: Prediction Monitoring - Kibana

----------------------- Coming Soon ----------------------------

Reference:

Raviraja Ganta
