This is an end-to-end machine learning system for predicting failures of the Air Pressure System (APS), a truck engine component. The entire solution is built on AWS services: AWS S3 buckets (for storing the data), AWS DynamoDB (for logging and improving system performance), AWS Elastic Container Registry (for storing the container images), and AWS Elastic Container Service (for running the container images). Apart from the AWS services, MLflow is used for experiment tracking, model versioning and model staging, with artifacts stored in an AWS S3 bucket, and Docker is used for containerizing the application.
To build a classification methodology to predict the failure of the Air Pressure System on the basis of a given set of predictors.
Whenever we start a machine learning project, we first need to sign a data sharing agreement with the client, where we sign off on parameters such as:
1. Format of data - e.g. CSV or JSON
2. Number of columns
3. Length of the date stamp in the file
4. Length of the time stamp in the file
5. Data type of each sensor column - e.g. float, int, string
Once the data sharing agreement is created, we create the master data management files, namely schema_training.json and schema_prediction.json. Using these schema files, we validate the batch data that is sent to us.
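For illustration, the schema file might look roughly like the sketch below; the key names and values shown here are assumptions for this example, not the exact contents of schema_training.json.

```python
import json

# Hypothetical contents of schema_training.json -- key names and values are illustrative only.
example_schema = {
    "LengthOfDateStampInFile": 8,    # e.g. 20240131
    "LengthOfTimeStampInFile": 6,    # e.g. 235959
    "NumberOfColumns": 171,          # assumed column count (sensor columns + class label)
    "ColumnNames": {
        "class": "varchar",
        "aa_000": "float",
        "ab_000": "float",
        # ... remaining sensor columns and their data types
    },
}

with open("schema_training.json", "w") as f:
    json.dump(example_schema, f, indent=4)
```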
The data which is sent to us is stored in AWS S3 buckets. From the S3 buckets, using the schema file, the data is validated against the file name, column length, and missing values in the columns.
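A minimal sketch of that validation step, assuming the schema structure above and a hypothetical file naming pattern (the real validation module may differ):

```python
import re
import json
import pandas as pd

def validate_batch_file(file_path: str, schema: dict) -> bool:
    """Validate one batch file against the schema: file name, column count, missing values."""
    # 1. File name check, e.g. aps_failure_20240131_235959.csv (pattern is an assumption).
    date_len = schema["LengthOfDateStampInFile"]
    time_len = schema["LengthOfTimeStampInFile"]
    pattern = rf"^aps_failure_\d{{{date_len}}}_\d{{{time_len}}}\.csv$"
    if not re.match(pattern, file_path.split("/")[-1]):
        return False

    df = pd.read_csv(file_path)

    # 2. Column length check against the schema.
    if df.shape[1] != schema["NumberOfColumns"]:
        return False

    # 3. Reject files where any whole column is missing.
    if df.isna().all(axis=0).any():
        return False

    return True

with open("schema_training.json") as f:
    schema = json.load(f)
print(validate_batch_file("batch_files/aps_failure_20240131_235959.csv", schema))
```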
Once the data validation is done, the data transformation pipeline is triggered, where we add quotes around string values in the data. After the data transformation is done, the good data is stored in MongoDB, and once it is stored in the database, we export a CSV file which is used for training the models.
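A rough sketch of this step with pymongo and pandas; the connection string, database name (aps_db) and collection name (good_data) are placeholders:

```python
import pandas as pd
from pymongo import MongoClient

# Connection string and database/collection names are placeholders.
client = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net")
collection = client["aps_db"]["good_data"]

# Insert the validated (good) batch data into MongoDB.
good_df = pd.read_csv("good_data/aps_failure_20240131_235959.csv")
collection.insert_many(good_df.to_dict(orient="records"))

# Export everything in the collection to a single CSV used for model training.
records = list(collection.find({}, {"_id": 0}))
pd.DataFrame(records).to_csv("training_file/input.csv", index=False)
```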
The model training is done using a customized machine learning approach: the entire training data is divided into clusters using the KMeans algorithm, and for every cluster of data a separate model is trained and later used for prediction. Before applying the clustering algorithm, the data is preprocessed as in the Jupyter notebook, e.g. handling missing values and replacing invalid values. An elbow plot is then created to decide the number of clusters, and for each cluster an XGBoost model and a Random Forest model are trained and saved to AWS S3 buckets.
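A condensed sketch of the cluster-then-train idea; the use of KneeLocator (from the kneed package) to pick the elbow point, as well as the random_state values and hyperparameters, are assumptions made for this example:

```python
import numpy as np
from kneed import KneeLocator
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

def train_per_cluster(X: np.ndarray, y: np.ndarray, max_k: int = 10):
    # Elbow method: within-cluster sum of squares (inertia) for k = 1..max_k.
    wcss = [KMeans(n_clusters=k, random_state=42, n_init=10).fit(X).inertia_
            for k in range(1, max_k + 1)]
    n_clusters = KneeLocator(range(1, max_k + 1), wcss,
                             curve="convex", direction="decreasing").knee

    # Assign every row to a cluster, then train one model pair per cluster.
    kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10).fit(X)
    labels = kmeans.labels_
    models = {}
    for cluster in np.unique(labels):
        X_c, y_c = X[labels == cluster], y[labels == cluster]
        # Both an XGBoost and a Random Forest model are trained for each cluster;
        # in the project the fitted models are then saved to AWS S3.
        models[cluster] = {
            "random_forest": RandomForestClassifier(random_state=42).fit(X_c, y_c),
            "xgboost": XGBClassifier(eval_metric="logloss").fit(X_c, y_c),
        }
    return kmeans, models
```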
Once the models are trained, they are tested against the test data and the model score is computed. MLflow is then used for logging the parameters, metrics and models to the server. Once the logging of parameters, metrics and models is done, a load-production-model step is triggered, which picks the top models based on their metrics and transitions them to Production or Staging depending on the condition.
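A minimal sketch of the MLflow logging and stage transition; the registered model name, metric name and selection rule are illustrative only:

```python
import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient
from sklearn.ensemble import RandomForestClassifier

mlflow.set_tracking_uri("http://<mlflow-server>:5000")  # MLflow server running on EC2

# Stand-in for a trained cluster model; in the project this comes from the training step.
model = RandomForestClassifier().fit([[0], [1]], [0, 1])

# Log parameters, metrics and the model (names and values are placeholders).
with mlflow.start_run(run_name="cluster_0_random_forest"):
    mlflow.log_params({"n_estimators": 100, "max_depth": 5})
    mlflow.log_metric("roc_auc", 0.9)  # placeholder score
    mlflow.sklearn.log_model(model, "model",
                             registered_model_name="aps_cluster_0_random_forest")

# Load-production-model step: promote the best version, keep the rest in Staging.
client = MlflowClient()
versions = client.search_model_versions("name='aps_cluster_0_random_forest'")
best = max(versions, key=lambda v: client.get_run(v.run_id).data.metrics["roc_auc"])
for v in versions:
    stage = "Production" if v.version == best.version else "Staging"
    client.transition_model_version_stage(name=v.name, version=v.version, stage=stage)
```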
The solution is exposed as an API using FastAPI, and the application is dockerized using Docker. The MLflow tracking server is set up on an AWS EC2 virtual machine. A CI-CD pipeline deploys the application to AWS Elastic Container Service whenever new code is committed to GitHub.
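A bare-bones sketch of how the prediction endpoint could be exposed with FastAPI; the route name and the predict_from_file helper are illustrative, not the project's actual module layout:

```python
import pandas as pd
from fastapi import FastAPI, UploadFile

app = FastAPI(title="APS Failure Prediction")

def predict_from_file(df: pd.DataFrame) -> list:
    # Placeholder: the real service would load the production models from S3/MLflow,
    # assign each row to its KMeans cluster and predict with that cluster's model.
    return [0] * len(df)

@app.post("/predict")
async def predict(file: UploadFile):
    df = pd.read_csv(file.file)
    return {"predictions": predict_from_file(df)}
```

Locally this can be served with `uvicorn main:app --reload` (assuming the file is named main.py); in the deployed setup the same app runs inside the Docker container on ECS.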
- Python
- Sklearn for machine learning algorithms
- FastAPI for creating a web application
- Machine Learning
- MLflow for experiment tracking, model versioning and model staging
- SQLite as the backend store for the MLflow server
- AWS EC2 instances for deploying the MLflow server
- AWS S3 buckets for data storage
- AWS DynamoDB for storing the logging data
- MongoDB Atlas for database operations
- AWS Elastic Container Registry for storing the container images
- AWS Elastic Container Service for running the container applications
- Random Forest Classifier Model
- XGBoost Classifier Model
- Accuracy
- ROC AUC score
- CI-CD deployment to AWS Elastic Container Service