insper-classroom/24-2-mlops-project-car_object_detection

24-2-mlops-project-car_object_detection

Introduction

TO DO

Dataset from:

https://universe.roboflow.com/openglpro/stanford_car

I used the YOLOv8 export of the dataset.

This dataset is labeled for both cars and bikes. I merged the train and test splits using the script data/merge_train_test.py
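
As a rough illustration of the merge step, here is a minimal sketch. It is not the contents of data/merge_train_test.py: the function name merge_splits, the target folder "merged", and the split/images and split/labels layout are assumptions made for this example.

```python
import shutil
from pathlib import Path


def merge_splits(dataset_dir, sources=("train", "test"), target="merged"):
    """Copy images and labels from every source split into one target split.

    Assumes a <split>/images and <split>/labels layout (an assumption for
    this sketch, not something confirmed by the repository).
    """
    root = Path(dataset_dir)
    copied = 0
    for kind in ("images", "labels"):
        out_dir = root / target / kind
        out_dir.mkdir(parents=True, exist_ok=True)
        for split in sources:
            src_dir = root / split / kind
            if not src_dir.is_dir():
                continue  # tolerate a missing split
            for src in sorted(src_dir.iterdir()):
                if not src.is_file():
                    continue
                # Prefix with the split name so equal file names cannot collide;
                # image and label get the same prefix, so they stay paired.
                shutil.copy2(src, out_dir / f"{split}_{src.name}")
                copied += 1
    return copied
```

The source splits are left untouched, so the merge can be repeated safely.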

Startup

  1. Install all the requirements with:
pip install -r requirements.txt

Make sure you are using Python 3.12.

Warning

To run on a GPU, install CUDA Toolkit 12.6: https://developer.nvidia.com/cuda-downloads

  1. Create a .env file in the root of the repository
ROBOFLOW_API_KEY=""
AWS_ACCESS_KEY_ID=""
AWS_SECRET_ACCESS_KEY=""
AWS_REGION=""
AWS_LAMBDA_ROLE_ARN=""
  1. Create two S3 buckets: one to store the ONNX model and another to store all the dataset versions produced by data versioning
python3 data/s3_bucket.py --bucket_model bucket-model-name --bucket_dataset bucket-dataset-name

This command automatically saves the bucket names in the .env file:

BUCKET_MODEL="bucket-model-name"
BUCKET_DATASET="bucket-dataset-name"
  1. Add the following variables in the "Actions secrets and variables" section of the repository settings

github env

Note

ECR_NAME is the name of the ECR repository. BUCKET_MODEL is the name of the bucket where the model is stored.
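
Before moving on, it can help to check that every variable in .env is filled in. The sketch below is a hypothetical helper and not part of this repository. It uses a deliberately simple parser for KEY="value" lines, and the key list is taken from the steps above.

```python
from pathlib import Path

# Keys listed in the startup steps above.
REQUIRED_KEYS = (
    "ROBOFLOW_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "AWS_LAMBDA_ROLE_ARN",
    "BUCKET_MODEL",
    "BUCKET_DATASET",
)


def parse_env(text):
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    print("Missing:", missing_keys(text) or "none")
```

Run it from the root of the repository so that it finds the .env file.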

Steps for data versioning

Data versioning is an essential step in any machine learning project. It lets development teams create multiple datasets and easily switch between them when training. It is useful when the team has a lot of data and wants to use only the samples that improve model performance. In this project, DVC combined with Git is used to implement this task. All dataset versions are stored in an S3 bucket.
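
To give an intuition for how this works: DVC identifies each version of a tracked file by a hash of its contents (MD5 by default, to my knowledge). Git stores only a small pointer file with that hash, while the data itself goes to the S3 bucket. The sketch below shows the idea for a single file. The helper name file_md5 is hypothetical, and this is not DVC's own code.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two copies of data/data.zip with identical contents give the same digest, and any change to the data gives a different one, which is what lets a Git tag point at one exact dataset.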

Create a new data environment

Sometimes it is necessary to start over from scratch. The following steps show how to do that:

  1. Remove all tags already created (remote and local)
git push origin --delete $(git tag -l)

git tag -d $(git tag -l)
  • Ensure the tags were erased:

tags_erased

  1. Run data.sh to create the file "data/data.zip" with your preprocessed data. The drop value is the fraction of the downloaded dataset that will be discarded.
./scripts/data.sh <drop_value>
  1. Run configure_dvc.sh, passing the bucket created for the dataset as the argument
./scripts/configure_dvc.sh bucket-dataset-name

After that, you will have a tag v0.0.0 with the first version of the dataset!
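
To make the drop value concrete, here is a minimal sketch of discarding a fraction of the samples. How data.sh actually selects the samples to drop is not shown here, so the random selection, the seed parameter, and the function name drop_samples are assumptions.

```python
import random


def drop_samples(samples, drop_value, seed=0):
    """Return the samples kept after randomly discarding a fraction of them."""
    if not 0.0 <= drop_value <= 1.0:
        raise ValueError("drop_value must be between 0 and 1")
    samples = list(samples)
    n_drop = int(len(samples) * drop_value)
    # A seeded generator keeps the selection reproducible between runs.
    rng = random.Random(seed)
    dropped = set(rng.sample(range(len(samples)), n_drop))
    return [item for index, item in enumerate(samples) if index not in dropped]
```

With a drop value of 0.5, half of the samples are kept; with 0, the dataset is left unchanged.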

Create a new dataset version

Every time you want to create a new dataset version, run the steps below:

  1. Make your changes in the function preprocess in preprocess.py. Then, run data.sh:

Warning

Make sure you are on main:

git checkout main
./scripts/data.sh <drop_value>
  1. Run the script that creates a new data version:
./scripts/new_dataset_version.sh vA.B.C
  1. To use a specific data version:
git checkout vA.B.C
dvc checkout
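
When choosing the next vA.B.C tag, a small helper can avoid reusing a version. The sketch below is hypothetical and not part of the repository scripts. It assumes A, B and C behave like major, minor and patch numbers, which this README does not state.

```python
import re

TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag):
    """Turn 'v1.2.3' into (1, 2, 3); reject anything else."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a vA.B.C tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def next_tag(existing_tags, part="patch"):
    """Suggest the tag that follows the highest existing one."""
    versions = [parse_tag(tag) for tag in existing_tags]
    if not versions:
        return "v0.0.0"  # the first version created by configure_dvc.sh
    # Tuples compare numerically, so v0.0.10 ranks above v0.0.9.
    major, minor, patch = max(versions)
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown part: {part!r}")
    return f"v{major}.{minor}.{patch}"
```

The list of existing tags can come from the output of git tag -l, and the result can be passed to new_dataset_version.sh.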

Steps for training

  1. Unzip data using the command:
unzip data/data.zip
  1. In the Ultralytics settings folder, edit settings.json so that runs are saved in the models folder of this repository.
cd /home/user/.config/Ultralytics

sudo vim settings.json

Make the following changes in settings.json:

"datasets_dir": "/home/user/your_path/24-2-mlops-project-car_object_detection",
"weights_dir": "/home/user/your_path/24-2-mlops-project-car_object_detection/models/weights",
"runs_dir": "/home/user/your_path/24-2-mlops-project-car_object_detection/models/runs",
  1. In the root folder of the repository, start MLflow:
mlflow ui --backend-store-uri ./models/runs/mlflow

empty_mlfow

  1. In another terminal, train the model:
cd src/

python3 train.py

This command trains the model and saves the best.onnx from the trained model to the model S3 bucket. If best.onnx already exists in the bucket, it is erased first. To upload a different YOLO model instead, run the following command from the root of the repo:

python3 data/s3_bucket.py --file_path /absolute_train_path/weights/best.onnx
  1. Train again, changing hyperparameters if necessary.

mlflow_working

  1. All runs will be saved in "models/runs"

mlflow_working_runs
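
The settings.json edit in the training steps can also be scripted instead of done by hand. This is a minimal sketch that rewrites the same three keys with the standard json module. The function name point_ultralytics_at_repo is hypothetical, and the sketch assumes settings.json is a flat JSON object.

```python
import json
from pathlib import Path


def point_ultralytics_at_repo(settings_path, repo_root):
    """Set datasets_dir, weights_dir and runs_dir to folders inside the repo."""
    settings_path = Path(settings_path)
    repo_root = Path(repo_root)
    settings = json.loads(settings_path.read_text())
    # Only these three keys change; every other setting is preserved.
    settings.update(
        {
            "datasets_dir": str(repo_root),
            "weights_dir": str(repo_root / "models" / "weights"),
            "runs_dir": str(repo_root / "models" / "runs"),
        }
    )
    settings_path.write_text(json.dumps(settings, indent=2))
    return settings
```

For example, from the root of the repository: point_ultralytics_at_repo(Path.home() / ".config" / "Ultralytics" / "settings.json", Path.cwd()).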

Steps for deploying

To deploy the model, do a git push to main. Go to the Actions section of the repository to see all the details of the workflow.

github_actions_working

The API endpoint can be found in:

api_endpoint_actions

About

24-2-mlops-project-car_object_detection created by GitHub Classroom
