Dataset from:
https://universe.roboflow.com/openglpro/stanford_car

I used the YOLOv8 export of the dataset. The dataset can be used to detect both cars and bikes. I merged the train and test splits using the script data/merge_train_test.py.
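The merge itself lives in data/merge_train_test.py; as a rough sketch of what merging YOLO-format splits involves (the folder layout below is the standard Ultralytics one, assumed rather than taken from the script):

```python
# Illustrative sketch only; see data/merge_train_test.py for the actual script.
# Assumes the standard YOLO layout: <split>/images and <split>/labels.
import shutil
from pathlib import Path

def merge_splits(root: str, splits=("train", "test"), out="merged") -> None:
    root_path = Path(root)
    for sub in ("images", "labels"):
        out_dir = root_path / out / sub
        out_dir.mkdir(parents=True, exist_ok=True)
        for split in splits:
            for src in (root_path / split / sub).iterdir():
                # Prefix with the split name to avoid filename collisions.
                shutil.copy(src, out_dir / f"{split}_{src.name}")

merge_splits("datasets/stanford_car")
```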
- Install all the requirements with:

```bash
pip install -r requirements.txt
```

Ensure Python 3.12 is being used.
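If you want the scripts to fail fast on a wrong interpreter, a tiny optional check (not part of the repo) is:

```python
import sys

# Fail fast if the interpreter is not Python 3.12.
assert sys.version_info[:2] == (3, 12), f"Python 3.12 required, got {sys.version.split()[0]}"
```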
> [!WARNING]
> If you would like to run with GPU, download CUDA Toolkit 12.6: https://developer.nvidia.com/cuda-downloads
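After installing the toolkit, you can confirm that PyTorch (which Ultralytics builds on) actually sees the GPU:

```python
import torch

# Prints True and the device name when CUDA is correctly installed.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```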
- Create a .env file in the root of the repository:

```
ROBOFLOW_API_KEY=""
AWS_ACCESS_KEY_ID=""
AWS_SECRET_ACCESS_KEY=""
AWS_REGION=""
AWS_LAMBDA_ROLE_ARN=""
```
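The scripts can then pick these values up from the environment; a common pattern with python-dotenv (whether this repo loads .env this way is an assumption) is:

```python
import os

from dotenv import load_dotenv

# Load variables from .env in the repository root into the environment.
load_dotenv()

api_key = os.getenv("ROBOFLOW_API_KEY")
region = os.getenv("AWS_REGION")
```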
- Create two S3 buckets: one to store the ONNX model and another to store all the dataset versions from data versioning:

```bash
python3 data/s3_bucket.py --bucket_model bucket-model-name --bucket_dataset bucket-dataset-name
```
This command will automatically save the bucket names in the .env file:

```
BUCKET_MODEL="bucket-model-name"
BUCKET_DATASET="bucket-dataset-name"
```
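For reference, a minimal sketch of the bucket-creation step with boto3 (the real logic lives in data/s3_bucket.py; the us-east-1 special case below is standard boto3 behavior, not something taken from the script):

```python
import os

import boto3

def create_bucket(name: str, region: str) -> None:
    """Create an S3 bucket; us-east-1 rejects an explicit LocationConstraint."""
    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        s3.create_bucket(Bucket=name)
    else:
        s3.create_bucket(
            Bucket=name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )

create_bucket("bucket-dataset-name", os.getenv("AWS_REGION", "us-east-1"))
```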
- Add the following variables in the "Actions secrets and variables" section of the repository settings:

> [!INFO]
> `ECR_NAME` is the name of the ECR repository.
> `BUCKET_MODEL` is the name of the bucket where the model is stored.
Data versioning is an essential step in any Machine Learning project. It enables development teams to create multiple dataset versions and switch between them easily when training. This is useful when the team has a lot of data and wants to use only the samples that improve model performance. In this project, DVC combined with Git is used to implement this task. All dataset versions are stored in an S3 bucket.
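Because each dataset version is a Git tag, versions can also be consumed programmatically; for example, DVC's Python API can open a tracked file at a given revision (the tag below follows the vA.B.C pattern used later in this README):

```python
import dvc.api

# Stream the archived dataset exactly as it existed at tag v0.0.0.
with dvc.api.open("data/data.zip", rev="v0.0.0", mode="rb") as f:
    payload = f.read()

print(f"dataset archive size: {len(payload)} bytes")
```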
Sometimes it is necessary to start everything over. The following steps show how you can do that:
- Remove all tags already created (remote and local):

```bash
git push origin --delete $(git tag -l)
git tag -d $(git tag -l)
```
- Ensure the tags were erased: `git tag -l` should print nothing.
- Run data.sh to create the file "data/data.zip" with your preprocessed data. The drop value is the fraction of the downloaded dataset that will be discarded:

```bash
./scripts/data.sh <drop_value>
```
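For intuition, a sketch of what dropping a fraction of the dataset could look like (the actual sampling behavior is defined by preprocess.py, so this is an assumption):

```python
import random

def drop_samples(samples: list[str], drop_value: float, seed: int = 42) -> list[str]:
    """Keep a random (1 - drop_value) fraction of the dataset."""
    rng = random.Random(seed)
    keep = max(0, round(len(samples) * (1.0 - drop_value)))
    return rng.sample(samples, keep)

# e.g. drop 30% of 1000 image paths, keeping 700
kept = drop_samples([f"img_{i}.jpg" for i in range(1000)], drop_value=0.3)
print(len(kept))  # 700
```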
- Run configure_dvc.sh, passing the dataset bucket name as an argument:

```bash
./scripts/configure_dvc.sh bucket-dataset-name
```
After that, you will have a tag v0.0.0 with the first version of the dataset!
Every time you want to create a new dataset version, run the steps below:
- Make your changes in the function preprocess of preprocess.py. Then, run data.sh:

> [!WARNING]
> Check that you are on main: `git checkout main`

```bash
./scripts/data.sh <drop_value>
```
- Run the script that creates a new data version:

```bash
./scripts/new_dataset_version.sh vA.B.C
```
- To use a specific data version:

```bash
git checkout vA.B.C
dvc checkout
```
- Unzip the data using the command:

```bash
unzip data/data.zip
```
- Edit the Ultralytics settings file so runs are saved in the models folder of this repository:

```bash
cd /home/user/.config/Ultralytics
sudo vim settings.json
```

Make the following changes in settings.json:

```json
"datasets_dir": "/home/user/your_path/24-2-mlops-project-car_object_detection",
"weights_dir": "/home/user/your_path/24-2-mlops-project-car_object_detection/models/weights",
"runs_dir": "/home/user/your_path/24-2-mlops-project-car_object_detection/models/runs",
```
- In the root folder of the repository, start MLflow:

```bash
mlflow ui --backend-store-uri ./models/runs/mlflow
```
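The UI serves on http://127.0.0.1:5000 by default; a script that should log to the same store can point the MLflow client at that backend (a generic MLflow pattern, not code from this repo):

```python
import mlflow

# Log to the same file-based store the UI is reading from.
mlflow.set_tracking_uri("file:./models/runs/mlflow")

with mlflow.start_run(run_name="smoke-test"):
    mlflow.log_param("model", "yolov8n")
    mlflow.log_metric("mAP50", 0.0)
```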
- In another terminal, train the model:

```bash
cd src/
python3 train.py
```
This command will train the model and also save the best.onnx file from the trained model in the model S3 bucket, overwriting best.onnx in the bucket if it already exists. If you would like to upload another YOLO model, you can run the following command (from the root of the repo):

```bash
python3 data/s3_bucket.py --file_path /absolute_train_path/weights/best.onnx
```
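For orientation, a condensed sketch of a train-and-upload flow like the one train.py implements, using Ultralytics and boto3 (hyperparameters and the data.yaml path are placeholders, not the repo's actual values):

```python
import os

import boto3
from ultralytics import YOLO

# Train a YOLOv8 model and export the trained checkpoint to ONNX.
model = YOLO("yolov8n.pt")
model.train(data="data.yaml", epochs=50, imgsz=640)
onnx_path = model.export(format="onnx")  # returns the exported file path

# Upload (and implicitly overwrite) best.onnx in the model bucket.
s3 = boto3.client("s3")
s3.upload_file(onnx_path, os.environ["BUCKET_MODEL"], "best.onnx")
```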
- Train again, changing hyperparameters if necessary.
- All runs will be saved in "models/runs"
To deploy the model, do a git push to main. Go to the Actions section of the repository to see all the details of the workflow.

The API endpoint can be found in:
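Once the workflow finishes, the endpoint can be exercised from Python; the URL and payload shape below are hypothetical placeholders, since the real contract comes out of the deployment:

```python
import base64

import requests

# Hypothetical endpoint URL; replace with the one produced by the workflow.
ENDPOINT = "https://example.execute-api.us-east-1.amazonaws.com/predict"

with open("car.jpg", "rb") as f:
    payload = {"image": base64.b64encode(f.read()).decode()}

resp = requests.post(ENDPOINT, json=payload, timeout=30)
print(resp.status_code, resp.json())
```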