In this project, we aimed to enhance the quality of dashcam and monitor videos without costly hardware upgrades. Using object detection and super-resolution techniques, we explored identifying and improving the visual details of cars and persons within low-quality frames.

We trained a YOLOv7 model on the Cityscapes dataset (converted to COCO format using `cityscapes-to-coco-conversion`) to detect objects of interest. Additionally, we incorporated Latent Diffusion Models (LDM) for super-resolution to further enhance the cropped regions.
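The detect-then-enhance pipeline boils down to cropping each detected bounding box out of a frame before handing it to the super-resolution model. A minimal sketch of the cropping step (pure Python; the `(x1, y1, x2, y2)` box format and the function name are illustrative assumptions, not the repo's API):

```python
def crop_box(frame, box):
    """Crop an (x1, y1, x2, y2) box from a frame stored as a list of rows."""
    x1, y1, x2, y2 = box
    h, w = len(frame), len(frame[0])
    # Clamp the box so the crop never reaches outside the frame.
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(w, x2), min(h, y2)
    return [row[x1:x2] for row in frame[y1:y2]]

frame = [[0] * 200 for _ in range(100)]      # dummy 100 x 200 frame
crop = crop_box(frame, (10, 20, 60, 90))
print(len(crop), len(crop[0]))               # 70 50
```

Each such crop would then be passed to the LDM upscaler instead of upscaling the whole frame.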
- Python 3.10.11
- PyTorch 1.13.1
- Torchvision 0.14.1
- CUDA 11.7
- Clone the project and its submodules

  ```shell
  $ git clone --recurse-submodules https://github.com/ghnmqdtg/yolov7-on-cityscapes-with-bbox-cropping.git
  ```

- Go into the project folder

  ```shell
  $ cd yolov7-on-cityscapes-with-bbox-cropping
  ```

- Run `./scripts/setup_env.sh` to set up the environment.

  ```shell
  $ sh scripts/setup_env.sh
  ```

  The script will:

  - Create a conda env named `yolov7_with_cropping` with Python 3.10.11.
  - Install PyTorch with CUDA 11.7.
  - Install the dependencies.

- (Optional) Change the VSCode interpreter path to `~/.conda/envs/yolov7_with_cropping/bin/python`.
- Modify line 5 of `./scripts/setup_dataset.sh` with your Cityscapes username and password.

- Run `./scripts/setup_dataset.sh` to set up the dataset; this takes some time.

  ```shell
  $ sh scripts/setup_dataset.sh
  ```

  The script will:

  - Download the dataset.
  - Use `cityscapes-to-coco-conversion` to generate bbox annotations for the Cityscapes dataset from its segmentation annotations (Cityscapes provides no bbox annotations).
  - Convert the annotations from COCO format to YOLO format.
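For reference, the COCO-to-YOLO step is a pure coordinate change: COCO stores a box as absolute `[x_min, y_min, width, height]` in pixels, while YOLO expects `[center_x, center_y, width, height]` normalized by the image size. A minimal sketch of that conversion (not the repo's actual script):

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO [x_min, y_min, w, h] box (pixels) to a YOLO
    [center_x, center_y, w, h] box normalized to [0, 1]."""
    x, y, w, h = bbox
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

# A 100x50 box at (200, 300) in a 2048x1024 Cityscapes frame:
print(coco_to_yolo([200, 300, 100, 50], 2048, 1024))
```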
- Download the pretrained model and put it in the `./yolov7` folder.

  ```shell
  $ wget https://github.com/ghnmqdtg/yolov7-on-cityscapes-with-bbox-cropping/releases/download/v0.1/yolov7_cityscapes.pt \
      -O ./yolov7/yolov7_cityscapes.pt
  ```
We provide a web interface to test the model. You can use the following commands to start the web server.

- Put your street-view video in `./www` and rename it to `street_view.mp4`.

- Start the backend server in one terminal

  ```shell
  $ cd yolov7
  $ python detect-web.py
  ```

- Start the front end in another terminal

  ```shell
  $ cd www
  $ sh launch.sh
  ```

- Go to http://localhost:30700/
-
You should `cd` to the `yolov7` folder first

```shell
$ cd yolov7
```

- Train the model on Cityscapes

  ```shell
  $ python -m torch.distributed.launch \
      --nproc_per_node 1 \
      --master_port 9527 \
      train.py \
      --workers 2 \
      --device 0 \
      --sync-bn \
      --epochs 100 \
      --batch-size 32 \
      --data data/cityscape.yaml \
      --img 640 640 \
      --cfg cfg/training/yolov7.yaml \
      --weights ./yolov7.pt \
      --hyp data/hyp.scratch.p5.yaml
  ```

  The output will be saved in `runs/train`.

- Evaluation

  ```shell
  $ python test.py \
      --data data/cityscape.yaml \
      --img 640 \
      --batch 32 \
      --conf 0.001 \
      --iou 0.65 \
      --device 0 \
      --weights yolov7_cityscapes.pt \
      --name cityscapes_yolo_cityscapes
  ```

  The output will be saved in `runs/test`.
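As a refresher on the `--iou 0.65` flag above (the IoU threshold used during evaluation), intersection-over-union for two axis-aligned boxes can be computed like this (a standalone sketch, not code from the repo):

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

Predicted boxes whose IoU with a higher-confidence box exceeds the threshold are suppressed as duplicates.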
- On a single image

  Only cropped regions with width or height greater than 32 px are saved, because super-resolution on regions that are too small produces obvious artifacts. The output will be saved in `runs/detect`.

  ```shell
  $ python detect.py \
      --weights yolov7_cityscapes.pt \
      --conf 0.25 \
      --img-size 640 \
      --source customdata/images/test/bonn/bonn_000004_000019_leftImg8bit.png \
      --sr --sr-step 100
  ```

  - `--sr`: enable 4x super-resolution.
  - `--sr-step`: control the strength of the super-resolution; the larger, the better the result.
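The 32 px rule above can be expressed as a simple predicate applied to each crop before super-resolution (a sketch interpreting the rule as requiring both sides to exceed 32 px; the actual check lives inside `detect.py`):

```python
def worth_upscaling(width: int, height: int, min_side: int = 32) -> bool:
    """Return True if a crop is large enough that 4x super-resolution
    is unlikely to produce obvious artifacts."""
    return width > min_side and height > min_side

print(worth_upscaling(64, 48))   # True
print(worth_upscaling(20, 100))  # False: one side is too small
```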
Performance of Cropping & Super Resolution (image grid comparing each Crop with its Crop & SR 4x result).

If you want to test super-resolution only, you can use `utils/custom_features.py` in `yolov7/`. If the width or height of the input is larger than 150 px, it will first be resized to 150 px while keeping the aspect ratio, then super-resolved.

```shell
$ python utils/custom_features.py \
    --input-img inference/images/cropped_car.jpg \
    --sr-step 100
```
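The pre-resize rule can be sketched as a target-size computation (an assumption about how `utils/custom_features.py` behaves: the longest side is capped at 150 px with the aspect ratio preserved, after which the image is resized and super-resolved):

```python
def capped_size(w: int, h: int, max_side: int = 150) -> tuple[int, int]:
    """Compute the size an image should be resized to before 4x
    super-resolution: cap the longest side at max_side, keep aspect ratio."""
    if max(w, h) <= max_side:
        return (w, h)
    scale = max_side / max(w, h)
    return (round(w * scale), round(h * scale))

print(capped_size(300, 200))  # (150, 100)
print(capped_size(120, 80))   # (120, 80): already small enough
```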
- On a video

  Nope, I haven't tried it yet.