YoloV7 Quantization Aware Training

Description

We use TensorRT's pytorch-quantization tool to fine-tune yolov7 with QAT, starting from the pre-trained weights, then export the model to ONNX and deploy it with TensorRT. Accuracy and performance are listed in the table below.

Method	Calibration method	mAP^val 0.5	mAP^val 0.5:0.95	batch-1 fps Jetson Orin-X	batch-16 fps Jetson Orin-X	weight
pytorch FP16	-	0.6972	0.5120	-	-	yolov7.pt
pytorch PTQ-INT8	Histogram(MSE)	0.6957	0.5100	-	-	yolov7_ptq.pt yolov7_ptq_640.onnx
pytorch QAT-INT8	Histogram(MSE)	0.6961	0.5111	-	-	yolov7_qat.pt
TensorRT FP16	-	0.6973	0.5124	140	168	yolov7.onnx
TensorRT PTQ-INT8	TensorRT built in EntropyCalibratorV2	0.6317	0.4573	207	264	-
TensorRT QAT-INT8	Histogram(MSE)	0.6962	0.5113	207	266	yolov7_qat_640.onnx

network input resolution: 3x640x640
note: trtexec cudaGraph is enabled
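
To make the trade-off in the table easier to read, the short sketch below computes the speedup and the mAP 0.5:0.95 change of each TensorRT INT8 row relative to TensorRT FP16. The numbers are copied from the table; the dictionary layout and the `compare` helper are illustrative names, not part of the repository.

```python
# Figures copied from the table above (Jetson Orin-X, 3x640x640 input).
results = {
    "TensorRT FP16":     {"map": 0.5124, "fps_b1": 140, "fps_b16": 168},
    "TensorRT PTQ-INT8": {"map": 0.4573, "fps_b1": 207, "fps_b16": 264},
    "TensorRT QAT-INT8": {"map": 0.5113, "fps_b1": 207, "fps_b16": 266},
}


def compare(name, baseline="TensorRT FP16"):
    """Return speedup and mAP 0.5:0.95 change of `name` relative to `baseline`."""
    row, base = results[name], results[baseline]
    return {
        "speedup_b1": row["fps_b1"] / base["fps_b1"],
        "speedup_b16": row["fps_b16"] / base["fps_b16"],
        "map_delta": row["map"] - base["map"],
    }


for method in ("TensorRT PTQ-INT8", "TensorRT QAT-INT8"):
    c = compare(method)
    print(
        f"{method}: batch-1 x{c['speedup_b1']:.2f}, "
        f"batch-16 x{c['speedup_b16']:.2f}, "
        f"mAP 0.5:0.95 change {c['map_delta']:+.4f}"
    )
```

Both INT8 variants run at nearly the same speed, so the difference between them is almost entirely in accuracy.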

How to Run QAT Training

1. Setup

We suggest using a Docker environment.

$ docker pull nvcr.io/nvidia/pytorch:22.09-py3

Clone and apply patch

# use this YoloV7 as a sample base 
git clone https://github.com/WongKinYiu/yolov7.git
cp -r yolov_deepstream/yolov7_qat/* yolov7/

Install dependencies

$ pip install pytorch-quantization --extra-index-url https://pypi.ngc.nvidia.com

Download dataset and pretrained model

$ bash scripts/get_coco.sh
$ wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt

2. Start QAT training

$ python scripts/qat.py quantize yolov7.pt --ptq=ptq.pt --qat=qat.pt --eval-ptq --eval-origin

This script includes steps below:

Insert Q/DQ nodes to get a fake-quantized PyTorch model
The pytorch-quantization tool can insert Q/DQ nodes automatically. For yolov7, however, the automatically quantized model does not reach the same performance as PTQ, because in explicit mode (QAT mode) TensorRT uses the placement of the Q/DQ nodes to decide the precision of each layer. Some of the automatically added Q/DQ nodes cannot be fused with other layers, which causes extra, useless precision conversions. Our script applies a set of rules and restrictions that we found for yolov7: Q/DQ nodes are analyzed and configured automatically in a rule-based manner, so that their placement is optimal under TensorRT and all nodes run in INT8 (confirmed with trt-engine-explorer; see scripts/draw-engine.py). For details of this part, refer to quantization/rules.py. For guidance on Q/DQ insertion, refer to Guidance_of_QAT_performance_optimization.
PTQ calibration
After inserting the Q/DQ nodes, we recommend running PTQ calibration first. In our experiments, Histogram(MSE) is the best PTQ calibration method for yolov7. Note: if you are satisfied with the PTQ result, you can skip QAT.
QAT training
The QAT stage fine-tunes the calibrated model. Once the accuracy is satisfactory, the weights are saved to files.
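
As a rough illustration of the first two steps, the sketch below shows what a Q/DQ pair does to a value (quantize to a symmetric INT8 grid, then dequantize) and how an MSE criterion can pick the clipping threshold `amax`. It is a simplified, pure-Python stand-in: the real calibrator in pytorch-quantization works on a histogram of activations, and the function names, the candidate search, and the random data here are assumptions made for the example.

```python
import random


def fake_quantize(x, amax, num_bits=8):
    """Quantize x to a symmetric signed grid, then map it back to float (Q then DQ)."""
    qmax = 2 ** (num_bits - 1) - 1       # 127 for INT8
    scale = amax / qmax
    q = round(x / scale)                 # quantize
    q = max(-qmax, min(qmax, q))         # clamp to [-qmax, qmax]
    return q * scale                     # dequantize


def mse_for_amax(values, amax):
    """Mean squared error introduced by fake-quantizing `values` with threshold `amax`."""
    return sum((v - fake_quantize(v, amax)) ** 2 for v in values) / len(values)


def calibrate_amax_mse(values, num_candidates=100):
    """Pick the threshold with the lowest MSE among evenly spaced candidates."""
    max_abs = max(abs(v) for v in values)
    candidates = [max_abs * (i + 1) / num_candidates for i in range(num_candidates)]
    return min(candidates, key=lambda a: mse_for_amax(values, a))


# Synthetic "activations" for illustration only.
random.seed(0)
activations = [random.gauss(0.0, 1.0) for _ in range(2000)]

max_abs = max(abs(v) for v in activations)
amax = calibrate_amax_mse(activations)
print(f"max-abs threshold: {max_abs:.4f}, MSE: {mse_for_amax(activations, max_abs):.3e}")
print(f"MSE-chosen amax:   {amax:.4f}, MSE: {mse_for_amax(activations, amax):.3e}")
```

During QAT the same quantize/dequantize round trip stays in the forward pass, so fine-tuning lets the weights adapt to the rounding and clipping error.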

3. Export onnx

$ python scripts/qat.py export qat.pt --size=640 --save=qat.onnx --dynamic

4. Evaluate model accuracy on coco

$ bash scripts/eval-trt.sh qat.pt

5. Benchmark

$ /usr/src/tensorrt/bin/trtexec --onnx=qat.onnx --int8 --fp16  --workspace=1024000 --minShapes=images:4x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:4x3x640x640

Quantizing Yolov7-Tiny

$ python scripts/qat.py quantize yolov7-tiny.pt --qat=qat.pt --ptq=ptq.pt --ignore-policy="model\.77\.m\.(.*)|model\.0\.(.*)" --supervision-stride=1 --eval-ptq --eval-origin
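
The `--ignore-policy` value is a regular expression over layer names. The sketch below shows which names the pattern from the command above would select. It assumes the pattern is matched from the start of the name with `re.match`; how scripts/qat.py applies it internally is not shown here, and the layer names are hypothetical examples.

```python
import re

# Pattern copied from the --ignore-policy argument above.
IGNORE_POLICY = r"model\.77\.m\.(.*)|model\.0\.(.*)"


def is_ignored(layer_name, policy=IGNORE_POLICY):
    """Return True if the layer name matches the policy from its start (assumed semantics)."""
    return re.match(policy, layer_name) is not None


# Hypothetical layer names, for illustration only.
layer_names = [
    "model.0.conv",
    "model.1.conv",
    "model.7.conv",
    "model.77.m.0",
    "model.77.m.2",
]

for name in layer_names:
    print(f"{name:14s} ignored={is_ignored(name)}")
```

Under this assumption, the pattern selects everything under `model.0.` and `model.77.m.`, and leaves layers such as `model.1.conv` or `model.7.conv` untouched.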

Note

For YoloV5, please use the script scripts/qat-yolov5.py. It adds QAT support for the Add operator, which makes the quantized model more performant.
Refer to the quantize.replace_bottleneck_forward function to see how the Add operator is handled.
