- Goal : Convert a PyTorch model to a TensorRT INT8 model via ONNX, for use in C++ code.
- Process : PyTorch model (Python) -> ONNX -> TensorRT model (C++) -> TensorRT PTQ INT8 model (C++)
- Process 2 : PyTorch model (Python) -> ONNX -> TensorRT model (Python)
- Sample Model : ResNet18
- Device
- Windows 10 laptop
- CPU i7-11375H
- GPU RTX-3060
- Dependency
- cuda 11.4.1
- cudnn 8.4.1
- tensorrt 8.4.3
- pytorch 1.13.1+cu116
- onnx 1.13.0
- onnxruntime-gpu 1.14.0
```
TensorRT_ONNX/
├── calib_data/                    # 100 images for PTQ calibration
├── data/                          # input image
├── Pytorch/
│   ├─ model/                      # onnx, pth, wts files
│   ├─ 1_resnet18_torch.py         # base PyTorch model
│   ├─ 2_resnet18_onnx_runtime.py  # export ONNX & run with ONNX Runtime
│   ├─ 3_resnet18_onnx.py          # export ONNX for TensorRT
│   ├─ 4_resnet18_gen_wts.py       # generate weights (.wts) for the API-built TRT model
│   ├─ 5_resnet18_trt.py           # build TRT model with the Python TensorRT API
│   ├─ common.py                   # helpers for 5_resnet18_trt.py
│   └─ utils.py
├── TensorRT_ONNX/
│   ├─ Engine/                     # engine files & calibration cache table
│   ├─ TensorRT_ONNX/
│   │   ├─ calibrator.cpp          # INT8 PTQ calibrator
│   │   ├─ calibrator.hpp
│   │   ├─ logging.hpp
│   │   ├─ main.cpp                # main code
│   │   ├─ utils.cpp               # custom utility functions
│   │   └─ utils.hpp
│   └─ TensorRT_ONNX.sln
├── LICENSE
└── README.md
```
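The 100 images in `calib_data/` feed the PTQ calibrator, which derives INT8 scale factors from FP32 activation statistics. A pure-NumPy sketch of the underlying idea, using simple max-abs calibration (TensorRT's default `IInt8EntropyCalibrator2` uses a KL-divergence-based method instead; this only illustrates the scale/quantize/dequantize mechanics):

```python
import numpy as np

def compute_scale(calib_batches):
    """Max-abs calibration: one symmetric INT8 scale from calibration data."""
    max_abs = max(float(np.abs(b).max()) for b in calib_batches)
    return max_abs / 127.0

def quantize(x, scale):
    # Symmetric quantization to the signed 8-bit range.
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Random tensors stand in for the 100 calibration images.
rng = np.random.default_rng(0)
calib = [rng.standard_normal((8, 3, 224, 224)).astype(np.float32)
         for _ in range(4)]

scale = compute_scale(calib)
x = calib[0]
err = float(np.abs(dequantize(quantize(x, scale), scale) - x).max())
print(f"scale={scale:.6f}, max dequant error={err:.6f}")
```

For in-range values the worst-case round-trip error is half a quantization step (`scale / 2`), which is why INT8 PTQ keeps accuracy close to FP32 when the calibration set is representative.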
- Comparison of average execution time over 100 iterations, FPS, and GPU memory usage for a single [224,224,3] image

|                        | PyTorch | ONNX-RT | TensorRT | TensorRT | TensorRT   |
|------------------------|---------|---------|----------|----------|------------|
| Precision              | FP32    | FP32    | FP32     | FP16     | INT8 (PTQ) |
| Avg duration [ms]      | 3.68    | 2.52    | 1.32     | 0.56     | 0.41       |
| FPS [frame/sec]        | 271.14  | 396.47  | 757.00   | 1797.6   | 2444.9     |
| Memory [GB]            | 1.58    | 1.18    | 0.31     | 0.27     | 0.25       |
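The FPS row is consistent with `FPS = 1000 / avg_ms` (e.g. 1000 / 1.32 ≈ 757). A minimal timing-harness sketch of how such numbers can be collected, with a toy workload standing in for one inference call (the repo's actual measurement loop may differ):

```python
import time

def benchmark(fn, iterations=100, warmup=10):
    """Return (average latency in ms, FPS) over `iterations` runs."""
    for _ in range(warmup):      # warmup runs are excluded from timing
        fn()
    t0 = time.perf_counter()
    for _ in range(iterations):
        fn()
    avg_ms = (time.perf_counter() - t0) * 1000.0 / iterations
    return avg_ms, 1000.0 / avg_ms

# Toy CPU workload standing in for one inference on a [224,224,3] image.
avg_ms, fps = benchmark(lambda: sum(i * i for i in range(1000)))
print(f"{avg_ms:.3f} ms, {fps:.1f} fps")
```

Note that GPU inference needs a synchronization point (e.g. `torch.cuda.synchronize()` or a CUDA stream sync) before reading the clock, otherwise the measured latency only reflects kernel launch time.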