- Standard format for expressing machine learning algorithms and models
- More details about ONNX: https://en.wikipedia.org/wiki/Open_Neural_Network_Exchange
- NVIDIA SDK for high-performance deep learning inference
- A deep learning inference optimizer and runtime that delivers low latency and high throughput
- Explicit batch is required when dealing with dynamic shapes; otherwise, the network is created with an implicit batch dimension.
- More details about TensorRT: https://blog.naver.com/qbxlvnf11/222403199156
- Details: https://blog.naver.com/qbxlvnf11/222342675767
- Export & load ONNX model
- Run inference with the ONNX model
- Compare output and time efficiency between ONNX and PyTorch (see the export/inference sketch after this list)
- Setting batch size of input data: explicit batch or implicit batch
- Build & load TensorRT engine
- Setting batch size of input data: explicit batch or implicit batch
- Key trtexec options
- Precision of engine: FP32, FP16
- optShapes: set the most commonly used input size of the model for inference
- minShapes: set the minimum input size of the model for inference
- maxShapes: set the maximum input size of the model for inference
- Run inference with the TensorRT engine
- Compare output and time efficiency among TensorRT, ONNX, and PyTorch
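As a rough illustration of the ONNX export and comparison steps above (a minimal sketch, not the repository's exact script): it assumes a torchvision ResNet-18, a 1x3x224x224 input, an input tensor named `input`, and that `onnxruntime` is installed in the environment.

```python
import time
import numpy as np
import torch
import torchvision
import onnxruntime as ort

# Assumed example model; the repository's own model may differ.
model = torchvision.models.resnet18(pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)

# Export with a dynamic batch dimension (explicit batch).
torch.onnx.export(
    model, dummy, "onnx_output_explicit.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
    opset_version=11,
)

# Run the same input through PyTorch and ONNX Runtime and compare.
with torch.no_grad():
    torch_out = model(dummy).numpy()

sess = ort.InferenceSession("onnx_output_explicit.onnx")
start = time.time()
ort_out = sess.run(None, {"input": dummy.numpy()})[0]
print("onnxruntime latency: %.4f s" % (time.time() - start))
print("max abs diff vs PyTorch:", np.abs(torch_out - ort_out).max())
```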
docker pull qbxlvnf11docker/tensorrt_21.08
nvidia-docker run -it -p 9000:9000 -e GRANT_SUDO=yes --user root --name tensorrt_21.08_env -v {code_folder_path}:/workspace -w /workspace qbxlvnf11docker/tensorrt_21.08:latest bash
- Converting PyTorch model to ONNX (explicit batch: --dynamic_axes True)
python convert_pytorch_to_onnx/convert_pytorch_to_onnx.py --dynamic_axes True --output_path onnx_output_explicit.onnx --batch_size {batch_size}
- Converting ONNX to TensorRT and testing time efficiency (FP32)
- Set the three shape parameters (minShapes, optShapes, maxShapes) according to the inference environment
python convert_onnx_to_tensorrt/convert_onnx_to_tensorrt.py --dynamic_axes True --onnx_model_path onnx_output_explicit.onnx --batch_size {batch_size} --tensorrt_engine_path FP32_explicit.engine --engine_precision FP32
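A minimal sketch of what such an explicit-batch engine build can look like with the TensorRT Python API (as shipped in the 21.08 container). The input name `input`, the 3x224x224 shape, the 1 GiB workspace, the max batch of 8, and the `build_engine` helper are assumptions for illustration, not the repository's actual code.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, engine_path, fp16=False):
    builder = trt.Builder(TRT_LOGGER)
    # Explicit batch is required for dynamic shapes.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB (assumed)
    if fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    # Dynamic batch: one optimization profile with min/opt/max shapes.
    profile = builder.create_optimization_profile()
    profile.set_shape("input",
                      min=(1, 3, 224, 224),   # minShapes
                      opt=(1, 3, 224, 224),   # optShapes
                      max=(8, 3, 224, 224))   # maxShapes
    config.add_optimization_profile(profile)

    engine = builder.build_engine(network, config)
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())
    return engine

# build_engine("onnx_output_explicit.onnx", "FP32_explicit.engine", fp16=False)
```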
- Converting ONNX to TensorRT and testing time efficiency (FP16)
- Set the three shape parameters (minShapes, optShapes, maxShapes) according to the inference environment
python convert_onnx_to_tensorrt/convert_onnx_to_tensorrt.py --dynamic_axes True --onnx_model_path onnx_output_explicit.onnx --batch_size {batch_size} --tensorrt_engine_path FP16_explicit.engine --engine_precision FP16
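With the hypothetical `build_engine` helper sketched above, the FP16 build differs only in enabling the FP16 builder flag, e.g.:

```python
# FP16 engine: identical build, only the precision flag changes.
# Layers without an FP16 implementation fall back to FP32 automatically.
build_engine("onnx_output_explicit.onnx", "FP16_explicit.engine", fp16=True)
```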
- Converting PyTorch model to ONNX (implicit batch: --dynamic_axes False)
python convert_pytorch_to_onnx/convert_pytorch_to_onnx.py --dynamic_axes False --output_path onnx_output_implicit.onnx --batch_size {batch_size}
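For the implicit-batch path, the export simply omits `dynamic_axes`, so the batch dimension is baked into the graph. A minimal sketch, reusing the same hypothetical ResNet-18 example as above:

```python
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True).eval()
batch_size = 1  # fixed; the exported graph only accepts this batch size
dummy = torch.randn(batch_size, 3, 224, 224)

# No dynamic_axes: every dimension, including batch, is fixed in the ONNX graph.
torch.onnx.export(
    model, dummy, "onnx_output_implicit.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=11,
)
```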
- Converting ONNX to TensorRT and testing time efficiency (FP32)
python convert_onnx_to_tensorrt/convert_onnx_to_tensorrt.py --dynamic_axes False --onnx_model_path onnx_output_implicit.onnx --batch_size {batch_size_of_implicit_batch_onnx_model} --tensorrt_engine_path FP32_implicit.engine --engine_precision FP32
- Converting ONNX to TensorRT and testing time efficiency (FP16)
python convert_onnx_to_tensorrt/convert_onnx_to_tensorrt.py --dynamic_axes False --onnx_model_path onnx_output_implicit.onnx --batch_size {batch_size_of_implicit_batch_onnx_model} --tensorrt_engine_path FP16_implicit.engine --engine_precision FP16
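Deserializing an engine and running inference (for the output/time comparison step) roughly follows the pattern below. This is a minimal sketch, assuming a single input binding named `input`, a single output binding, FP32 I/O, and that `pycuda` is available in the container; it is not the repository's exact inference code.

```python
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def trt_inference(engine_path, input_array):
    with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()

    # For an explicit-batch engine, bind the actual input shape.
    context.set_binding_shape(0, input_array.shape)

    output_shape = tuple(context.get_binding_shape(1))
    output = np.empty(output_shape, dtype=np.float32)

    d_input = cuda.mem_alloc(input_array.nbytes)
    d_output = cuda.mem_alloc(output.nbytes)
    stream = cuda.Stream()

    cuda.memcpy_htod_async(d_input, np.ascontiguousarray(input_array), stream)
    context.execute_async_v2([int(d_input), int(d_output)], stream.handle)
    cuda.memcpy_dtoh_async(output, d_output, stream)
    stream.synchronize()
    return output

# out = trt_inference("FP32_explicit.engine",
#                     np.random.randn(1, 3, 224, 224).astype(np.float32))
```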
- Explicit batch test of FP32 TensorRT engine
  - Batch size of inference data = 1
  - Batch size of optShapes = 1
- Explicit batch test of FP16 TensorRT engine
  - Batch size of inference data = 1
  - Batch size of optShapes = 1
- Explicit batch test of FP16 TensorRT engine
  - Batch size of inference data = 8
  - Batch size of optShapes = 1
- Implicit batch test of FP32 TensorRT engine
  - Batch size of inference data = 1
- Implicit batch test of FP16 TensorRT engine
  - Batch size of inference data = 1
- https://pytorch.org/docs/stable/onnx.html
- https://developer.nvidia.com/tensorrt
- https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel_21-08.html#rel_21-08
- https://www.kaggle.com/ifigotin/imagenetmini-1000