Triton Inference Server takes care of model deployment and comes with many benefits out of the box: gRPC and HTTP interfaces, automatic scheduling across multiple GPUs, shared memory (even on GPU), dynamic server-side batching, health metrics, and memory resource management.
## Export TensorRT

```bash
# Install onnx-simplifier (not listed in the general yolov7 requirements.txt)
pip3 install onnx-simplifier

# PyTorch YOLOv7 -> ONNX with grid, EfficientNMS plugin and dynamic batch size
python export.py --weights ./yolov7.pt --grid --end2end --dynamic-batch --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640

# ONNX -> TensorRT with trtexec and docker
docker run -it --rm --gpus=all nvcr.io/nvidia/tensorrt:22.06-py3
# Copy onnx -> container: docker cp yolov7.onnx <container-id>:/workspace/

# Export with FP16 precision, min batch 1, opt batch 8 and max batch 8
./tensorrt/bin/trtexec --onnx=yolov7.onnx --minShapes=images:1x3x640x640 --optShapes=images:8x3x640x640 --maxShapes=images:8x3x640x640 --fp16 --workspace=4096 --saveEngine=yolov7-fp16-1x8x8.engine --timingCacheFile=timing.cache

# Test engine
./tensorrt/bin/trtexec --loadEngine=yolov7-fp16-1x8x8.engine

# Copy engine -> host: docker cp <container-id>:/workspace/yolov7-fp16-1x8x8.engine .
```
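Before building the engine, it can be worth sanity-checking the exported ONNX file and confirming its input/output tensors. A minimal sketch, assuming the `onnx` Python package is installed (`pip3 install onnx`):

```python
# Sketch: sanity-check yolov7.onnx before building the TensorRT engine.
# Assumes the `onnx` package is installed; the file name matches the export command above.
import onnx

model = onnx.load("yolov7.onnx")
onnx.checker.check_model(model)  # raises if the graph is structurally invalid

# Print graph inputs/outputs to confirm the dynamic batch axis and the NMS output tensors
for tensor in list(model.graph.input) + list(model.graph.output):
    dims = [d.dim_param or d.dim_value for d in tensor.type.tensor_type.shape.dim]
    print(tensor.name, dims)
```

With `--dynamic-batch`, the batch dimension should show up as a symbolic name rather than a fixed size.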
## Model Repository
```bash
# Create folder structure
mkdir -p triton-deploy/models/yolov7/1/
touch triton-deploy/models/yolov7/config.pbtxt
# Place model
mv yolov7-fp16-1x8x8.engine triton-deploy/models/yolov7/1/model.plan
```

## Model Configuration

Minimal configuration for `triton-deploy/models/yolov7/config.pbtxt`:

```
name: "yolov7"
platform: "tensorrt_plan"
max_batch_size: 8
dynamic_batching { }
```
```bash
$ tree triton-deploy/
triton-deploy/
└── models
└── yolov7
├── 1
│ └── model.plan
└── config.pbtxt
3 directories, 2 files
```

## Start Triton Inference Server

```bash
docker run --gpus all --rm --ipc=host --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v$(pwd)/triton-deploy/models:/models nvcr.io/nvidia/tritonserver:22.06-py3 tritonserver --model-repository=/models --strict-model-config=false --log-verbose 1
```
In the log you should see:
```
+--------+---------+--------+
| Model  | Version | Status |
+--------+---------+--------+
| yolov7 | 1       | READY  |
+--------+---------+--------+
```
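Once the model shows READY, you can also verify it programmatically. A minimal sketch using the `tritonclient` gRPC API; it assumes `tritonclient[grpc]` is installed on the host (`pip3 install tritonclient[grpc]`) and the default gRPC port 8001 is mapped as in the docker command above:

```python
# Sketch: confirm the server and model are ready and inspect the model configuration over gRPC.
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
print("server live: ", client.is_server_live())
print("server ready:", client.is_server_ready())
print("model ready: ", client.is_model_ready("yolov7"))

# With --strict-model-config=false Triton completes the minimal config.pbtxt from the plan file;
# this shows what it derived (tensor names, dtypes, shapes, max_batch_size).
print(client.get_model_config("yolov7"))
print(client.get_model_metadata("yolov7"))
```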

## Performance with Model Analyzer

See the Triton Model Analyzer Documentation for more info.
Example test with 16 concurrent clients using shared memory, each sending requests with batch size 1:
```bash
docker run -it --ipc=host --net=host nvcr.io/nvidia/tritonserver:22.06-py3-sdk /bin/bash

./install/bin/perf_analyzer -m yolov7 -u 127.0.0.1:8001 -i grpc --shared-memory system --concurrency-range 16

# Result (truncated)
Concurrency: 16, throughput: 590.119 infer/sec, latency 27080 usec
```
For comparison, the same test with dynamic batching disabled in the model configuration:

```bash
# Result (truncated)
Concurrency: 16, throughput: 335.587 infer/sec, latency 47616 usec
```
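To check whether the dynamic batcher is actually grouping requests during such a test, the per-model statistics can be inspected. A small sketch, again assuming `tritonclient[grpc]` is installed and the server from above is running on localhost:8001:

```python
# Sketch: inspect per-model statistics after a perf_analyzer run.
# If dynamic batching is working, execution_count will be noticeably lower than
# inference_count (several requests were executed together in one batch).
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
print(client.get_inference_statistics(model_name="yolov7"))
```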

## How to run model in your code

The example client `client.py` supports dummy, image, and video input modes:

```bash
$ python3 client.py --help
usage: client.py [-h] [-m MODEL] [--width WIDTH] [--height HEIGHT] [-u URL] [-o OUT] [-f FPS] [-i] [-v] [-t CLIENT_TIMEOUT] [-s] [-r ROOT_CERTIFICATES] [-p PRIVATE_KEY] [-x CERTIFICATE_CHAIN] {dummy,image,video} [input]
positional arguments:
{dummy,image,video} Run mode. 'dummy' will send an empty buffer to the server to test if inference works. 'image' will process an image. 'video' will process a video.
input Input file to load from in image or video mode
optional arguments:
-h, --help show this help message and exit
-m MODEL, --model MODEL
Inference model name, default yolov7
--width WIDTH Inference model input width, default 640
--height HEIGHT Inference model input height, default 640
-u URL, --url URL Inference server URL, default localhost:8001
-o OUT, --out OUT Write output into file instead of displaying it
-f FPS, --fps FPS Video output fps, default 24.0 FPS
-i, --model-info Print model status, configuration and statistics
-v, --verbose Enable verbose client output
-t CLIENT_TIMEOUT, --client-timeout CLIENT_TIMEOUT
Client timeout in seconds, default no timeout
-s, --ssl Enable SSL encrypted channel to the server
-r ROOT_CERTIFICATES, --root-certificates ROOT_CERTIFICATES
File holding PEM-encoded root certificates, default none
-p PRIVATE_KEY, --private-key PRIVATE_KEY
File holding PEM-encoded private key, default is none
-x CERTIFICATE_CHAIN, --certificate-chain CERTIFICATE_CHAIN
File holding PEM-encoded certificate chain, default is none
```
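If you want to call the model from your own code instead of going through `client.py`, the following is a minimal gRPC inference sketch. The input tensor name `images` matches the trtexec shapes used above; the output names (`num_dets`, `det_boxes`, `det_scores`, `det_classes`) are assumptions based on the end2end EfficientNMS export and should be checked against the model metadata. Preprocessing here is a plain resize to 640x640, so the returned boxes are in that resized frame.

```python
# Minimal gRPC inference sketch. Assumptions: tritonclient[grpc], numpy and opencv-python
# are installed; output tensor names follow the end2end EfficientNMS export (verify via
# get_model_metadata if unsure); "test.jpg" is a placeholder input image.
import cv2
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Plain preprocessing: BGR -> RGB, resize to 640x640, NCHW float32 in [0, 1]
img = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)
inp = cv2.resize(img, (640, 640)).transpose(2, 0, 1)[None].astype(np.float32) / 255.0

inputs = [grpcclient.InferInput("images", list(inp.shape), "FP32")]
inputs[0].set_data_from_numpy(inp)
outputs = [grpcclient.InferRequestedOutput(name)
           for name in ("num_dets", "det_boxes", "det_scores", "det_classes")]

result = client.infer(model_name="yolov7", inputs=inputs, outputs=outputs)
num = int(result.as_numpy("num_dets")[0][0])
boxes = result.as_numpy("det_boxes")[0][:num]      # xyxy boxes in the 640x640 input frame
scores = result.as_numpy("det_scores")[0][:num]
classes = result.as_numpy("det_classes")[0][:num]
for box, score, cls in zip(boxes, scores, classes):
    print(int(cls), float(score), box)
```

To map boxes back to the original image, scale them by the original width and height divided by 640.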