This project demonstrates how to run C API applications using Triton Inference Server as a shared library. We also show how to build and execute such applications on Jetson.
- JetPack >= 4.6
- OpenCV >= 4.1.1
- TensorRT >= 8.0.1.6
Follow the installation instructions from the GitHub release page (https://github.com/triton-inference-server/server/releases/).
In our example, we placed the contents of the downloaded release directory under /opt/tritonserver.
The sample located under concurrency_and_dynamic_batching demonstrates two important features of Triton Inference Server: concurrent model execution and dynamic batching. To do that, we implemented a people detection application using the C API and Triton Inference Server as a shared library.
To analyze model performance on Jetson, the perf_analyzer tool is used. perf_analyzer is included in the release tar file or can be compiled from source.
From this directory of the repository, execute the following to evaluate model performance:
./perf_analyzer -m peoplenet -b 2 --service-kind=triton_c_api --model-repo=$(pwd)/concurrency_and_dynamic_batching/trtis_model_repo_sample_1 --triton-server-directory=/opt/tritonserver --concurrency-range 1:6 -f perf_c_api.csv
In the example above we saved the results as a .csv file. To visualize these results, follow the steps described here.
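For a quick look at the results without a plotting step, the CSV can also be summarized with a few lines of Python. The sketch below is only an illustration: it assumes the file has a header row with columns named Concurrency and Inferences/Second, and the embedded sample rows are made-up placeholder values, not measurements. The helper names load_throughput and best_concurrency are hypothetical and not part of Triton or perf_analyzer.

```python
import csv
import io

# Placeholder rows for illustration only; these are NOT measured results.
# The column names are an assumption about the perf_analyzer CSV header.
SAMPLE_CSV = """Concurrency,Inferences/Second,p95 latency
2,20.0,2000
1,10.0,1000
4,30.0,3000
"""


def load_throughput(csv_file):
    """Return (concurrency, inferences_per_second) pairs sorted by concurrency."""
    reader = csv.DictReader(csv_file)
    rows = [(int(r["Concurrency"]), float(r["Inferences/Second"])) for r in reader]
    return sorted(rows)


def best_concurrency(rows):
    """Return the (concurrency, throughput) pair with the highest throughput."""
    return max(rows, key=lambda pair: pair[1])


if __name__ == "__main__":
    # For real results, replace the StringIO object with
    # open("perf_c_api.csv", newline="").
    rows = load_throughput(io.StringIO(SAMPLE_CSV))
    for concurrency, throughput in rows:
        print(f"concurrency={concurrency} throughput={throughput:.1f} infer/sec")
    print("best:", best_concurrency(rows))
```

If the header in your perf_c_api.csv differs from the assumed names, adjust the two dictionary keys in load_throughput accordingly.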