Update docs and examples to use Triton 23.06 #1037

Merged on Jul 12, 2023 (3 commits)

Changes from all commits

.devcontainer/docker-compose.yml (2 changes: 1 addition & 1 deletion)
@@ -20,7 +20,7 @@ services:
triton:
container_name: morpheus-triton
runtime: nvidia
- image: nvcr.io/nvidia/tritonserver:22.10-py3
+ image: nvcr.io/nvidia/tritonserver:23.06-py3
command: tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false ${TRITON_MODEL_ARGS}
ports:
- 8000:8000
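
As a usage sketch (assuming Docker Compose v2 and that the command is run from the repository root), the Triton service defined above can be brought up on its own; `TRITON_MODEL_ARGS` is interpolated into the command, so it can be exported beforehand or left empty:

```bash
# Optional extra arguments appended to the tritonserver command via ${TRITON_MODEL_ARGS}.
export TRITON_MODEL_ARGS=""

# Start only the Triton service from the devcontainer Compose file and follow its logs.
docker compose -f .devcontainer/docker-compose.yml up -d triton
docker compose -f .devcontainer/docker-compose.yml logs -f triton
```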

docs/source/basics/building_a_pipeline.md (2 changes: 1 addition & 1 deletion)
@@ -223,7 +223,7 @@ This example shows an NLP Pipeline which uses several stages available in Morpheus
#### Launching Triton
From the Morpheus repo root directory, run the following to launch Triton and load the `sid-minibert` model:
```bash
- docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
+ docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
```
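
Once the container is up, readiness can be confirmed over the HTTP port mapped above (a minimal sketch, assuming the default `-p8000:8000` mapping):

```bash
# Poll the server-level readiness endpoint until Triton reports ready.
until curl -sf -o /dev/null localhost:8000/v2/health/ready; do
  echo "Waiting for Triton..."
  sleep 2
done

# Confirm the explicitly loaded model is also ready to serve requests.
curl -sf -o /dev/null localhost:8000/v2/models/sid-minibert-onnx/ready && echo "sid-minibert-onnx is ready"
```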

#### Launching Kafka

@@ -194,7 +194,7 @@ From the root of the Morpheus project we will launch a Triton Docker container w
```shell
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 \
-v $PWD/models:/models \
- nvcr.io/nvidia/tritonserver:22.08-py3 \
+ nvcr.io/nvidia/tritonserver:23.06-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--log-info=true \
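
For reference alongside the `--log-info=true` flag above, tritonserver also accepts a numeric `--log-verbose` level; the full set of logging options can be listed from the image itself (a minimal sketch, not part of the launch command):

```bash
# Print the logging-related options supported by this tritonserver build.
docker run --rm nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --help 2>&1 | grep -- '--log'

# Example: --log-verbose=1 adds detailed model-load and request logging on top of --log-info=true.
```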

docs/source/getting_started.md (6 changes: 3 additions & 3 deletions)
@@ -31,7 +31,7 @@ More advanced users, or those who are interested in using the latest pre-release
- NVIDIA driver `450.80.02` or higher
- [Docker](https://docs.docker.com/get-docker/)
- [The NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker)
- - [NVIDIA Triton Inference Server](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver) `22.06` or higher
+ - [NVIDIA Triton Inference Server](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver) `23.06` or higher

> **Note about Docker:**
>
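
A quick way to confirm the driver and NVIDIA Container Toolkit prerequisites before pulling anything else (a minimal sketch; the Triton image is used here only as a convenient GPU-enabled container):

```bash
# The reported driver version should be 450.80.02 or higher.
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# If the NVIDIA Container Toolkit is configured, a GPU-enabled container sees the host's GPUs.
docker run --rm --gpus=all nvcr.io/nvidia/tritonserver:23.06-py3 nvidia-smi -L
```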
@@ -146,7 +146,7 @@ Many of the validation tests and example workflows require a Triton server to function
```bash
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 \
-v $PWD/models:/models \
- nvcr.io/nvidia/tritonserver:22.08-py3 \
+ nvcr.io/nvidia/tritonserver:23.06-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--log-info=true \
@@ -160,7 +160,7 @@ Note: The above command is useful for testing out Morpheus, however it does load
```bash
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 \
-v $PWD/models:/models \
- nvcr.io/nvidia/tritonserver:22.08-py3 \
+ nvcr.io/nvidia/tritonserver:23.06-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--log-info=true \

examples/abp_nvsmi_detection/README.md (4 changes: 2 additions & 2 deletions)
@@ -65,12 +65,12 @@ This example utilizes the Triton Inference Server to perform inference.

Pull the Docker image for Triton:
```bash
- docker pull nvcr.io/nvidia/tritonserver:22.08-py3
+ docker pull nvcr.io/nvidia/tritonserver:23.06-py3
```

From the Morpheus repo root directory, run the following to launch Triton and load the `abp-nvsmi-xgb` XGBoost model:
```bash
- docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model abp-nvsmi-xgb
+ docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model abp-nvsmi-xgb
```

This will launch Triton and only load the `abp-nvsmi-xgb` model. This model has been configured with a max batch size of 32768, and to use dynamic batching for increased performance.
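
The batch-size and dynamic-batching settings mentioned above can be read back from the running server (a sketch, assuming the default `-p8000:8000` mapping; `python3 -m json.tool` is only used for pretty-printing):

```bash
# Fetch the served configuration for abp-nvsmi-xgb; look for "max_batch_size"
# and the "dynamic_batching" section in the JSON output.
curl -s localhost:8000/v2/models/abp-nvsmi-xgb/config | python3 -m json.tool
```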

examples/abp_pcap_detection/README.md (4 changes: 2 additions & 2 deletions)
@@ -23,7 +23,7 @@ To run this example, an instance of Triton Inference Server and a sample dataset

### Triton Inference Server
```bash
- docker pull nvcr.io/nvidia/tritonserver:22.08-py3
+ docker pull nvcr.io/nvidia/tritonserver:23.06-py3
```

##### Deploy Triton Inference Server
@@ -35,7 +35,7 @@ From the root of the Morpheus repo, navigate to the anomalous behavior profiling
cd examples/abp_pcap_detection

# Launch the container
- docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD/abp-pcap-xgb:/models/abp-pcap-xgb --name tritonserver nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models --exit-on-error=false
+ docker run --rm --gpus=all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v $PWD/abp-pcap-xgb:/models/abp-pcap-xgb --name tritonserver nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models --exit-on-error=false
```

##### Verify Model Deployment
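
One way to confirm the `abp-pcap-xgb` model loaded successfully (a minimal sketch, assuming the port mappings from the launch command above):

```bash
# Prints 200 once the abp-pcap-xgb model is loaded and ready to serve.
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/models/abp-pcap-xgb/ready
```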

examples/log_parsing/README.md (4 changes: 2 additions & 2 deletions)
@@ -26,7 +26,7 @@ Pull Docker image from NGC (https://ngc.nvidia.com/catalog/containers/nvidia:tri
Example:

```bash
- docker pull nvcr.io/nvidia/tritonserver:22.08-py3
+ docker pull nvcr.io/nvidia/tritonserver:23.06-py3
```

##### Setup Env Variable
@@ -38,7 +38,7 @@ export MORPHEUS_ROOT=$(pwd)
From the Morpheus repo root directory, run the following to launch Triton and load the `log-parsing-onnx` model:

```bash
- docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model log-parsing-onnx
+ docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model log-parsing-onnx
```

##### Verify Model Deployment

examples/nlp_si_detection/README.md (4 changes: 2 additions & 2 deletions)
@@ -77,10 +77,10 @@ This example utilizes the Triton Inference Server to perform inference. The neur
From the Morpheus repo root directory, run the following to launch Triton and load the `sid-minibert` model:

```bash
- docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
+ docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
```

- Where `22.02-py3` can be replaced with the current year and month of the Triton version to use. For example, to use May 2021, specify `nvcr.io/nvidia/tritonserver:21.05-py3`. Ensure that the version of TensorRT that is used in Triton matches the version of TensorRT elsewhere (refer to [NGC Deep Learning Frameworks Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)).
+ Where `23.06-py3` can be replaced with the current year and month of the Triton version to use. For example, to use May 2021, specify `nvcr.io/nvidia/tritonserver:21.05-py3`. Ensure that the version of TensorRT that is used in Triton matches the version of TensorRT elsewhere (refer to [NGC Deep Learning Frameworks Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)).

This will launch Triton and only load the `sid-minibert-onnx` model. This model has been configured with a max batch size of 32, and to use dynamic batching for increased performance.
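
Because the server runs with `--model-control-mode=explicit`, only models named with `--load-model` are loaded at startup; other models in the same repository can be loaded or unloaded later through Triton's model-control endpoints. A minimal sketch, using `phishing-bert-onnx` purely as an example of another model in the repository:

```bash
# Load an additional model into the already-running, explicit-mode Triton server.
curl -s -X POST localhost:8000/v2/repository/models/phishing-bert-onnx/load

# Unload it again once it is no longer needed.
curl -s -X POST localhost:8000/v2/repository/models/phishing-bert-onnx/unload
```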


examples/ransomware_detection/README.md (4 changes: 2 additions & 2 deletions)
@@ -27,7 +27,7 @@ Pull Docker image from NGC (https://ngc.nvidia.com/catalog/containers/nvidia:tri
Example:

```bash
- docker pull nvcr.io/nvidia/tritonserver:22.08-py3
+ docker pull nvcr.io/nvidia/tritonserver:23.06-py3
```
##### Setup Env Variable
```bash
@@ -39,7 +39,7 @@ Run the following from the `examples/ransomware_detection` directory to launch T

```bash
# Run Triton in explicit mode
- docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models/triton-model-repo nvcr.io/nvidia/tritonserver:22.08-py3 \
+ docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models/triton-model-repo nvcr.io/nvidia/tritonserver:23.06-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--model-control-mode=explicit \

examples/root_cause_analysis/README.md (4 changes: 2 additions & 2 deletions)
@@ -46,10 +46,10 @@ This example utilizes the Triton Inference Server to perform inference. The bina
From the Morpheus repo root directory, run the following to launch Triton and load the `root-cause-binary-onnx` model:

```bash
- docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model root-cause-binary-onnx
+ docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model root-cause-binary-onnx
```

- Where `22.08-py3` can be replaced with the current year and month of the Triton version to use. For example, to use May 2021, specify `nvcr.io/nvidia/tritonserver:21.05-py3`. Ensure that the version of TensorRT that is used in Triton matches the version of TensorRT elsewhere (refer to [NGC Deep Learning Frameworks Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)).
+ Where `23.06-py3` can be replaced with the current year and month of the Triton version to use. For example, to use May 2021, specify `nvcr.io/nvidia/tritonserver:21.05-py3`. Ensure that the version of TensorRT that is used in Triton matches the version of TensorRT elsewhere (refer to [NGC Deep Learning Frameworks Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)).

This will launch Triton and only load the model required by our example pipeline. The model has been configured with a max batch size of 32, and to use dynamic batching for increased performance.


examples/sid_visualization/docker-compose.yml (2 changes: 1 addition & 1 deletion)
@@ -25,7 +25,7 @@ x-with-gpus: &with_gpus

services:
triton:
- image: nvcr.io/nvidia/tritonserver:22.08-py3
+ image: nvcr.io/nvidia/tritonserver:23.06-py3
<<: *with_gpus
command: "tritonserver --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx --model-repository=/models/triton-model-repo"
environment:

scripts/validation/val-globals.sh (2 changes: 1 addition & 1 deletion)
@@ -26,7 +26,7 @@ export e="\033[0;90m"
export y="\033[0;33m"
export x="\033[0m"

- export TRITON_IMAGE=${TRITON_IMAGE:-"nvcr.io/nvidia/tritonserver:22.08-py3"}
+ export TRITON_IMAGE=${TRITON_IMAGE:-"nvcr.io/nvidia/tritonserver:23.06-py3"}

# TRITON_GRPC_PORT is only used when TRITON_URL is undefined
export TRITON_GRPC_PORT=${TRITON_GRPC_PORT:-"8001"}
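
Because these exports use the `${VAR:-default}` form, the image and port can be overridden from the environment without editing the script. A sketch, with the validation entry point left as a placeholder since it is not shown here:

```bash
# Override the Triton image for a single run; <validation-script> is a placeholder
# for whichever script under scripts/validation/ sources val-globals.sh.
TRITON_IMAGE="nvcr.io/nvidia/tritonserver:23.06-py3" ./scripts/validation/<validation-script>.sh
```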

scripts/validation/val-utils.sh (2 changes: 1 addition & 1 deletion)
@@ -68,7 +68,7 @@ function wait_for_triton {

function ensure_triton_running {

- TRITON_IMAGE=${TRITON_IMAGE:-"nvcr.io/nvidia/tritonserver:22.08-py3"}
+ TRITON_IMAGE=${TRITON_IMAGE:-"nvcr.io/nvidia/tritonserver:23.06-py3"}

IS_RUNNING=$(is_triton_running)


tests/benchmarks/README.md (4 changes: 2 additions & 2 deletions)
@@ -24,14 +24,14 @@ Pull Docker image from NGC (https://ngc.nvidia.com/catalog/containers/nvidia:tri
Example:

```
- docker pull nvcr.io/nvidia/tritonserver:23.03-py3
+ docker pull nvcr.io/nvidia/tritonserver:23.06-py3
```

##### Start Triton Inference Server container
```
cd ${MORPHEUS_ROOT}/models

- docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD:/models nvcr.io/nvidia/tritonserver:23.03-py3 tritonserver --model-repository=/models/triton-model-repo --model-control-mode=explicit --load-model sid-minibert-onnx --load-model abp-nvsmi-xgb --load-model phishing-bert-onnx
+ docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models/triton-model-repo --model-control-mode=explicit --load-model sid-minibert-onnx --load-model abp-nvsmi-xgb --load-model phishing-bert-onnx
```

##### Verify Model Deployments
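
One way to confirm all three models reached the `READY` state (a minimal sketch, assuming the default HTTP port mapping above; the index endpoint is part of Triton's model-repository extension):

```bash
# List every model in the repository with its load state; sid-minibert-onnx,
# abp-nvsmi-xgb and phishing-bert-onnx should all report "READY".
curl -s -X POST localhost:8000/v2/repository/index | python3 -m json.tool
```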