feat(ml): introduce support of onnxruntime-rocm for AMD GPU #11063

Draft: wants to merge 7 commits into base `main`.
38 changes: 20 additions & 18 deletions .github/workflows/docker.yml
@@ -48,21 +48,21 @@ jobs:
    runs-on: ubuntu-latest
    strategy:
      matrix:
-       suffix: ["", "-cuda", "-openvino", "-armnn"]
+       suffix: ['', '-cuda', '-rocm', '-openvino', '-armnn']
    steps:
      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Re-tag image
        run: |
          REGISTRY_NAME="ghcr.io"
          REPOSITORY=${{ github.repository_owner }}/immich-machine-learning
          TAG_OLD=main${{ matrix.suffix }}
          TAG_NEW=${{ github.event.number == 0 && github.ref_name || format('pr-{0}', github.event.number) }}${{ matrix.suffix }}
          docker buildx imagetools create -t $REGISTRY_NAME/$REPOSITORY:$TAG_NEW $REGISTRY_NAME/$REPOSITORY:$TAG_OLD

retag_server:
name: Re-Tag Server
@@ -71,7 +71,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
suffix: [""]
suffix: ['']
steps:
- name: Login to GitHub Container Registry
uses: docker/login-action@v3
@@ -87,12 +87,11 @@ jobs:
TAG_NEW=${{ github.event.number == 0 && github.ref_name || format('pr-{0}', github.event.number) }}${{ matrix.suffix }}
docker buildx imagetools create -t $REGISTRY_NAME/$REPOSITORY:$TAG_NEW $REGISTRY_NAME/$REPOSITORY:$TAG_OLD


build_and_push_ml:
name: Build and Push ML
needs: pre-job
if: ${{ needs.pre-job.outputs.should_run_ml == 'true' }}
-   runs-on: ubuntu-latest
+   runs-on: mich
env:
image: immich-machine-learning
context: machine-learning
@@ -109,6 +108,10 @@ jobs:
device: cuda
suffix: -cuda

+         - platforms: linux/amd64
+           device: rocm
+           suffix: -rocm

- platforms: linux/amd64
device: openvino
suffix: -openvino
@@ -192,7 +195,6 @@ jobs:
BUILD_SOURCE_REF=${{ github.ref_name }}
BUILD_SOURCE_COMMIT=${{ github.sha }}


build_and_push_server:
name: Build and Push Server
runs-on: ubuntu-latest
4 changes: 2 additions & 2 deletions docker/docker-compose.dev.yml
@@ -85,12 +85,12 @@ services:
image: immich-machine-learning-dev:latest
# extends:
# file: hwaccel.ml.yml
-   #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference
+   #   service: cpu # set to one of [armnn, cuda, rocm, openvino, openvino-wsl] for accelerated inference
build:
context: ../machine-learning
dockerfile: Dockerfile
args:
-     - DEVICE=cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference
+     - DEVICE=cpu # set to one of [armnn, cuda, rocm, openvino, openvino-wsl] for accelerated inference
ports:
- 3003:3003
volumes:
4 changes: 2 additions & 2 deletions docker/docker-compose.prod.yml
@@ -29,12 +29,12 @@ services:
image: immich-machine-learning:latest
# extends:
# file: hwaccel.ml.yml
-   #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference
+   #   service: cpu # set to one of [armnn, cuda, rocm, openvino, openvino-wsl] for accelerated inference
build:
context: ../machine-learning
dockerfile: Dockerfile
args:
-     - DEVICE=cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference
+     - DEVICE=cpu # set to one of [armnn, cuda, rocm, openvino, openvino-wsl] for accelerated inference
ports:
- 3003:3003
volumes:
4 changes: 2 additions & 2 deletions docker/docker-compose.yml
@@ -32,12 +32,12 @@ services:

immich-machine-learning:
container_name: immich_machine_learning
-   # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
+   # For hardware acceleration, add one of -[armnn, cuda, rocm, openvino] to the image tag.
# Example tag: ${IMMICH_VERSION:-release}-cuda
image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
# extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
# file: hwaccel.ml.yml
-   #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
+   #   service: cpu # set to one of [armnn, cuda, rocm, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
volumes:
- model-cache:/cache
env_file:
7 changes: 7 additions & 0 deletions docker/hwaccel.ml.yml
@@ -26,6 +26,13 @@ services:
capabilities:
- gpu

+ rocm:
+   group_add:
+     - video
+   devices:
+     - /dev/dri:/dev/dri
+     - /dev/kfd:/dev/kfd

openvino:
device_cgroup_rules:
- 'c 189:* rmw'
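For reference, the new `rocm` service maps to roughly these plain `docker run` flags (a sketch, assuming the published `-rocm` image tag from this PR):

```bash
docker run --device /dev/dri --device /dev/kfd --group-add video \
  ghcr.io/immich-app/immich-machine-learning:release-rocm
```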
9 changes: 7 additions & 2 deletions docs/docs/features/ml-hardware-acceleration.md
@@ -11,6 +11,7 @@ You do not need to redo any machine learning jobs after enabling hardware acceleration.

- ARM NN (Mali)
- CUDA (NVIDIA GPUs with [compute capability](https://developer.nvidia.com/cuda-gpus) 5.2 or higher)
+- ROCm (AMD GPUs)
- OpenVINO (Intel discrete GPUs such as Iris Xe and Arc)

## Limitations
@@ -41,6 +42,10 @@ You do not need to redo any machine learning jobs after enabling hardware acceleration.
- The installed driver must be >= 535 (it must support CUDA 12.2).
- On Linux (except for WSL2), you also need to have [NVIDIA Container Toolkit][nvct] installed.

+#### ROCm
+
+- The GPU must be supported by ROCm. If it isn't officially supported, you can attempt to use the `HSA_OVERRIDE_GFX_VERSION` environment variable: `HSA_OVERRIDE_GFX_VERSION=<a supported version, e.g. 10.3.0>`.
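As a sketch of how that override could be wired in (assuming the compose-based setup described below; `10.3.0` is just an example value, not a recommendation for your card):

```yaml
  immich-machine-learning:
    environment:
      - HSA_OVERRIDE_GFX_VERSION=10.3.0
```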

#### OpenVINO

- The server must have a discrete GPU, such as Iris Xe or Arc. Expect issues when attempting to use integrated graphics.
@@ -50,12 +55,12 @@ You do not need to redo any machine learning jobs after enabling hardware acceleration.

1. If you do not already have it, download the latest [`hwaccel.ml.yml`][hw-file] file and ensure it's in the same folder as the `docker-compose.yml`.
2. In the `docker-compose.yml` under `immich-machine-learning`, uncomment the `extends` section and change `cpu` to the appropriate backend.
-3. Still in `immich-machine-learning`, add one of -[armnn, cuda, openvino] to the `image` section's tag at the end of the line.
+3. Still in `immich-machine-learning`, add one of -[armnn, cuda, rocm, openvino] to the `image` section's tag at the end of the line.
4. Redeploy the `immich-machine-learning` container with these updated settings.
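Applied to ROCm, steps 2 and 3 would leave the service looking roughly like this (a sketch assembled from the compose files in this PR):

```yaml
  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-rocm
    extends:
      file: hwaccel.ml.yml
      service: rocm
    volumes:
      - model-cache:/cache
```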

### Confirming Device Usage

-You can confirm the device is being recognized and used by checking its utilization. There are many tools to display this, such as `nvtop` for NVIDIA or Intel and `intel_gpu_top` for Intel.
+You can confirm the device is being recognized and used by checking its utilization. There are many tools to display this, such as `nvtop` for NVIDIA or Intel, `intel_gpu_top` for Intel, and `radeontop` for AMD.

You can also check the logs of the `immich-machine-learning` container. When a Smart Search or Face Detection job begins, or when you search with text in Immich, you should either see a log for `Available ORT providers` containing the relevant provider (e.g. `CUDAExecutionProvider` in the case of CUDA), or a `Loaded ANN model` log entry without errors in the case of ARM NN.
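A quick way to check for that provider line (assuming the default container name used in the compose files):

```bash
docker logs immich_machine_learning 2>&1 | grep "Available ORT providers"
# With the -rocm image you would expect ROCMExecutionProvider to be listed.
```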

4 changes: 2 additions & 2 deletions docs/docs/guides/remote-machine-learning.md
@@ -23,12 +23,12 @@ name: immich_remote_ml
services:
immich-machine-learning:
container_name: immich_machine_learning
-   # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
+   # For hardware acceleration, add one of -[armnn, cuda, rocm, openvino] to the image tag.
# Example tag: ${IMMICH_VERSION:-release}-cuda
image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
# extends:
# file: hwaccel.ml.yml
-   #   service: # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
+   #   service: # set to one of [armnn, cuda, rocm, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
volumes:
- model-cache:/cache
restart: always
25 changes: 25 additions & 0 deletions machine-learning/0001-fix-avoid-race-condition-for-rocm-conv-algo-caching.patch
@@ -0,0 +1,25 @@
From c4e6f81b4901b2b8178b964377e8fde7118f0e2e Mon Sep 17 00:00:00 2001
From: mertalev <101130780+mertalev@users.noreply.github.com>
Date: Fri, 20 Dec 2024 00:59:21 -0500
Subject: [PATCH] fix: avoid race condition for rocm conv algo caching

---
onnxruntime/core/providers/rocm/nn/conv.cc | 2 ++
1 file changed, 2 insertions(+)

diff --git a/onnxruntime/core/providers/rocm/nn/conv.cc b/onnxruntime/core/providers/rocm/nn/conv.cc
index d7f47d07a8..134f8a3b43 100644
--- a/onnxruntime/core/providers/rocm/nn/conv.cc
+++ b/onnxruntime/core/providers/rocm/nn/conv.cc
@@ -122,6 +122,8 @@ Status Conv<T, NHWC>::UpdateState(OpKernelContext* context, bool bias_expected)
bool input_dims_changed = (s_.last_x_dims != x_dims);
bool w_dims_changed = (s_.last_w_dims != w_dims);
if (input_dims_changed || w_dims_changed) {
+ // lock is needed to avoid race condition during algo search
+ std::lock_guard<OrtMutex> lock(s_.mutex);
if (input_dims_changed)
s_.last_x_dims = gsl::make_span(x_dims);

--
2.43.0

47 changes: 42 additions & 5 deletions machine-learning/Dockerfile
@@ -15,6 +15,36 @@ RUN mkdir /opt/armnn && \
cd /opt/ann && \
sh build.sh

+# Warning: 26.3 GB of disk space is required to pull this image
+# https://github.com/microsoft/onnxruntime/blob/main/dockerfiles/Dockerfile.rocm
+FROM rocm/dev-ubuntu-24.04:6.2.4-complete AS builder-rocm
+
+WORKDIR /code
+
+RUN apt-get update && apt-get install -y --no-install-recommends wget git python3.12-venv
+# Install the same CMake version as the Dockerfile provided by onnxruntime
+RUN wget -nv https://github.com/Kitware/CMake/releases/download/v3.27.3/cmake-3.27.3-linux-x86_64.sh && \
+    chmod +x cmake-3.27.3-linux-x86_64.sh && \
+    mkdir -p /code/cmake-3.27.3-linux-x86_64 && \
+    ./cmake-3.27.3-linux-x86_64.sh --skip-license --prefix=/code/cmake-3.27.3-linux-x86_64 && \
+    rm cmake-3.27.3-linux-x86_64.sh
+
+ENV PATH /code/cmake-3.27.3-linux-x86_64/bin:${PATH}
+
+# Prepare the onnxruntime repository & build onnxruntime
+# Note: cannot upgrade from 1.19.2 as of writing until upstream updates the ROCm CI
+RUN git clone --single-branch --branch v1.19.2 --recursive "https://github.com/Microsoft/onnxruntime" onnxruntime
+WORKDIR /code/onnxruntime
+# Fix for multi-threading based on comments in https://github.com/microsoft/onnxruntime/pull/19567
+COPY ./0001-fix-avoid-race-condition-for-rocm-conv-algo-caching.patch /tmp/
+RUN git apply /tmp/0001-fix-avoid-race-condition-for-rocm-conv-algo-caching.patch
+
+RUN /bin/sh ./dockerfiles/scripts/install_common_deps.sh
+# Note: the `parallel` setting uses a substantial amount of RAM
+RUN ./build.sh --allow_running_as_root --config Release --build_wheel --update --build --parallel 13 --cmake_extra_defines \
+    ONNXRUNTIME_VERSION=1.19.2 --use_rocm --rocm_home=/opt/rocm
+RUN mv /code/onnxruntime/build/Linux/Release/dist/*.whl /opt/
Review comment (Contributor): What are the different .whl files? We should only need one.

Reply (Author): There's only one, and its name is only known after compilation. I copied this from this Dockerfile, but if you prefer, I can replace it with the full name.

Reply (Contributor): It'd be nice for clarity that there's only one wheel involved here.
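For illustration, a pinned version of that `mv` line might look like the following; the full wheel name here is hypothetical, since the Python and platform tags are only known once the build has run:

```dockerfile
# Hypothetical explicit wheel name - verify against the actual build output
RUN mv /code/onnxruntime/build/Linux/Release/dist/onnxruntime_rocm-1.19.2-cp312-cp312-linux_x86_64.whl /opt/
```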


FROM builder-${DEVICE} AS builder

ARG DEVICE
@@ -32,6 +62,9 @@ RUN poetry config installer.max-workers 10 && \
RUN python3 -m venv /opt/venv

COPY poetry.lock pyproject.toml ./
RUN if [ "$DEVICE" = "rocm" ]; then \
poetry add /opt/onnxruntime_rocm-*.whl; \
fi
RUN poetry install --sync --no-interaction --no-ansi --no-root --with ${DEVICE} --without dev

FROM python:3.11-slim-bookworm@sha256:370c586a6ffc8c619e6d652f81c094b34b14b8f2fb9251f092de23f16e299b78 AS prod-cpu
@@ -40,10 +73,10 @@ FROM prod-cpu AS prod-openvino

RUN apt-get update && \
apt-get install --no-install-recommends -yqq ocl-icd-libopencl1 wget && \
-    wget https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.17384.11/intel-igc-core_1.0.17384.11_amd64.deb && \
-    wget https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.17384.11/intel-igc-opencl_1.0.17384.11_amd64.deb && \
-    wget https://github.com/intel/compute-runtime/releases/download/24.31.30508.7/intel-opencl-icd_24.31.30508.7_amd64.deb && \
-    wget https://github.com/intel/compute-runtime/releases/download/24.31.30508.7/libigdgmm12_22.4.1_amd64.deb && \
+    wget -nv https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.17384.11/intel-igc-core_1.0.17384.11_amd64.deb && \
+    wget -nv https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.17384.11/intel-igc-opencl_1.0.17384.11_amd64.deb && \
+    wget -nv https://github.com/intel/compute-runtime/releases/download/24.31.30508.7/intel-opencl-icd_24.31.30508.7_amd64.deb && \
+    wget -nv https://github.com/intel/compute-runtime/releases/download/24.31.30508.7/libigdgmm12_22.4.1_amd64.deb && \
dpkg -i *.deb && \
rm *.deb && \
apt-get remove wget -yqq && \
@@ -80,11 +113,15 @@ COPY --from=builder-armnn \
/opt/ann/build.sh \
/opt/armnn/

+FROM rocm/dev-ubuntu-24.04:6.2.4-complete AS prod-rocm


FROM prod-${DEVICE} AS prod

ARG DEVICE

RUN apt-get update && \
-    apt-get install -y --no-install-recommends tini $(if ! [ "$DEVICE" = "openvino" ]; then echo "libmimalloc2.0"; fi) && \
+    apt-get install -y --no-install-recommends tini $(if ! [ "$DEVICE" = "openvino" ] && ! [ "$DEVICE" = "rocm" ]; then echo "libmimalloc2.0"; fi) && \
apt-get autoremove -yqq && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
4 changes: 2 additions & 2 deletions machine-learning/README.md
@@ -7,7 +7,7 @@

This project uses [Poetry](https://python-poetry.org/docs/#installation), so be sure to install it first.
Running `poetry install --no-root --with dev --with cpu` will install everything you need in an isolated virtual environment.
-CUDA and OpenVINO are supported as acceleration APIs. To use them, you can replace `--with cpu` with either of `--with cuda` or `--with openvino`. In the case of CUDA, a [compute capability](https://developer.nvidia.com/cuda-gpus) of 5.2 or higher is required.
+CUDA, ROCm and OpenVINO are supported as acceleration APIs. To use them, you can replace `--with cpu` with one of `--with cuda`, `--with rocm` or `--with openvino`. In the case of CUDA, a [compute capability](https://developer.nvidia.com/cuda-gpus) of 5.2 or higher is required.

To add or remove dependencies, you can use the commands `poetry add $PACKAGE_NAME` and `poetry remove $PACKAGE_NAME`, respectively.
Be sure to commit the `poetry.lock` and `pyproject.toml` files with `poetry lock --no-update` to reflect any changes in dependencies.
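For example, a ROCm development environment would be installed like this (a sketch following the instructions above):

```bash
poetry install --no-root --with dev --with rocm
```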
@@ -37,4 +37,4 @@ This project utilizes facial recognition models from the [InsightFace](https://github.com/deepinsight/insightface) project.
## License and Use Restrictions
We have received permission to use the InsightFace facial recognition models in our project, as granted via email by Jia Guo (guojia@insightface.ai) on 18th March 2023. However, it's important to note that this permission does not extend to the redistribution or commercial use of their models by third parties. Users and developers interested in using these models should review the licensing terms provided in the InsightFace GitHub repository.

-For more information on the capabilities of the InsightFace models and to ensure compliance with their license, please refer to their [official repository](https://github.com/deepinsight/insightface). Adhering to the specified licensing terms is crucial for the respectful and lawful use of their work.
\ No newline at end of file
+For more information on the capabilities of the InsightFace models and to ensure compliance with their license, please refer to their [official repository](https://github.com/deepinsight/insightface). Adhering to the specified licensing terms is crucial for the respectful and lawful use of their work.
7 changes: 6 additions & 1 deletion machine-learning/app/models/constants.py
@@ -63,7 +63,12 @@
}


SUPPORTED_PROVIDERS = ["CUDAExecutionProvider", "OpenVINOExecutionProvider", "CPUExecutionProvider"]
SUPPORTED_PROVIDERS = [
"CUDAExecutionProvider",
"ROCMExecutionProvider",
"OpenVINOExecutionProvider",
"CPUExecutionProvider",
]


def get_model_source(model_name: str) -> ModelSource | None:
2 changes: 1 addition & 1 deletion machine-learning/app/sessions/ort.py
@@ -88,7 +88,7 @@ def _provider_options_default(self) -> list[dict[str, Any]]:
match provider:
case "CPUExecutionProvider":
options = {"arena_extend_strategy": "kSameAsRequested"}
case "CUDAExecutionProvider":
case "CUDAExecutionProvider" | "ROCMExecutionProvider":
options = {"arena_extend_strategy": "kSameAsRequested", "device_id": settings.device_id}
case "OpenVINOExecutionProvider":
options = {
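To illustrate what this change amounts to at the onnxruntime level, here is a minimal sketch (not Immich's actual session code; the model path and device id are placeholders) of a session created with the ROCm provider and the options shown above:

```python
import onnxruntime as ort

# Ask for ROCm first with CPU as a fallback; each dict in provider_options
# pairs with the provider at the same index.
session = ort.InferenceSession(
    "model.onnx",
    providers=["ROCMExecutionProvider", "CPUExecutionProvider"],
    provider_options=[
        {"arena_extend_strategy": "kSameAsRequested", "device_id": 0},
        {"arena_extend_strategy": "kSameAsRequested"},
    ],
)
print(session.get_providers())  # ROCMExecutionProvider should be listed first if ROCm is usable
```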