feat(ml): introduce support of onnxruntime-rocm for AMD GPU #11063
Conversation
Needs some polishing, but looks very promising!
machine-learning/Dockerfile
Outdated
# I ran into a compilation error when parallelizing the build
# I used 12 threads to build onnxruntime, but it needs more than 16GB of RAM, and that's the amount of RAM I have on my machine
# I lowered the number of threads to 8, and it worked
# Even with 12 threads, the compilation took more than 1.5 hours to fail
RUN ./build.sh --allow_running_as_root --config Release --build_wheel --update --build --parallel 9 --cmake_extra_defines\
ONNXRUNTIME_VERSION=1.18.1 --use_rocm --rocm_home=/opt/rocm
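As a side note, here is a hedged sketch (not part of the PR) of deriving the --parallel value from available RAM instead of hard-coding it, assuming roughly 2 GB of RAM per onnxruntime compile job:

# Not from the PR: pick build parallelism from available RAM,
# assuming (hypothetically) ~2 GB of RAM per compile job.
MEM_GB=$(free -g | awk '/^Mem:/ {print $2}')
JOBS=$(( MEM_GB / 2 )); [ "${JOBS}" -lt 1 ] && JOBS=1
./build.sh --allow_running_as_root --config Release --build_wheel --update --build \
    --parallel "${JOBS}" --cmake_extra_defines ONNXRUNTIME_VERSION=1.18.1 \
    --use_rocm --rocm_home=/opt/rocm

With 16 GB of RAM this yields --parallel 8, which matches the thread count that worked on the author's machine.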
Should we run this on Mich @mertalev?
Absolutely yes.
Force-pushed from d706380 to 8aca62b.
I did some tests on the latest version, with the following configuration:
"job": {
"backgroundTask": {
"concurrency": 5
},
"smartSearch": {
"concurrency": 5
},
"metadataExtraction": {
"concurrency": 5
},
"faceDetection": {
"concurrency": 8
},
"search": {
"concurrency": 5
},
"sidecar": {
"concurrency": 5
},
"library": {
"concurrency": 5
},
"migration": {
"concurrency": 5
},
"thumbnailGeneration": {
"concurrency": 3
},
"videoConversion": {
"concurrency": 1
},
"notifications": {
"concurrency": 5
}
}

This crashes, I think because of a memory allocation failure; see the logs below.
Error logs: 'Failed to allocate memory'
But with this configuration:
"job": {
"backgroundTask": {
"concurrency": 5
},
"smartSearch": {
- "concurrency": 5
+ "concurrency": 4
},
"metadataExtraction": {
"concurrency": 5
},
"faceDetection": {
- "concurrency": 8
+ "concurrency": 4
},
"search": {
"concurrency": 5
},
"sidecar": {
"concurrency": 5
},
"library": {
"concurrency": 5
},
"migration": {
"concurrency": 5
},
"thumbnailGeneration": {
"concurrency": 3
},
"videoConversion": {
"concurrency": 1
},
"notifications": {
"concurrency": 5
}
}

It works well. For info, my server runs on 12 cores and 16 GB of RAM, with an AMD RX 6400.
Force-pushed from 8aca62b to 18f5d4d.
Great job updating the documentation!
# Even with 12 threads, the compilation took more than 1.5 hours to fail
RUN ./build.sh --allow_running_as_root --config Release --build_wheel --update --build --parallel 9 --cmake_extra_defines\
ONNXRUNTIME_VERSION=1.18.1 --use_rocm --rocm_home=/opt/rocm
RUN mv /code/onnxruntime/build/Linux/Release/dist/*.whl /opt/
What are the different .whl files? We should only need one.
There's only one, and its name is only known after compilation. I copied this from this Dockerfile, but if you prefer, I can replace it with the full name.
It'd be nice to make it clear that there's only one wheel involved here.
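For illustration, a minimal sketch of making the single wheel explicit, using a hypothetical exact filename (the real name is only known after compilation and depends on the ONNX Runtime version and the Python version/ABI used in the build):

# Hypothetical filename; the real one is only known after compilation
RUN mv /code/onnxruntime/build/Linux/Release/dist/onnxruntime_rocm-1.18.1-cp311-cp311-linux_x86_64.whl /opt/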
machine-learning/Dockerfile
Outdated
if [ "${DEVICE}" != "rocm" ]; then \ | ||
extra=libmimalloc2.0; \ | ||
fi && \ |
I assume you did this to avoid duplicating the apt command. You can change the apt command to be multi-line and make this if condition provide libmimalloc2.0 or nothing.
I'm sorry, but I'm not sure I understand what you mean. Could you provide a suggestion?
[tool.poetry.group.rocm]
optional = true

[tool.poetry.group.rocm.dependencies]
Just thinking about whether we can put anything here.
We could list onnxruntime-rocm with a file path, but that's basically just for our build, not usable by others. It's very easy to do, though, and at least points to the relevant package.
Ideally, we could build onnxruntime in the base-images repo, have the wheel be a GH artifact, and reference its URL here. This would be nice, but it's not something you need to work on (unless you want to).
I don't have any Poetry knowledge at all, but from what I understood, we'd need to specify the dependency here (for instance onnxruntime) as a local file, and if we do so, we must provide the path to it.
But the wheel is compiled during the build process, so the file can't exist on the host.
And if we update the poetry.lock, the file must exist. I did test some things, but the only way I found was to declare an empty group and perform the installation during the build.
This is a problem for someone who wants to develop on their host without Docker, but I think this is tied to the compilation of onnxruntime, not the way Poetry is configured.
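For reference, a sketch of what the local-file approach would look like in pyproject.toml; the wheel path is hypothetical, and this is exactly what fails here, because Poetry needs the file to exist when the lock file is updated:

[tool.poetry.group.rocm]
optional = true

[tool.poetry.group.rocm.dependencies]
# Hypothetical path; Poetry requires this file to exist when locking,
# so a wheel that is only built inside the Docker image can't be referenced this way.
onnxruntime-rocm = { path = "dist/onnxruntime_rocm-1.18.1-cp311-cp311-linux_x86_64.whl" }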
Hmm that's a fair point. We can leave this as-is for this PR and work on providing it through a link later.
machine-learning/rocm-PR19567.patch
Outdated
@@ -0,0 +1,176 @@
From a598a88db258f82a6e4bca75810921bd6bcee7e0 Mon Sep 17 00:00:00 2001
We need to add a license for this code since it isn't ours. It can go in its own folder with a LICENSE file inside (ONNX Runtime's license). This should also end up in the final image when the wheel is copied (unless the wheel already has the license embedded in it, not sure if it does).
I don't have a clue how this is done; can you help me here?
I mean that you can make a patches folder that includes this patch and ONNX Runtime's license (https://github.com/microsoft/onnxruntime/blob/7ec51f0a13d5d8cbd796de33276bf04210ce6176/LICENSE). For the wheel part, first use pip-licenses to see if the installed onnxruntime has a license in it. If it does, you don't need to do anything. If not, you would copy this LICENSE file in the Dockerfile so it's in the final image.
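For the Dockerfile side, a minimal sketch, assuming the patch and license live in a patches folder (both paths are hypothetical):

# Hypothetical paths: ship ONNX Runtime's LICENSE alongside the built wheel so it
# ends up in the final image even if the wheel doesn't embed it.
COPY patches/onnxruntime/LICENSE /licenses/onnxruntime-LICENSE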
I won't have the time to work on this before September, so if anyone wants to complete the PR or open a new one, feel free.
Force-pushed from 18f5d4d to 840b1a5.
machine-learning/Dockerfile
Outdated
if [ "${DEVICE}" != "rocm" ]; then \ | ||
extra=libmimalloc2.0; \ | ||
fi && \ | ||
apt-get install -y --no-install-recommends tini "${extra}" && \ |
if [ "${DEVICE}" != "rocm" ]; then \ | |
extra=libmimalloc2.0; \ | |
fi && \ | |
apt-get install -y --no-install-recommends tini "${extra}" && \ | |
apt-get install -y --no-install-recommends tini $(if [ "${DEVICE}" != "rocm" ]; then echo "libmimalloc2.0"; fi) && \ |
It may be necessary to do something similar to what Frigate did (before they removed this feature due to yolov8 licensing issues, nothing to do with AMD, just that the code was specific to running yolov8 ONNX models) by splitting up the containers per GPU chipset.
I've converted this to a draft as it seems very much still a WIP. @Zelnes, can you update if this is still something you're trying to pursue getting merged into Immich?
Definitely want to see this merged. I think the remaining work can be broken down into:
Thanks, it makes sense indeed.
I don't have the time, unfortunately, and I don't have the hardware to pursue the work and testing. Sorry if I didn't keep you updated before; I hope this can be merged one day.
@mertalev, given the above comment, is this something you would want to pursue yourself? If not, I'm not sure it's worth keeping this open.
Yes, except for (3).
(If this means testing, I can help, it's just that I've been very busy recently. I think I am at the point where I'm getting more free time though.)
Alright cool, will leave it with you then! 😄
I got it running and working correctly (note about something funky later) on my PC.
Funky thing: https://discord.com/channels/979116623879368755/1291425089539018907/1307529638985203732
Edit: I theorize this is just funky AMD (because unstable drivers were normal with their 5000 series) and/or Linux behavior.
Force-pushed from 840b1a5 to 46c505a.
📖 Documentation deployed to pr-11063.preview.immich.app
I'd like to propose this feature, which introduces support for machine learning on AMD GPUs.
It relies on an open PR that disables some caching features in order to run in parallel. (IMHO, parallelizing without the cache is still faster than caching in single-threaded mode.)
Important note
I just tried to make something work for me; I'm not claiming to propose something that works completely for everyone, everywhere.
I'm proposing it here so that advanced users and developers can provide help, add some tests, and make this available to others.
I hope I'll have some feedback 👍
Notes
Docker size
Second note: the downside of this new AMD-capable Docker image is the 28 GB size of the final image. I hope someone can help reduce this size.
Links
Please see this discussion, where I exchanged with @mertalev about this and posted more explanations, which led me here.