
[bug]: invokeai-rocm container doesn't support gpus #7006

Closed
nmcbride opened this issue Oct 2, 2024 · 4 comments · Fixed by #7147


nmcbride commented Oct 2, 2024

Is there an existing issue for this problem?

  • I have searched the existing issues

Operating system

Linux

GPU vendor

AMD (ROCm)

GPU model

RX 7900 XTX, RX 7700S

GPU VRAM

24GB, 8GB

Version number

invokeai-rocm

Browser

firefox

Python dependencies

No response

What happened

I am trying to use the container version like this:

docker run --device /dev/kfd --device /dev/dri --volume ./:/invokeai -p 9090:9090 --name invokeai ghcr.io/invoke-ai/invokeai:main-rocm

However, it doesn't seem to detect either of my AMD GPUs and falls back to the CPU. It also says bitsandbytes doesn't have GPU support.

The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[2024-10-02 05:54:19,891]::[InvokeAI]::INFO --> Patchmatch initialized
[2024-10-02 05:54:20,552]::[InvokeAI]::INFO --> Using torch device: CPU

ollama works fine with ROCm, so I am not sure why this doesn't, or how I can get it working.

What you expected to happen

I expect the container to start, utilize ROCm, and detect the GPUs.

How to reproduce the problem

No response

Additional context

No response

Discord username

No response

@nmcbride nmcbride added the bug Something isn't working label Oct 2, 2024
@nmcbride nmcbride changed the title [bug]: invoke-rocm container doesn't support gpus [bug]: invokeai-rocm container doesn't support gpus Oct 2, 2024
@ebr ebr self-assigned this Oct 2, 2024

SadmL commented Oct 9, 2024

Bare metal is affected too.
Using the installer with the ROCm option on an RX 6700 XT GPU.
Works fine on < 5.0.2, but starting with 5.1 I get this error:
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
Clearing the pip cache doesn't help.

I made a systemd service:

[Unit]
Description=InvokeAI

[Service]
ExecStart=/home/user/.local/invokeai/.venv/bin/invokeai-web
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
Environment="PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.95,max_split_size_mb:512"
Environment="INVOKEAI_ROOT=/home/user/.local/invokeai"
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=default.target
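
For anyone reusing the unit above, a minimal sketch of installing it as a user service (assuming it is saved as invokeai.service; WantedBy=default.target implies a user unit, and paths follow the unit file):

mkdir -p ~/.config/systemd/user
cp invokeai.service ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now invokeai.service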

@slartibartfast11

Same result on bare metal.

5.1.1 doesn't use/detect the ROCm device. An in-place install of 5.0.2 restores AMD support.

@apoordev

None of the [version]-rocm containers work for me, even before version 5.0.2. I'm using podman with the proper arguments (I know podman is not directly supported, but I have ollama running with this same configuration, and I also run the bare-metal installer in a rootless distrobox container, which has worked fine).
Here are my arguments:

Image=ghcr.io/invoke-ai/invokeai:main-rocm
ContainerName=invokeai
AutoUpdate=registry
Environment=INVOKEAI_ROOT=/var/lib/invokeai
PublishPort=9091:9090
Volume=/var/home/user/.local/share/invokeai:/var/lib/invokeai
SecurityLabelDisable=true
AddDevice=/dev/dri
AddDevice=/dev/kfd

Using the 5.1.1 bare-metal installer also defaults to the CPU. But the 5.0.2 bare-metal installer (again in a rootless distrobox container) detects my AMD GPU and works as intended.
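
For reference, the quadlet above corresponds roughly to this podman run invocation (a sketch reconstructed from the quadlet keys, not taken from the original report):

podman run -d --name invokeai \
  --env INVOKEAI_ROOT=/var/lib/invokeai \
  --publish 9091:9090 \
  --volume /var/home/user/.local/share/invokeai:/var/lib/invokeai \
  --security-opt label=disable \
  --device /dev/dri \
  --device /dev/kfd \
  ghcr.io/invoke-ai/invokeai:main-rocm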

@max-maag
Contributor

This is caused by an incorrect ROCm version, see #7146. I'm not familiar with Docker, but I assume changing the URL on line 41 of the Dockerfile to "https://download.pytorch.org/whl/rocm6.1" should fix the issue.
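
A rough sketch of the suggested change (the actual install command on line 41 of the Dockerfile may be structured differently; the pip lines below are a hypothetical shape, only the index URL swap is the point):

# before
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.6
# after
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.1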

max-maag added a commit to max-maag/InvokeAI that referenced this issue Oct 19, 2024
Each version of torch is only available for specific versions of CUDA and ROCm.
The Invoke installer and dockerfile try to install torch 2.4.1 with ROCm 5.6
support, which does not exist. As a result, the installation falls back to the
default CUDA version, so AMD GPUs aren't detected. This commit fixes that by
bumping the ROCm version to 6.1, as suggested by the PyTorch documentation. [1]

The specified CUDA version of 12.4 is still correct according to [1], so it does
not need to be changed.

Closes invoke-ai#7006
Closes invoke-ai#7146

[1]: https://pytorch.org/get-started/previous-versions/#v241
max-maag added a commit to max-maag/InvokeAI that referenced this issue Oct 19, 2024
hipsterusername pushed a commit to max-maag/InvokeAI that referenced this issue Oct 20, 2024
psychedelicious pushed a commit to max-maag/InvokeAI that referenced this issue Oct 22, 2024
psychedelicious pushed a commit that referenced this issue Oct 22, 2024
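
For anyone patching a broken install by hand, a minimal sketch of installing the matching ROCm wheel and verifying GPU detection (assuming torch 2.4.1, which pairs with torchvision 0.19.1 per [1]; exact pins may differ from the actual installer):

pip install torch==2.4.1 torchvision==0.19.1 --index-url https://download.pytorch.org/whl/rocm6.1
python -c "import torch; print(torch.version.hip, torch.cuda.is_available())"

On a working ROCm build, torch.version.hip prints a HIP version string and torch.cuda.is_available() returns True.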