
[bug]: invokeai-rocm container doesn't support gpus #7006

Closed
nmcbride opened this issue Oct 2, 2024 · 4 comments · Fixed by #7147


nmcbride commented Oct 2, 2024

Is there an existing issue for this problem?

  • I have searched the existing issues

Operating system

Linux

GPU vendor

AMD (ROCm)

GPU model

RX 7900 XTX, RX 7700S

GPU VRAM

24GB, 8GB

Version number

invokeai-rocm

Browser

firefox

Python dependencies

No response

What happened

I am trying to use the container version like this:

docker run --device /dev/kfd --device /dev/dri --volume ./:/invokeai -p 9090:9090 --name invokeai ghcr.io/invoke-ai/invokeai:main-rocm

However, it doesn't seem to detect either of my AMD GPUs and falls back to the CPU. It also says bitsandbytes doesn't have GPU support.

The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[2024-10-02 05:54:19,891]::[InvokeAI]::INFO --> Patchmatch initialized
[2024-10-02 05:54:20,552]::[InvokeAI]::INFO --> Using torch device: CPU

ollama works fine with ROCm, so I am not sure why this doesn't, or how I can get it working.

What you expected to happen

I expect the container to start, utilize ROCm, and detect the GPUs.

How to reproduce the problem

No response

Additional context

No response

Discord username

No response

@nmcbride nmcbride added the bug Something isn't working label Oct 2, 2024
@nmcbride nmcbride changed the title [bug]: invoke-rocm container doesn't support gpus [bug]: invokeai-rocm container doesn't support gpus Oct 2, 2024
@ebr ebr self-assigned this Oct 2, 2024

SadmL commented Oct 9, 2024

Bare metal is affected too.
Using the installer with the ROCm option on an RX 6700 XT GPU.
Works fine on < 5.0.2, but starting with 5.1 I get this error:
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
Clearing the pip cache doesn't help.

I made a systemd service:

[Unit]
Description=InvokeAI

[Service]
ExecStart=/home/user/.local/invokeai/.venv/bin/invokeai-web
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
Environment="PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.95,max_split_size_mb:512"
Environment="INVOKEAI_ROOT=/home/user/.local/invokeai"
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=default.target
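
For anyone reusing the unit above, a minimal sketch of installing it as a user service (assuming it is saved as invokeai.service; WantedBy=default.target implies a user unit, and paths follow the unit file):

mkdir -p ~/.config/systemd/user
cp invokeai.service ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now invokeai.service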

@slartibartfast11

Same result on bare metal.

5.1.1 doesn't use/detect the ROCm device. An in-place install of 5.0.2 restores AMD support.

@apoordev

None of the [version]-rocm containers work for me, even before version 5.0.2. I'm using podman with the proper arguments (I know podman is not directly supported, but I have ollama running with this same configuration, and I also run the bare-metal installer in a rootless distrobox container, which has worked fine).
Here are my arguments:

Image=ghcr.io/invoke-ai/invokeai:main-rocm
ContainerName=invokeai
AutoUpdate=registry
Environment=INVOKEAI_ROOT=/var/lib/invokeai
PublishPort=9091:9090
Volume=/var/home/user/.local/share/invokeai:/var/lib/invokeai
SecurityLabelDisable=true
AddDevice=/dev/dri
AddDevice=/dev/kfd

Using the 5.1.1 bare-metal installer also defaults to the CPU. But the 5.0.2 bare-metal installer (again in a rootless distrobox container) detects my AMD GPU and works as intended.
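
For reference, the quadlet above corresponds roughly to this podman run invocation (a sketch reconstructed from the quadlet keys, not taken from the original report):

podman run -d --name invokeai \
  --env INVOKEAI_ROOT=/var/lib/invokeai \
  --publish 9091:9090 \
  --volume /var/home/user/.local/share/invokeai:/var/lib/invokeai \
  --security-opt label=disable \
  --device /dev/dri \
  --device /dev/kfd \
  ghcr.io/invoke-ai/invokeai:main-rocm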

@max-maag
Contributor

This is caused by an incorrect ROCm version, see #7146. I'm not familiar with Docker, but I assume changing the URL on line 41 of the Dockerfile to "https://download.pytorch.org/whl/rocm6.1" should fix the issue.
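
A rough sketch of the suggested change (the actual install command on line 41 of the Dockerfile may be structured differently; the pip lines below are a hypothetical shape, only the index URL swap is the point):

# before
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.6
# after
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.1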

max-maag added a commit to max-maag/InvokeAI that referenced this issue Oct 19, 2024
Each version of torch is only available for specific versions of CUDA and ROCm.
The Invoke installer and dockerfile try to install torch 2.4.1 with ROCm 5.6
support, which does not exist. As a result, the installation falls back to the
default CUDA version, so AMD GPUs aren't detected. This commit fixes that by
bumping the ROCm version to 6.1, as suggested by the PyTorch documentation. [1]

The specified CUDA version of 12.4 is still correct according to [1], so it does
not need to be changed.

Closes invoke-ai#7006
Closes invoke-ai#7146

[1]: https://pytorch.org/get-started/previous-versions/#v241
max-maag added a commit to max-maag/InvokeAI that referenced this issue Oct 19, 2024
hipsterusername pushed a commit to max-maag/InvokeAI that referenced this issue Oct 20, 2024
psychedelicious pushed a commit to max-maag/InvokeAI that referenced this issue Oct 22, 2024
psychedelicious pushed a commit that referenced this issue Oct 22, 2024
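
For anyone patching a broken install by hand, a minimal sketch of installing the matching ROCm wheel and verifying GPU detection (assuming torch 2.4.1, which pairs with torchvision 0.19.1 per [1]; exact pins may differ from the actual installer):

pip install torch==2.4.1 torchvision==0.19.1 --index-url https://download.pytorch.org/whl/rocm6.1
python -c "import torch; print(torch.version.hip, torch.cuda.is_available())"

On a working ROCm build, torch.version.hip prints a HIP version string and torch.cuda.is_available() returns True.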