
[bug]: Installer installs torch CUDA even when ROCm is selected #7146

Closed
1 task done
max-maag opened this issue Oct 18, 2024 · 0 comments · Fixed by #7147
Labels
bug Something isn't working

Comments

@max-maag
Contributor

max-maag commented Oct 18, 2024

Is there an existing issue for this problem?

  • I have searched the existing issues

Operating system

Linux

GPU vendor

AMD (ROCm)

GPU model

RX 6650 XT

GPU VRAM

8GB

Version number

5.1.1, 5.2.0

Browser

n/a

Python dependencies

No response

What happened

When launching an Invoke server that was installed with ROCm support, the CPU is selected as the torch device.

What you expected to happen

The Invoke server should use the dedicated GPU.

How to reproduce the problem

Additional context

I tried manually installing torch with ROCm support in a fresh venv. Invoke's installer script tries to install torch 2.4.1 with ROCm 5.6, so those are the versions I tried to install:

>pip install torch==2.4.1 torchvision==0.19.1 --index-url https://download.pytorch.org/whl/rocm5.6
Looking in indexes: https://download.pytorch.org/whl/rocm5.6
ERROR: Could not find a version that satisfies the requirement torch==2.4.1 (from versions: 2.2.0+rocm5.6, 2.2.1+rocm5.6, 2.2.2+rocm5.6)
ERROR: No matching distribution found for torch==2.4.1

I then tried the most recent ROCm version, 6.2, with the same result, except that only version 2.5.0+rocm6.2 is reported as available.

Finally, I tried ROCm 6.1, which worked.
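The availability constraints above can be sketched as a small compatibility table. This is a hypothetical illustration, not part of Invoke's installer (the `TORCH_ROCM_COMPAT` name and the `rocm_index_url` helper are mine); the version pairs come from the pip experiments described above.

```python
# Hypothetical sketch: which ROCm wheel index each torch release is actually
# published on, based on the pip experiments described above.
TORCH_ROCM_COMPAT = {
    "2.2.2": "rocm5.6",  # newest torch release on the rocm5.6 index
    "2.4.1": "rocm6.1",  # the version Invoke pins; NOT available on rocm5.6
    "2.5.0": "rocm6.2",  # the only release on the rocm6.2 index
}

def rocm_index_url(torch_version: str) -> str:
    """Return the PyTorch wheel index URL for a pinned torch version,
    failing loudly instead of silently falling back to a CUDA build."""
    try:
        rocm = TORCH_ROCM_COMPAT[torch_version]
    except KeyError:
        raise ValueError(
            f"No known ROCm wheel for torch=={torch_version}; "
            "check https://pytorch.org/get-started/previous-versions/"
        ) from None
    return f"https://download.pytorch.org/whl/{rocm}"
```

For torch 2.4.1 this yields the rocm6.1 index URL that made the install work.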

Fixing the installer by changing the URL in installer.py:410 to "https://download.pytorch.org/whl/rocm6.1" results in the server using the dedicated GPU by default. Setting the CUDA_VERSION and HSA_OVERRIDE_GFX_VERSION environment variables is still necessary, though, as it was in the last Invoke version I used, 4.2.7post1: while the server starts and the log reports that the correct GPU is in use, attempting to generate an image fails with "RuntimeError: HIP error: invalid device function" if the variables are not set correctly.
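For anyone hitting that HIP error, a sketch of the environment this would need. Treat the value as an assumption to verify for your own card: 10.3.0 is the override commonly used for RDNA2 consumer GPUs like the RX 6650 XT (gfx1032), and the correct CUDA_VERSION value depends on your setup.

```shell
# Commonly used override for RDNA2 consumer cards (RX 6650 XT = gfx1032):
# report the gfx1030 ISA, which the official ROCm wheels ship kernels for.
# Verify this value for your own card.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
# CUDA_VERSION must also be set as described above; its value depends on your setup.
```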

As far as I can tell, bumping ROCm from 5.6 to 6.1 works without issues. The pytorch documentation for installing 2.4.1 also uses ROCm 6.1.

It might also be worthwhile to think about why this issue happened and how to prevent it from recurring. Unfortunately, I don't have any good answers for that. Testing release candidates on all supported platforms would be ideal but also expensive.
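One cheap guard in that direction: after installation, check that the installed torch build actually matches the backend the user selected, instead of silently running on a CUDA or CPU wheel. A hypothetical sketch (the `check_backend` helper is not part of Invoke; it inspects the local version segment of `torch.__version__`, e.g. "2.4.1+rocm6.1"):

```python
# Hypothetical post-install sanity check, not part of Invoke's installer.
def check_backend(torch_version_string: str, selected: str) -> None:
    """Fail loudly if the installed torch build does not match the
    selected backend. torch_version_string is torch.__version__,
    e.g. '2.4.1+rocm6.1'; selected is 'rocm' or 'cuda'."""
    # The local version segment after '+' names the build: 'rocm6.1', 'cu124', ...
    local = torch_version_string.partition("+")[2]
    if selected == "rocm" and not local.startswith("rocm"):
        raise RuntimeError(
            f"Expected a ROCm build of torch but got '{torch_version_string}'; "
            "the installer probably fell back to a CUDA/CPU wheel."
        )
    if selected == "cuda" and not local.startswith("cu"):
        raise RuntimeError(
            f"Expected a CUDA build of torch but got '{torch_version_string}'."
        )
```

With the broken rocm5.6 index, pip would have installed a CUDA build and this check would have failed at install time rather than at first generation.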

Discord username

No response

@max-maag max-maag added the bug Something isn't working label Oct 18, 2024
max-maag added a commit to max-maag/InvokeAI that referenced this issue Oct 18, 2024
Each version of torch is only available for specific versions of CUDA and ROCm.
The Invoke installer tries to install torch 2.4.1 with ROCm 5.6 support, which
 does not exist. As a result, the installation falls back to the default CUDA
version so AMD GPUs aren't detected. This commit fixes that by bumping the
ROCm version to 6.1.

Closes invoke-ai#7146
max-maag added a commit to max-maag/InvokeAI that referenced this issue Oct 18, 2024
Each version of torch is only available for specific versions of CUDA and ROCm.
The Invoke installer tries to install torch 2.4.1 with ROCm 5.6 support, which
does not exist. As a result, the installation falls back to the default CUDA
version so AMD GPUs aren't detected. This commit fixes that by bumping the
ROCm version to 6.1, as suggested by the PyTorch documentation. [1]

The specified CUDA version of 12.4 is still correct according to [1], so it does
not need to be changed.

Closes invoke-ai#7146

[1]: https://pytorch.org/get-started/previous-versions/#v241
max-maag added a commit to max-maag/InvokeAI that referenced this issue Oct 19, 2024
Each version of torch is only available for specific versions of CUDA and ROCm.
The Invoke installer and dockerfile try to install torch 2.4.1 with ROCm 5.6
support, which does not exist. As a result, the installation falls back to the
default CUDA version so AMD GPUs aren't detected. This commit fixes that by
bumping the ROCm version to 6.1, as suggested by the PyTorch documentation. [1]

The specified CUDA version of 12.4 is still correct according to [1], so it does
not need to be changed.

Closes invoke-ai#7006
Closes invoke-ai#7146

[1]: https://pytorch.org/get-started/previous-versions/#v241
hipsterusername pushed a commit to max-maag/InvokeAI that referenced this issue Oct 20, 2024
psychedelicious pushed a commit to max-maag/InvokeAI that referenced this issue Oct 22, 2024