[bug]: Installer installs torch CUDA even when ROCm is selected #7146
Labels: bug (Something isn't working)
max-maag added a commit to max-maag/InvokeAI that referenced this issue (Oct 18, 2024):

Each version of torch is only available for specific versions of CUDA and ROCm. The Invoke installer tries to install torch 2.4.1 with ROCm 5.6 support, which does not exist. As a result, the installation falls back to the default CUDA version, so AMD GPUs aren't detected. This commit fixes that by bumping the ROCm version to 6.1. Closes invoke-ai#7146
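The pairing the commit describes can be sketched as follows. This is an illustrative helper, not Invoke's actual installer code; the index URLs match the PyTorch "previous versions" page for torch 2.4.1 (CUDA 12.4, ROCm 6.1), and the function name is hypothetical.

```python
# Illustrative sketch: each torch release is published against specific
# backend versions, each behind its own wheel index URL.
TORCH_INDEX_URLS = {
    "cuda": "https://download.pytorch.org/whl/cu124",    # CUDA 12.4
    "rocm": "https://download.pytorch.org/whl/rocm6.1",  # ROCm 6.1 (the fix; 5.6 has no 2.4.1 wheel)
    "cpu": "https://download.pytorch.org/whl/cpu",
}

def pip_install_command(device: str, torch_version: str = "2.4.1") -> str:
    """Build the pip command an installer would run (hypothetical helper)."""
    return (
        f"pip install torch=={torch_version} "
        f"--index-url {TORCH_INDEX_URLS[device]}"
    )
```

Pointing `--index-url` at an index that has no wheel for the requested version is what caused the silent fallback described in this issue.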
max-maag added a commit to max-maag/InvokeAI that referenced this issue (Oct 18, 2024)
max-maag added a commit to max-maag/InvokeAI that referenced this issue (Oct 19, 2024):

Each version of torch is only available for specific versions of CUDA and ROCm. The Invoke installer and dockerfile try to install torch 2.4.1 with ROCm 5.6 support, which does not exist. As a result, the installation falls back to the default CUDA version, so AMD GPUs aren't detected. This commit fixes that by bumping the ROCm version to 6.1, as suggested by the PyTorch documentation. [1] The specified CUDA version of 12.4 is still correct according to [1], so it does not need to be changed. Closes invoke-ai#7006 Closes invoke-ai#7146 [1]: https://pytorch.org/get-started/previous-versions/#v241
max-maag added a commit to max-maag/InvokeAI that referenced this issue (Oct 19, 2024)
hipsterusername pushed a commit to max-maag/InvokeAI that referenced this issue (Oct 20, 2024)
psychedelicious pushed a commit to max-maag/InvokeAI that referenced this issue (Oct 22, 2024)
Is there an existing issue for this problem?
Operating system: Linux
GPU vendor: AMD (ROCm)
GPU model: RX 6650 XT
GPU VRAM: 8GB
Version number: 5.1.1, 5.2.0
Browser: n/a
Python dependencies: No response
What happened
When launching an Invoke server that was installed with ROCm support, the CPU is selected as the torch device.
What you expected to happen
The Invoke server should use the dedicated GPU.
How to reproduce the problem
Run invoke.sh.
Additional context
I tried manually installing torch with ROCm support in a fresh venv. Invoke's installer script tries to install torch 2.4.1 with ROCm 5.6, so those are the versions I tried to install:
I then tried the most recent ROCm version, 6.2, with the same result except that it reports that only version 2.5.0+rocm6.2 is available.
Finally, I tried ROCm 6.1 which worked.
Fixing the installer by changing the URL in installer.py:410 to
"https://download.pytorch.org/whl/rocm6.1"
results in the server using the dedicated GPU by default. Setting the CUDA_VERSION and HSA_OVERRIDE_GFX_VERSION environment variables (which was also necessary in the last Invoke version I used, 4.2.7post1) is still required: while the server will start and the log will mention that it is using the correct GPU, attempting to generate an image will fail with "RuntimeError: HIP error: invalid device function" if the variables are not set correctly. As far as I can tell, bumping ROCm from 5.6 to 6.1 works without issues. The PyTorch documentation for installing 2.4.1 also uses ROCm 6.1.
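A quick way to tell whether the CUDA fallback happened is to inspect `torch.__version__` (e.g. via `python -c "import torch; print(torch.__version__)"`): ROCm wheels report a `+rocm…` suffix, CUDA wheels `+cu…`. The helper below is an illustrative sketch of that check, not part of Invoke; it only parses the version string, so it runs without torch installed.

```python
import re

def wheel_backend(version: str) -> str:
    """Classify a torch version string by its local-version suffix.

    Torch wheels encode the backend as a local version segment:
    "2.4.1+rocm6.1" (ROCm), "2.4.1+cu124" (CUDA), or no suffix (CPU).
    Hypothetical helper for diagnosing the fallback described above.
    """
    m = re.search(r"\+(rocm|cu)", version)
    if m is None:
        return "cpu"
    return "rocm" if m.group(1) == "rocm" else "cuda"
```

On an AMD machine, seeing `2.4.1+cu124` here means the installer fell back to the CUDA build, which is exactly the bug this issue reports.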
Finally, it might be worthwhile to think about why this issue happened and how to prevent it from happening again. Unfortunately I don't have a good answer for that. Testing release candidates on all supported platforms would be ideal, but also expensive.
Discord username: No response