Fix AMD GPUs not being detected #7147

Merged


max-maag
Contributor

@max-maag max-maag commented Oct 18, 2024

Summary

Each version of torch is only available for specific versions of CUDA and ROCm. The Invoke installer tries to install torch 2.4.1 with ROCm 5.6 support, which does not exist. As a result, the installation falls back to the default CUDA version, so AMD GPUs aren't detected. This commit fixes that by bumping the ROCm version to 6.1, as suggested by the PyTorch documentation. [1] Torch 2.4.1 does not appear to be available for ROCm 6.2.

The specified CUDA version of 12.4 is still correct according to [1], so it does not need to be changed.
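
As a rough illustration of the version pairing described above, here is a minimal sketch that builds the pip arguments for torch 2.4.1 per backend. The names `WHEEL_INDEX` and `pip_install_args` are hypothetical and are not taken from the Invoke installer; the index URLs follow the pattern shown in the PyTorch previous-versions page [1].

```python
# Hypothetical sketch: this is NOT the Invoke installer's actual code.
TORCH_VERSION = "2.4.1"

# Wheel indexes that carry torch 2.4.1, per the PyTorch previous-versions page.
# There is no rocm5.6 entry because no torch 2.4.1 wheel was built for it.
WHEEL_INDEX = {
    "cuda": "https://download.pytorch.org/whl/cu124",
    "rocm": "https://download.pytorch.org/whl/rocm6.1",
    "cpu": "https://download.pytorch.org/whl/cpu",
}


def pip_install_args(backend: str) -> list[str]:
    """Return the argument list for `pip` to install torch for one backend."""
    if backend not in WHEEL_INDEX:
        raise ValueError(f"unsupported backend: {backend!r}")
    return [
        "install",
        f"torch=={TORCH_VERSION}",
        "--index-url",
        WHEEL_INDEX[backend],
    ]


if __name__ == "__main__":
    print("pip", " ".join(pip_install_args("rocm")))
```

The sketch uses `--index-url`, which replaces the default package index, so a torch/ROCm combination that does not exist fails loudly. With `--extra-index-url`, pip can still resolve the package from PyPI, which is the kind of silent fallback to the default build described above.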

Related Issues / Discussions

Closes #7006
Closes #7146

QA Instructions

  • Install Invoke 5.1.1 or later with ROCm support using the installer on a system with an AMD GPU.
  • Start the server.
  • Generate any image.

Without this fix, the CPU is used to generate images. This can be seen in the log output. Image generation also takes forever.
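
Besides reading the server log, the installed torch build can be checked directly from the Invoke virtual environment. The following is a minimal sketch; `classify_torch_build` is a hypothetical helper, and it relies on the assumption that the wheel's version string carries a local build tag such as `+rocm6.1` or `+cu124`.

```python
def classify_torch_build(version: str) -> str:
    """Classify a torch version string by its local build tag (the part after '+')."""
    _, _, local = version.partition("+")
    if local.startswith("rocm"):
        return "rocm"
    if local.startswith("cu"):
        return "cuda"
    if local == "cpu":
        return "cpu"
    return "unknown"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print("build:", classify_torch_build(torch.__version__))
        # torch.version.hip is a version string on ROCm builds and None otherwise.
        print("hip:", torch.version.hip)
        # ROCm builds expose AMD GPUs through the torch.cuda API.
        print("gpu available:", torch.cuda.is_available())
```

On an AMD system, a build classified as `cuda` together with `gpu available: False` would be consistent with the fallback this PR fixes.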

I did not test the changes to the Dockerfile since I am not familiar with Docker.

Merge Plan

n/a

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)

Footnotes

  1. https://pytorch.org/get-started/previous-versions/#v241

@github-actions github-actions bot added the installer PRs that change the installer label Oct 18, 2024
@ebr
Member

ebr commented Oct 18, 2024

We don't have any way of testing this on a 6xxx architecture, but this seems like a worthwhile and correct change nonetheless. We do know that 7xxx (Navi32/33 / RDNA3) chips will need some more special handling, but that shouldn't block this update.

@max-maag - we also have this index url set in the Dockerfile - could you please update that in this PR?
Thanks for your contribution!

@max-maag
Contributor Author

max-maag commented Oct 19, 2024

we also have this index url set in the Dockerfile - could you please update that in this PR? Thanks for your contribution!

I saw that someone reported that in #7006, but because I'm not using Docker and am not familiar with it at all, I kept it out of this PR until now. I changed the URL in the Dockerfile and added the relevant issue to this PR's related issue list.

I didn't test the Dockerfile change though. I don't see any reason why it shouldn't work, but maybe someone should verify the fix just to be sure.

@max-maag max-maag force-pushed the fix/incompatible-torch-rocm-versions branch from e3a7e5f to d8b0730 Compare October 19, 2024 00:33
@ebr
Member

ebr commented Oct 20, 2024

Approved - LGTM from my perspective. Thanks again for the contribution.

Just FYI - I finally got it to generate on a recent AMD GPU (W7900). Here's a full write-up: https://gist.github.com/ebr/e4e4118b603bd95bfd2408ee30c27f0a. It's not pretty, but it works.

@hipsterusername hipsterusername force-pushed the fix/incompatible-torch-rocm-versions branch from d8b0730 to b41762c Compare October 20, 2024 13:34
@psychedelicious psychedelicious enabled auto-merge (rebase) October 22, 2024 22:58
Each version of torch is only available for specific versions of CUDA and ROCm.
The Invoke installer and dockerfile try to install torch 2.4.1 with ROCm 5.6
support, which does not exist. As a result, the installation falls back to the
default CUDA version so AMD GPUs aren't detected. This commit fixes that by
bumping the ROCm version to 6.1, as suggested by the PyTorch documentation. [1]

The specified CUDA version of 12.4 is still correct according to [1] so it does
not need to be changed.

Closes invoke-ai#7006
Closes invoke-ai#7146

[1]: https://pytorch.org/get-started/previous-versions/#v241
@psychedelicious psychedelicious force-pushed the fix/incompatible-torch-rocm-versions branch from b41762c to dd3f044 Compare October 22, 2024 22:58
@psychedelicious psychedelicious merged commit d85733f into invoke-ai:main Oct 22, 2024
14 checks passed
Labels
docker installer PRs that change the installer
3 participants