[Bug]: gfx906 ROCM won't work with torch: 2.0.1+rocm5.4.2 but works with other AIs #10873
Comments
What do you mean by "other AIs"? Other UIs for Stable Diffusion, or something different, like oobabooga's text-generation-webui?
Anyway, I had issues too on my RX 5700 XT with the webui and PyTorch 2. As a workaround I kept the old version back for AMD cards, but then #10465 removed it. I made a PR with another workaround, which I hope makes everyone happy. Meanwhile, if you tell me which AIs you were talking about, I can investigate further and try to find a proper solution.
My Radeon VII (gfx906) works even with the dev version of InvokeAI, tested with the latest dev version of PyTorch. I also noticed this disappears after upgrading PyTorch for the webui: it won't print this anymore when you generate your first image. My guess is that it isn't recognizing the video card and fails to detect it.
I also tried another AI art generator that hasn't received updates for a long time; it works directly with the dev version of PyTorch. But the webui won't work with it, not even the stable version launched via webui.sh.
OK, but did you just get the UI running, or did you actually get it to generate an image?
Yes, I can generate with it, even on the latest dev PyTorch. And your GPU is not on the ROCm support list, but mine is.
Also, I still get the same thing on 1.4.0 dev 59419bd.
gfx906 is not the only one affected: gfx1031 (RDNA2) also suffers from this exact issue. Running on Fedora 38 with an Intel KBL-R system. EDIT: possibly relevant and related is #10296 |
AMD mentions PyTorch, Hugging Face, etc. on their live streams, but in practice I can't see it working. I posted on the official ROCm and PyTorch channels, but no one replied. This is the last time I make the mistake of buying AMD; it won't happen on my next upgrade unless AMD fixes this mess. Instead of fixing it, they are killing ROCm support for my card in the next releases, so yeah, NVIDIA looks very attractive to me now. And I have been buying AMD since 2005.
How can I get the Radeon VII recognized as CUDA? `invokeai --web`: have you come across this approach? To make sure your GPU is being detected, check the reported name. Mine says `gfx1031`; technically the 6700XT isn't usable with ROCm for some reason, but actually it is, if you run `export HSA_OVERRIDE_GFX_VERSION=10.3.0`. Lastly, you want to add yourself to the `render` and `video` groups using `sudo usermod -a -G render,video $USER`, install Python with `sudo apt-get install python3`, and add `alias python=python3` to your `~/.bashrc`.
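The steps above can be sketched as a short shell snippet. Note this is a sketch of the comment's RDNA2 workaround: the override value `10.3.0` targets gfx1030/gfx1031 cards like the 6700XT and is not known to be the right value for gfx906.

```shell
# Spoof the GFX version so ROCm treats the card as officially supported.
# 10.3.0 matches RDNA2 (gfx1030/gfx1031); gfx906 owners would need a
# different value, if an override works for that card at all.
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Grant the current user access to the GPU device nodes
# (takes effect after logging out and back in):
# sudo usermod -a -G render,video "$USER"
```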
I would like to follow up on this with pytorch/pytorch#103973. TL;DR: you need PCIe atomics support to get ROCm to work, even though it was supposedly no longer required for post-Vega hardware. eGPU setups (even with integrated controllers) do not seem to expose the feature, so you basically need a full-fledged desktop with a PCIe x16 slot connected directly to the CPU. It is still odd that it all used to work with PyTorch + ROCm 5.2, but AMD's documentation about atomics support has been pretty straightforward about it.
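One way to check whether a GPU's PCIe link advertises atomic operations from Linux is to look for the `AtomicOpsCap` field in its extended capabilities. A hedged sketch (a full `lspci -vvv` dump usually needs root, and the exact line layout varies between lspci versions):

```shell
# has_pcie_atomics reads an `lspci -vvv` dump on stdin and reports whether
# the device advertises 32- and 64-bit PCIe atomic operations.
has_pcie_atomics() {
    grep -q "AtomicOpsCap: 32bit+ 64bit+"
}

# Real usage (device address is a placeholder; find yours with `lspci | grep -i vga`):
# sudo lspci -vvv -s 03:00.0 | has_pcie_atomics && echo "atomics advertised"

# Demonstration against a sample capability line:
sample="AtomicOpsCap: 32bit+ 64bit+ 128bitCAS+"
echo "$sample" | has_pcie_atomics && echo "atomics advertised"
```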
As mentioned in pytorch/pytorch#106728, PyTorch 2 works just fine if compiled against ROCm 5.2, so I guess the problem here isn't PyTorch 1 vs. 2; it's ROCm 5.3 and newer breaking the support.
The PCIe atomics suggestion is a good one, but I don't think it applies, at least for me: my machine should be able to handle them. I also tried compiling ROCm using the new ROCm 5.7 flag described in the post you mentioned, but it didn't seem to make any difference, while PyTorch 2 compiled against ROCm 5.2 is indeed working. I opened a new issue in ROCm's repo: ROCm/ROCm#2527
Well that's just great, PyTorch deleted their rocm5.2 repo. Edit: oops, my bad, it's a Python 3.10-specific repo.
Hmm, works fine with Python 3.11 and upstream PyTorch+rocm5.6 and TorchVision+rocm5.6, on gfx1031, if I specify the HSA GFX version override environment variable. Does not work with Arch's builds of pytorch or the AUR torchvision.
Not sure if there's a compatible override for 906 / 9.0.6. Maybe ask the ROCm repository? |
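A minimal way to confirm which PyTorch build is installed and whether the device is actually picked up (a sketch, assuming a ROCm wheel of PyTorch; ROCm devices are exposed through the `torch.cuda` namespace, and `torch.version.hip` is set only on ROCm builds):

```python
import torch

# On a ROCm wheel the version string looks like "2.x.y+rocm5.6" and
# torch.version.hip is a string; on CPU/CUDA builds it is None.
print(torch.__version__)
print("HIP runtime:", torch.version.hip)

if torch.cuda.is_available():
    # ROCm devices appear under the CUDA device API.
    print("device:", torch.cuda.get_device_name(0))
else:
    print("no ROCm/HIP device detected (check HSA_OVERRIDE_GFX_VERSION)")
```

Running this before launching the webui separates "PyTorch can't see the card" from "the webui is misconfigured".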
OK, here, after months of trying, is a fix for the missing PCIe atomics problem:
I am on a Radeon VII, and after five months working with the ROCm team we finally fixed it, but only after tons of problems and unsupported things on my Radeon VII. This is my last AMD card unless AMD does something; making us suffer this badly makes me regret buying it, and I won't buy AMD again. I have been using AMD since 2005 or so, but the Radeon VII made me give up: so many problems, support cut early, and still no Windows support for PyTorch, etc. I hope this repo helps suffering AMD users.
Is there an existing issue for this?
What happened?
Trying to make the webui work with PyTorch 2.0.1 + ROCm 5.4.2, but it won't work.
Steps to reproduce the problem
What should have happened?
It should generate images normally.
Commit where the problem happens
b957dcf
What Python version are you running on ?
Python 3.10.x
What platforms do you use to access the UI ?
Linux
What device are you running WebUI on?
AMD GPUs (RX 6000 above), AMD GPUs (RX 5000 below)
What browsers do you use to access the UI ?
Mozilla Firefox
Command Line Arguments
Running webui.sh directly gives me the terminal errors. But if I run with --no-half --disable-nan-check, it renders black images.
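For reference, a hedged sketch of how such flags are usually passed to webui.sh, via the `COMMANDLINE_ARGS` environment variable the launcher reads (`--precision full` is the companion flag commonly suggested alongside `--no-half` for black-image issues, and is an addition not mentioned in the report above):

```shell
# Commonly suggested AMD/ROCm workaround flags; note --disable-nan-check
# only hides the NaN errors, it does not fix the underlying black output.
export COMMANDLINE_ARGS="--precision full --no-half --disable-nan-check"
# ./webui.sh   # then launch as usual
```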
List of extensions
No extra extensions; installed directly from GitHub.
Console logs
Additional information
I have an AMD Radeon™ VII (GFX9, gfx906, Vega 20) and installed ROCm 5.5.
I only have this problem with the webui; other AIs work perfectly with it.
Even another AI on the same system I use for the webui works directly with this:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.4.2
I also read this, which is why I'm writing:
#10465
Please fix it. I really want to use the latest PyTorch and ROCm versions, but I'm stuck with this:
pip install torch==1.13.0+rocm5.2 torchvision==0.14.0+rocm5.2 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/rocm5.2
I'm begging for help at this point. Please help me. I've given days to making this work, but every time I try, I fail.