Expected Behavior
When using e.g. --cuda-device 1, I want ComfyUI to use the device with ID 1.
Actual Behavior
No matter what I set in --cuda-device, ComfyUI uses the device with ID 0.
Steps to Reproduce
Start ComfyUI with --cuda-device <any ID other than 0> on an AMD/HIP system.
Load any workflow.
Queue/start inference (or whatever else you do with Comfy on your GPU).
Observe in system/OS monitoring tools that the GPU with ID 0 is used, no matter what was set in step 1.
Debug Logs
The problem is not visible in the logs: no error is raised, because the log output only reflects that setting the environment variable CUDA_VISIBLE_DEVICES succeeded, which it does. However, I can reproduce the behavior in plain Python/torch; see the next field for a PoC and explanation.
Other
For reference, I have two AMD RX 7900 XT cards in my system. With --cuda-device 1, only the 2nd GPU should be exposed to torch/CUDA, but since the switch does not work (the environment variable is ignored by PyTorch+ROCm), torch uses the default CUDA device, cuda:0, i.e. my 1st GPU instead of the 2nd. Below is an example showing that the correct environment variable yields the expected result.
Setting CUDA_VISIBLE_DEVICES to 1, as implemented by --cuda-device 1 in ComfyUI/main.py (line 73 in 2d28b0b), has no effect; setting HIP_VISIBLE_DEVICES instead actually works. Sadly, the PyTorch and ROCm documentation is a bit misleading in this regard: one would assume the two env vars are interchangeable, but despite the assumption in the ROCm documentation (see links below), that is not the case for PyTorch.
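A PoC along these lines can be written so the variable is set in a fresh interpreter before torch initializes the GPU runtime. This is a sketch, not the exact script from the report (run_probe and TORCH_PROBE are names I made up), and it needs a ROCm build of PyTorch with two GPUs to show the difference:

```python
import os
import subprocess
import sys

# Probe run in a fresh interpreter so the env var is in place before
# torch initializes the GPU runtime.
TORCH_PROBE = "import torch; print(torch.cuda.device_count())"

def run_probe(env_var: str, value: str, probe: str = TORCH_PROBE) -> str:
    """Run `probe` in a child Python with env_var=value set; return stdout."""
    env = dict(os.environ, **{env_var: value})
    result = subprocess.run([sys.executable, "-c", probe],
                           env=env, capture_output=True, text=True)
    return result.stdout.strip()

if __name__ == "__main__":
    # Per the report, on PyTorch+ROCm with two RX 7900 XT cards the first
    # call still reports 2 devices (variable ignored), while the second
    # reports 1 (variable respected).
    print("CUDA_VISIBLE_DEVICES:", run_probe("CUDA_VISIBLE_DEVICES", "1"))
    print("HIP_VISIBLE_DEVICES:", run_probe("HIP_VISIBLE_DEVICES", "1"))
```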
I changed the code myself as suggested above (using HIP_VISIBLE_DEVICES). However, there seems to be an issue with my dual-GPU setup: I get a segfault after the model is loaded and inference is supposed to start. Maybe someone with a similar setup can test this (integrated graphics might work too, but I don't have any that is supported by ROCm).
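The change amounts to something like the following sketch (set_gpu_device is a hypothetical helper, not the actual patch; the assumption is that setting both variables is harmless, since CUDA builds of PyTorch never read HIP_VISIBLE_DEVICES):

```python
import os

def set_gpu_device(device_id: int) -> None:
    """Hypothetical replacement for the single assignment in ComfyUI's
    main.py: set both variables so ROCm builds of PyTorch, which ignore
    CUDA_VISIBLE_DEVICES, still honor the device selection."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_id)
    # HIP_VISIBLE_DEVICES is what PyTorch+ROCm actually respects;
    # CUDA builds simply never read it.
    os.environ["HIP_VISIBLE_DEVICES"] = str(device_id)

# Must run before torch initializes the GPU runtime, as main.py already does.
set_gpu_device(1)
```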
https://pytorch.org/docs/stable/notes/hip.html
https://rocm.docs.amd.com/en/latest/conceptual/gpu-isolation.html#cuda-visible-devices
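Since the two sets of docs disagree, code that needs to pick the right variable can key off the installed build rather than the documentation. A sketch, assuming only what the PyTorch HIP notes state, namely that torch.version.hip is None on non-ROCm builds (the version object is passed in so this is testable without a GPU build):

```python
def visibility_var(torch_version) -> str:
    """Return the env var the given torch build respects for GPU masking.

    `torch_version` is expected to look like torch.version, which exposes
    `hip` (None on CUDA builds, a version string on ROCm builds).
    """
    if getattr(torch_version, "hip", None):
        return "HIP_VISIBLE_DEVICES"
    return "CUDA_VISIBLE_DEVICES"

# Usage with a real install would be: visibility_var(torch.version)
```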