Commit
How to enable experimental memory efficient attention on ROCm RDNA3.
comfyanonymous committed Nov 29, 2024
1 parent 82c5308 commit 20a560e
Showing 1 changed file (README.md) with 6 additions and 0 deletions.
@@ -213,6 +213,12 @@

For 6700, 6600 and maybe other RDNA2 or older: ```HSA_OVERRIDE_GFX_VERSION=10.3.0 python main.py```

For AMD 7600 and maybe other RDNA3 cards: ```HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py```

### AMD ROCm Tips

You can enable experimental memory efficient attention on PyTorch 2.5 in ComfyUI on RDNA3 and potentially other AMD GPUs using this command:

```TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 python main.py --use-pytorch-cross-attention```
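
If you want to avoid retyping the environment variable on every launch, one option is a small launcher script. This is only a sketch: the variable name and flag are the ones from the diff above, but the script itself (`run_comfyui_rdna3.sh`) is hypothetical and not part of the repository.

```shell
#!/usr/bin/env sh
# run_comfyui_rdna3.sh -- hypothetical launcher, not part of the ComfyUI repo.
# Opt in to the experimental AOTriton memory efficient attention kernels
# on ROCm, then start ComfyUI using PyTorch's cross-attention implementation.
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
exec python main.py --use-pytorch-cross-attention "$@"
```

Any extra arguments passed to the script are forwarded to `main.py` via `"$@"`.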

# Notes

Only parts of the graph that have an output with all the correct inputs will be executed.

1 comment on commit 20a560e

@zlobniyshurik (Contributor)


Maybe it makes sense to add another piece of advice?

On (PyTorch 2.5.1 + ROCm 6.2) with a 7900 XT running Flux:

```PYTORCH_TUNABLEOP_ENABLED=1 PYTORCH_TUNABLEOP_VERBOSE=1 python main.py --novram``` gives me 2.1 s/it

vs

```TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 python main.py --use-pytorch-cross-attention``` at 3.5 s/it

Yes, the first launch will take a long time (while tuning is in progress), but subsequent launches are much faster.
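
The commenter's approach can be sketched as follows. The `PYTORCH_TUNABLEOP_*` variables are real PyTorch TunableOp settings; the timing behavior described in the comments reflects the commenter's measurements on their hardware, not a general guarantee.

```shell
# First run: TunableOp benchmarks candidate GEMM implementations as they are
# encountered (slow warm-up) and records the winners to a results CSV file.
PYTORCH_TUNABLEOP_ENABLED=1 PYTORCH_TUNABLEOP_VERBOSE=1 python main.py --novram

# Subsequent runs: keep TunableOp enabled so the saved tuning results are
# reused; the verbose flag can be dropped once tuning has converged.
PYTORCH_TUNABLEOP_ENABLED=1 python main.py --novram
```

Whether the tuned kernels outperform the AOTriton attention path will depend on the GPU, PyTorch, and ROCm versions in use.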
