Windows builds of xformers do not work on PyTorch >= 2.2 #1073
Comments
Related issue: #1069. The "USE_FLASH_ATTENTION was not enabled for build" error is on the PyTorch side, but it means xformers SHOULD NOT ASSUME that users on Windows will have flash attention.
This is why I am getting the error, please fix it :( So the latest we can use is PyTorch 2.2.0 with xformers 0.0.24.post1?
PyTorch 2.3.0 and xformers 0.0.26.post1 work well, since this "feature" hadn't been brought in yet.
Thanks, I am going to test 2.3.0 and 0.0.26.post1.
Sorry about this, indeed it looks like PyTorch has never supported building FlashAttention v2 on Windows, and this change landed in the 2.2.0 release (pytorch/pytorch#105602). We're looking into re-enabling FlashAttention in xFormers just for Windows. However, this will likely be a temporary fix. We'll see whether we can get PyTorch to ship it by default, but I'd also recommend that you look into whether you have to use FlashAttention, or whether you can switch to some of the other backends provided by PyTorch on Windows.
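A minimal sketch of what switching to one of those other PyTorch backends can look like, assuming PyTorch >= 2.3 (where torch.nn.attention.sdpa_kernel is available); the shapes and dtype here are illustrative only:

```python
# Sketch: run scaled-dot-product attention on PyTorch's memory-efficient
# (CUTLASS) backend, which is available in the Windows wheels, instead of the
# FlashAttention backend. Assumes PyTorch >= 2.3 and a CUDA device.
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Restrict dispatch to the memory-efficient backend, with math as a fallback.
with sdpa_kernel([SDPBackend.EFFICIENT_ATTENTION, SDPBackend.MATH]):
    out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 1024, 64])
```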
Thanks a lot @lw
I will say it is definitely possible for me to run things on the cutlass attention backend (from PyTorch). But that also means I would need to reimplement all the related things from diffusers to get it to work right, with worse performance (speed-wise) as the result, which makes no sense to me. Also, this issue indicates that the flashattn-pt detection method cannot correctly check whether PyTorch was compiled with FlashAttention or not.
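One hedged way to probe that at runtime, assuming PyTorch >= 2.3 and a CUDA device: force the FlashAttention SDP backend and see whether it actually runs; a PyTorch build without FlashAttention should refuse rather than silently fall back.

```python
# Sketch: force PyTorch's FlashAttention SDP backend to see whether the
# running PyTorch build actually has it. If it was compiled out (as on the
# Windows wheels discussed here), this raises instead of silently falling back.
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)
try:
    with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
        F.scaled_dot_product_attention(q, q, q)
    print("PyTorch's FlashAttention kernel is usable on this build")
except RuntimeError as exc:
    print(f"FlashAttention backend not usable: {exc}")
```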
Right, this is what we're currently investigating, because if that detection worked as intended then we wouldn't be having this issue. We don't have access to any Windows machine to debug. Could you please help us by installing PyTorch 2.4.0 and xFormers 0.0.27.post1 and giving us the output of these commands?
Thanks!
I just triggered a new release, v0.0.27.post2, which should include FlashAttention in its Windows builds. Moreover, I'm trying to make PyTorch include FlashAttention on Windows by default, so that in the future you won't have to depend on xFormers: pytorch/pytorch#131875
@lw awesome, thank you so much. I wish shameless OpenAI were following you; they still don't support Triton, and thus I can't use CogVLM v2 on Windows.
We have a fork of Triton that makes the Windows build work.
Can I get that fork please? I really need it. I got a precompiled Triton 2.1 wheel, but it looks like Triton 3 is mandatory for CogVLM.
🐛 Bug
In the latest release of xformers (0.0.27.post1), xformers introduced a feature that uses the flash_attn package and PyTorch's built-in SDP to reduce wheel size and compile time. The problem is that this behavior breaks the Windows platform: xformers was basically the ONLY flash attention implementation that shipped a pre-built Windows wheel, and it has now dropped that support. Since we just need to modify some env params to let the build process actually compile it, I think this is actually a bug, not a question/help or a feature request.
Command
Using
python -m xformers.info
you will find that xformers is using "flashattF@2.5.6-pt", which is ACTUALLY NOT SUPPORTED.
To Reproduce
Any script utilizing xformers' flash attention on the Windows platform with version 0.0.27.post1 (a sketch is given below).
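The report doesn't attach a specific script; a minimal sketch of such a repro, assuming a CUDA device and half-precision inputs, would explicitly request xFormers' flash operator:

```python
# Sketch: explicitly request xFormers' FlashAttention operator. Per this
# report, on the 0.0.27.post1 Windows wheels this goes through the
# PyTorch-backed "flashattF@2.5.6-pt" path and is expected to fail, since
# the PyTorch Windows build lacks FlashAttention.
import torch
import xformers.ops as xops
from xformers.ops.fmha import flash

# memory_efficient_attention expects (batch, seq_len, num_heads, head_dim)
q = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = xops.memory_efficient_attention(q, k, v, op=(flash.FwOp, flash.BwOp))
print(out.shape)
```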
Expected behavior
The prebuilt wheel of xformers should ship with its flash attention/cutlass kernels compiled in, not just import the PyTorch one.
Environment
Additional context
I can confirm that xformers built from source works normally, with XFORMERS_PT_CUTLASS_ATTN/XFORMERS_PT_FLASH_ATTN set to 0 (see the sketch below).
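For reference, a hedged sketch of such a source build: the variable names come from this report, and driving pip from Python is just one possible way to pass them to the build step.

```python
# Sketch: build xFormers from a source checkout with the PyTorch-dispatch
# shims disabled, so the FlashAttention/CUTLASS kernels are compiled in.
# XFORMERS_PT_FLASH_ATTN / XFORMERS_PT_CUTLASS_ATTN are the variables named
# above; setting them to "0" is what this report says makes the from-source
# build work. Assumes the current directory is an xformers checkout and a
# working CUDA build toolchain is installed.
import os
import subprocess
import sys

env = dict(os.environ)
env["XFORMERS_PT_FLASH_ATTN"] = "0"
env["XFORMERS_PT_CUTLASS_ATTN"] = "0"

subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "-v", "--no-build-isolation", "."],
    env=env,
)
```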