[0.0.17 / 0.0.18] nvcc version and performance #712
Comments
This helps mitigate a performance regression with nvcc > 11.6. nvcc 11.8 still performs worse than 11.6, but the gap is much smaller now. See #712. __original_commit__ = fairinternal/xformers@42d55eb5f438ec6907836fbd22056a50076f14d5
This fixes this issue.
Are you able to share post-fix benchmarks, since the commit says it doesn't quite reach 11.6's performance? Was this determined to be something specific to xformers' usage of CUDA, or is it something recommended and used across Facebook projects? If building flash-attention with these same options would be faster, that may be worth doing now that v2.0.4 fixed Dao-AILab/flash-attention#359
@tmm1 I don't have the post-fix benchmarks anymore, unfortunately ... The issue was because the PTX optimizer (
On A100, we seem to get the best performance using nvcc version 11.6.2. We need to investigate why the performance is significantly worse with other nvcc versions.
[Benchmark charts: fMHA FW - A100, fMHA BW - A100]
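When comparing builds across toolkit versions, a useful first step is recording which nvcc release each binary was compiled with. A minimal sketch of doing this from Python; the helper names and the parsing are assumptions for illustration, not part of xformers:

```python
import subprocess

def parse_nvcc_release(version_output):
    """Extract the release number (e.g. '11.6') from `nvcc --version` output."""
    # A typical line looks like:
    #   Cuda compilation tools, release 11.6, V11.6.124
    for line in version_output.splitlines():
        if "release" in line:
            return line.split("release")[-1].split(",")[0].strip()
    return None

def local_nvcc_release():
    """Return the local nvcc release string, or None if nvcc is unavailable."""
    try:
        out = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return parse_nvcc_release(out)

print(parse_nvcc_release("Cuda compilation tools, release 11.6, V11.6.124"))  # → 11.6
```

Logging this string alongside each benchmark run makes regressions like the 11.6 → 11.8 one much easier to bisect later.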
We also expect to release a new version (0.0.19) once this is fixed, so that our pre-built binaries have the best performance.