
[CUDA] upgrade cutlass to 3.5.0 #20940

Merged: 13 commits merged Jun 11, 2024


@tianleiwu (Contributor) commented Jun 5, 2024

Description

Upgrade cutlass to 3.5 to fix build errors when using CUDA 12.4 or 12.5 on Windows.

  • Upgrade cutlass to 3.5.0.
  • Fix flash attention build error with latest cutlass header files and APIs. This fix is provided by @wangyems.
  • Update efficient attention to use the new cutlass fmha interface.
  • Patch cutlass to fix the "hrsqrt not found" error for sm < 53.
  • Disable TF32 Staged Accumulation to fix the blkq4_fp16_gemm_sm80_test build error with CUDA 11.8 through 12.3.
  • Disable TRT 10 deprecation warnings.

The following are not included in this PR:

  • Replacing the deprecated APIs in the TRT provider.
  • Fixing the blkq4_fp16_gemm_sm80_test build error with CUDA 12.4 or 12.5. This test is built only when --cmake_extra_defines onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON is added to the build command.

To integrate this into rel-1.18.1, either bring in other changes (such as onnx 1.16.1), or generate a manifest and upload a new ONNX Runtime Build Time Deps artifact based on rel-1.18.1.

Motivation and Context

#19891
#20924
#20953

@tianleiwu requested review from a team as code owners June 5, 2024 17:44
@tianleiwu marked this pull request as draft June 5, 2024 17:45
@yufenglee requested a review from aciddelgado June 5, 2024 19:36
@tianleiwu marked this pull request as ready for review June 7, 2024 04:17
@tianleiwu requested a review from wangyems June 9, 2024 00:24
@snnn (Member) previously requested changes Jun 10, 2024 and left a comment:


Do not disable C4996. See https://github.com/microsoft/onnxruntime/blob/main/docs/Coding_Conventions_and_Standards.md

If the warning is generated when compiling an external *.cc/*.cpp file that is not part of our source tree, it usually won't cause a build failure, since warnings in such files are not treated as errors.
Otherwise, use tricks like https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/common/eigen_common_wrapper.h
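The scoped-suppression idea behind such a wrapper can be sketched as follows, assuming the wrapper follows the usual push/disable/pop pattern around a third-party include. This is a minimal, hypothetical C++ example, not code from this PR: legacy_scale stands in for a deprecated third-party API (for example a TensorRT 10 declaration), and new_scale and scale_with_legacy_api are illustrative names. Because a deprecation warning is reported where the deprecated declaration is used, the push/disable/pop pair here wraps the single call site; the same pair placed around an #include covers warnings raised inside that header. Either way, C4996 (MSVC) and -Wdeprecated-declarations (GCC, Clang) stay enabled for the rest of the translation unit instead of being disabled project-wide.

```cpp
// Hypothetical stand-in for a deprecated third-party API. MSVC reports uses of
// [[deprecated]] entities as C4996; GCC and Clang use -Wdeprecated-declarations.
[[deprecated("use new_scale instead")]] inline int legacy_scale(int x) {
  return x * 2;
}

// Hypothetical replacement API that new code should call instead.
inline int new_scale(int x) { return x * 2; }

// Our own code: only the one call that still needs the deprecated API is
// wrapped, so the deprecation warning remains active everywhere else.
inline int scale_with_legacy_api(int x) {
#if defined(__clang__) || defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#elif defined(_MSC_VER)
#pragma warning(push)
#pragma warning(disable : 4996)
#endif
  int result = legacy_scale(x);  // suppressed only between push and pop
#if defined(__clang__) || defined(__GNUC__)
#pragma GCC diagnostic pop
#elif defined(_MSC_VER)
#pragma warning(pop)
#endif
  return result;
}
```

Compiling this with warnings treated as errors (-Werror or /WX) should still succeed, whereas a call to legacy_scale placed outside the guarded region would be reported. The __clang__ check comes first so that clang-cl, which defines both __clang__ and _MSC_VER, takes the GCC-style branch.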

@snnn dismissed their stale review June 10, 2024 20:18

Thanks for the fix!

@tianleiwu requested review from wangyems and chilo-ms June 10, 2024 23:15
@tianleiwu force-pushed the tlwu/fix_cutlass_msvc_build_error branch from 6adb2cd to 415c5e1 June 10, 2024 23:50
@tianleiwu requested a review from pranavsharma June 11, 2024 03:51
@tianleiwu merged commit b3fc9b5 into main Jun 11, 2024 (108 checks passed)
@tianleiwu deleted the tlwu/fix_cutlass_msvc_build_error branch June 11, 2024 20:32
@sophies927 added the triage:approved label Jun 11, 2024
@jywu-msft added the ep:CUDA and ep:TensorRT labels and removed the triage:approved and release:1.18.1 labels Jun 20, 2024
Labels: ep:CUDA (issues related to the CUDA execution provider), ep:TensorRT (issues related to the TensorRT execution provider)

7 participants