[CUDA] upgrade cutlass to 3.5.0 #20940

tianleiwu · 2024-06-05T17:44:57Z

Description

Upgrade cutlass to 3.5 to fix build errors when building with CUDA 12.4 or 12.5 on Windows.

Upgrade cutlass to 3.5.0.
Fix flash attention build error with latest cutlass header files and APIs. This fix is provided by @wangyems.
Update efficient attention to use new cutlass fmha interface.
Patch cutlass to fix hrsqrt not found error for sm < 53.
Disable TF32 Staged Accumulation to fix the blkq4_fp16_gemm_sm80_test build error for CUDA 11.8 to 12.3.
Disable TRT 10 deprecation warnings.

The following are not included in this PR:

Replacing the deprecated APIs in the TRT provider.
Fixing the blkq4_fp16_gemm_sm80_test build error for CUDA 12.4 or 12.5. This test is not built by default unless you add --cmake_extra_defines onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON to the build command.

To integrate into rel-1.18.1, either bring in other changes (like onnx 1.16.1), or generate a manifest and upload a new ONNX Runtime Build Time Deps artifact based on rel-1.18.1.

Motivation and Context

#19891
#20924
#20953

cmake/CMakeLists.txt

onnxruntime/core/mickey/cutlass_ext/q4gemm/threadblock/quantb_mma_multistage.h

onnxruntime/contrib_ops/cuda/bert/cutlass_fmha/fmha_launch_template.h

snnn

Do not disable C4996. See https://github.com/microsoft/onnxruntime/blob/main/docs/Coding_Conventions_and_Standards.md

If the warning was generated when compiling an external *.cc/*.cpp file that is not part of our source tree, it usually won't cause a build failure, since we do not treat some warnings as errors.
Otherwise, use tricks like https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/common/eigen_common_wrapper.h
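As a rough illustration of the wrapper approach suggested above, the sketch below suppresses the deprecation warning only around the code that touches a deprecated API, instead of disabling C4996 for the whole build. `LegacyApi` and `CallLegacyApi` are hypothetical names invented for this example; they are not part of ONNX Runtime, cutlass, or TensorRT, and the actual wrapper header in the repository may be structured differently.

```cpp
// Hypothetical stand-in for a third-party function that its vendor has
// marked as deprecated (for example, an API slated for removal).
[[deprecated("use a newer API instead")]]
inline int LegacyApi(int x) { return x + 1; }

// Save the current warning state and silence the deprecation warning
// only for the region below.
#if defined(_MSC_VER)
#pragma warning(push)
#pragma warning(disable : 4996)  // 'declared deprecated'
#elif defined(__GNUC__)          // GCC and Clang
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#endif

// Hypothetical wrapper: the only place that calls the deprecated API.
inline int CallLegacyApi(int x) { return LegacyApi(x); }

// Restore the previous warning state so the rest of the translation
// unit still reports deprecated usage.
#if defined(_MSC_VER)
#pragma warning(pop)
#elif defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
```

Code elsewhere would call `CallLegacyApi` rather than `LegacyApi`, so any new, unintended use of a deprecated function outside the wrapper is still flagged by the compiler.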

Thanks for the fix!

tianleiwu added 4 commits June 4, 2024 18:01

Add /Zc:__cplusplus

dd964cb

update cutlass

e31edf7

Add code to use batch hook

4ee7731

update cgmanifest

94a3b2e

tianleiwu requested review from a team as code owners June 5, 2024 17:44

tianleiwu marked this pull request as draft June 5, 2024 17:45

snnn reviewed Jun 5, 2024

cmake/CMakeLists.txt (resolved)

limit max head size = 1024

fd89bb9

yufenglee requested a review from aciddelgado June 5, 2024 19:36

tianleiwu added 6 commits June 5, 2024 22:09

fix linux build

be8f3c6

use GQAToBatchHook

02886c2

cutlass patch to fix hrsqrt not found for SM<53

0f460d1

suppress TRT deprecated warnings

ed63a78

undo to_batch_hook

274862b

suppress trt deprecate warning and clean up

8d5c4c0

tianleiwu marked this pull request as ready for review June 7, 2024 04:17

tianleiwu added the release:1.18.1 label Jun 7, 2024

tianleiwu commented Jun 7, 2024

onnxruntime/core/mickey/cutlass_ext/q4gemm/threadblock/quantb_mma_multistage.h (resolved)

tianleiwu requested a review from wangyems June 9, 2024 00:24

wangyems reviewed Jun 10, 2024

onnxruntime/contrib_ops/cuda/bert/cutlass_fmha/fmha_launch_template.h (resolved)

wangyems reviewed Jun 10, 2024

onnxruntime/core/mickey/cutlass_ext/q4gemm/threadblock/quantb_mma_multistage.h (resolved)

snnn previously requested changes Jun 10, 2024

address review feedback

a02c612

fix more 4996 warnings

415c5e1

tianleiwu requested review from wangyems and chilo-ms June 10, 2024 23:15

tianleiwu force-pushed the tlwu/fix_cutlass_msvc_build_error branch from 6adb2cd to 415c5e1 Compare June 10, 2024 23:50

wangyems approved these changes Jun 11, 2024

snnn approved these changes Jun 11, 2024

tianleiwu requested a review from pranavsharma June 11, 2024 03:51

faxu approved these changes Jun 11, 2024

tianleiwu merged commit b3fc9b5 into main Jun 11, 2024
108 checks passed

tianleiwu deleted the tlwu/fix_cutlass_msvc_build_error branch June 11, 2024 20:32

sophies927 added the triage:approved label Jun 11, 2024

jywu-msft added the ep:CUDA and ep:TensorRT labels and removed the triage:approved and release:1.18.1 labels Jun 20, 2024
