[Build] CUDA 12.5 Build ERROR in MOE/cutlass for sm=90 #20924

Closed
tianleiwu opened this issue Jun 4, 2024 · 0 comments
Labels: build, ep:CUDA, ep:TensorRT, platform:windows

tianleiwu commented Jun 4, 2024

Describe the issue

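I tried to build ONNX Runtime on Windows with CUDA 12.5 and a CMAKE_CUDA_ARCHITECTURES list that includes 90. The build fails while compiling the MoE GEMM kernels: nvcc reports "missing return statement at end of non-void function" errors in the cutlass header cute/arch/cluster_sm90.hpp.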

Urgency

None

Target platform

Windows 11

Build script

build.bat --cmake_generator "Visual Studio 17 2022" --config Release --build_wheel --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=61;70;75;80;90" --parallel --build_shared_lib --use_cuda --cuda_version "12.5" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5" --cudnn_home "C:\CuDNN\9.1.1.17_cuda12" --use_tensorrt --tensorrt_home "C:\TensorRT\10.0.1.6.cuda-12.4"

Error / output

E:\git\onnxruntime\build\Windows\Release>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64" -x cu -I"E:\git\onnxruntime\build\Windows\Release_deps\utf8_range-src" -IE:\git\onnxruntime\include\onnxruntime -IE:
\git\onnxruntime\include\onnxruntime\core\session -I"E:\git\onnxruntime\build\Windows\Release_deps\pytorch_cpuinfo-src\include" -IE:\git\onnxruntime\build\Windows\Release -IE:\git\onnxruntime\onnxruntime -I"E:\git\onnxruntime\build\Windows\Release_deps\abseil_cpp-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\safeint-src" -I"E:\git\onnxruntime\bui
ld\Windows\Release_deps\gsl-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\date-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\onnx-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\onnx-build" -I"E:\git\onnxruntime\build\Windows\Release_deps\protobuf-src\src" -I"E:\git\onnxruntime\build\Windows\Release_deps\flatbuff
ers-src\include" -IC:\CuDNN\9.1.1.17_cuda12\include -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\examples" -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\tools\util\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\eigen-src" -I"C:\TensorRT\10.0.
1.6.cuda-12.4\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\mp11-src\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" --keep-dir onnxrunt.2968DD78\x64\Release -maxrregcount=0 --machine 64 --compile -cudart shared --expt-relaxed-constexpr --
Werror default-stream-launch -Xcudafe --diag_suppress=bad_friend_decl -Xcudafe --diag_suppress=unsigned_compare_with_zero -Xcudafe --diag_suppress=expr_has_no_effect -include algorithm -std=c++17 --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,s
m_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcudafe --diag_suppress=conversion_function_not_usable --threads 1 -Werror all-warnings -Xcompiler="/EHsc -Ob2 -Zi /utf-8 /sdl /experimental:external /external:W0 /external:templates- /external:IE:/git/onnxruntime/cmake /external:IE:/git/o
nnxruntime/build/Windows/Release /wd4251 /wd4201 /wd4324 /wd5054 /w15038 /wd4251 /wd4201 /wd4324 /wd5054 /w15038 /wd4834 /wd4127" -D_WINDOWS -DNDEBUG -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_USE_THREADS -DDISABLE_CUSPARSE_DEPRECATED -DPLATFORM_WINDOWS -DNOGDI -DNOMINMAX -D_USE_MATH_DEFINES -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -DUSE_CUDA=1 -DUSE_MEMORY
_EFFICIENT_ATTENTION=1 -DUSE_TENSORRT=1 -DONLY_C_LOCALE=0 -DONNX_NAMESPACE=onnx -DONNX_ML=1 -DONNX_USE_LITE_PROTO=1 -D__ONNX_NO_DOC_STRINGS -DWIN32_LEAN_AND_MEAN -DORT_ENABLE_STREAM -DEIGEN_MPL2_ONLY -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_VARIADIC_TEMPLATES -DEIGEN_HAS_CXX11_MATH -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_STRONG_INLINE=inline -D_SILENCE_EXPERIMENTAL_FILE
SYSTEM_DEPRECATION_WARNING=1 -D"CMAKE_INTDIR="Release"" -Donnxruntime_providers_cuda_EXPORTS -D_WINDLL -D_MBCS -DEIGEN_HAS_C99_MATH -DCPUINFO_SUPPORTED -DNDEBUG -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_USE_THREADS -DDISABLE_CUSPARSE_DEPRECATED -DPLATFORM_WINDOWS -DNOGDI -DNOMINMAX -D_USE_MATH_DEFINES -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -DUSE_CUDA=1 -
DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_TENSORRT=1 -DONLY_C_LOCALE=0 -DONNX_NAMESPACE=onnx -DONNX_ML=1 -DONNX_USE_LITE_PROTO=1 -D__ONNX_NO_DOC_STRINGS -DWIN32_LEAN_AND_MEAN -DORT_ENABLE_STREAM -DEIGEN_MPL2_ONLY -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_VARIADIC_TEMPLATES -DEIGEN_HAS_CXX11_MATH -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_STRONG_INLINE=inline -D_SILENCE_EXPERI
MENTAL_FILESYSTEM_DEPRECATION_WARNING=1 -D"CMAKE_INTDIR="Release"" -Donnxruntime_providers_cuda_EXPORTS -Xcompiler "/EHsc /W4 /nologo /O2 /FS /MD /GR" -Xcompiler "/Fdonnxruntime_providers_cuda.dir\Release\vc143.pdb" -o onnxruntime_providers_cuda.dir\Release\moe_gemm_kernels_fp32_fp32.obj "E:\git\onnxruntime\onnxruntime\contrib_ops\cuda\moe\ft_moe\moe_g
emm_kernels_fp32_fp32.cu"
E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include\cute/arch/cluster_sm90.hpp(101): error #940-D: missing return statement at end of non-void function "cute::cluster_grid_dims" [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
}
^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include\cute/arch/cluster_sm90.hpp(120): error #940-D: missing return statement at end of non-void function "cute::cluster_id_in_grid" [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
}
^

E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include\cute/arch/cluster_sm90.hpp(101): error #940-D: missing return statement at end of non-void function "cute::cluster_grid_dims" [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
}
^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include\cute/arch/cluster_sm90.hpp(120): error #940-D: missing return statement at end of non-void function "cute::cluster_id_in_grid" [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
}
^

E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include\cute/arch/cluster_sm90.hpp(101): error #940-D: missing return statement at end of non-void function "cute::cluster_grid_dims" [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
}
^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include\cute/arch/cluster_sm90.hpp(120): error #940-D: missing return statement at end of non-void function "cute::cluster_id_in_grid" [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
}
^

2 errors detected in the compilation of "E:/git/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_fp16_fp16.cu".
moe_gemm_kernels_fp16_fp16.cu
2 errors detected in the compilation of "E:/git/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_fp16_uint4.cu".
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 12.5.targets(799,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64" -x cu
-I"E:\git\onnxruntime\build\Windows\Release_deps\utf8_range-src" -IE:\git\onnxruntime\include\onnxruntime -IE:\git\onnxruntime\include\onnxruntime\core\session -I"E:\git\onnxruntime\build\Windows\Release_deps\pytorch_cpuinfo-src\include" -IE:\git\onnxruntime\build\Windows\Release -IE:\git\onnxruntime\onnxruntime -I"E:\git\onnxruntime\build\Windows\Releas
e_deps\abseil_cpp-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\safeint-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\gsl-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\date-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\onnx-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\onnx-build" -I"E:\git\onnxru
ntime\build\Windows\Release_deps\protobuf-src\src" -I"E:\git\onnxruntime\build\Windows\Release_deps\flatbuffers-src\include" -IC:\CuDNN\9.1.1.17_cuda12\include -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\examples" -I"E:\git\onnxruntime\build\Windows\Release_deps\cutla
ss-src\tools\util\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\eigen-src" -I"C:\TensorRT\10.0.1.6.cuda-12.4\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\mp11-src\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" --keep-dir onnxrunt
.2968DD78\x64\Release -maxrregcount=0 --machine 64 --compile -cudart shared --expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe --diag_suppress=bad_friend_decl -Xcudafe --diag_suppress=unsigned_compare_with_zero -Xcudafe --diag_suppress=expr_has_no_effect -include algorithm -std=c++17 --generate-code=arch=compute_61,code=[compute_61,sm_61] --
generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcudafe --diag_suppress=conversion_function_not_usable --threads 1 -Werror all-warnings -Xcompiler="/EHsc -Ob2 -Zi /utf-8 /sdl /experimen
tal:external /external:W0 /external:templates- /external:IE:/git/onnxruntime/cmake /external:IE:/git/onnxruntime/build/Windows/Release /wd4251 /wd4201 /wd4324 /wd5054 /w15038 /wd4251 /wd4201 /wd4324 /wd5054 /w15038 /wd4834 /wd4127" -D_WINDOWS -DNDEBUG -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_USE_THREADS -DDISABLE_CUSPARSE_DEPRECATED -DPLATFORM_WINDOWS -DNOGD
I -DNOMINMAX -D_USE_MATH_DEFINES -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -DUSE_CUDA=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_TENSORRT=1 -DONLY_C_LOCALE=0 -DONNX_NAMESPACE=onnx -DONNX_ML=1 -DONNX_USE_LITE_PROTO=1 -D__ONNX_NO_DOC_STRINGS -DWIN32_LEAN_AND_MEAN -DORT_ENABLE_STREAM -DEIGEN_MPL2_ONLY -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_VARIADIC_TEMPLATES -DEIGEN
_HAS_CXX11_MATH -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_STRONG_INLINE=inline -D_SILENCE_EXPERIMENTAL_FILESYSTEM_DEPRECATION_WARNING=1 -D"CMAKE_INTDIR="Release"" -Donnxruntime_providers_cuda_EXPORTS -D_WINDLL -D_MBCS -DEIGEN_HAS_C99_MATH -DCPUINFO_SUPPORTED -DNDEBUG -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_USE_THREADS -DDISABLE_CUSPARSE_DEPRECATED -DPLATFORM_WINDOWS
-DNOGDI -DNOMINMAX -D_USE_MATH_DEFINES -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -DUSE_CUDA=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_TENSORRT=1 -DONLY_C_LOCALE=0 -DONNX_NAMESPACE=onnx -DONNX_ML=1 -DONNX_USE_LITE_PROTO=1 -D__ONNX_NO_DOC_STRINGS -DWIN32_LEAN_AND_MEAN -DORT_ENABLE_STREAM -DEIGEN_MPL2_ONLY -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_VARIADIC_TEMPLATES
-DEIGEN_HAS_CXX11_MATH -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_STRONG_INLINE=inline -D_SILENCE_EXPERIMENTAL_FILESYSTEM_DEPRECATION_WARNING=1 -D"CMAKE_INTDIR="Release"" -Donnxruntime_providers_cuda_EXPORTS -Xcompiler "/EHsc /W4 /nologo /O2 /FS /MD /GR" -Xcompiler "/Fdonnxruntime_providers_cuda.dir\Release\vc143.pdb" -o onnxruntime_providers_cuda.dir\Release\moe
_gemm_kernels_fp16_fp16.obj "E:\git\onnxruntime\onnxruntime\contrib_ops\cuda\moe\ft_moe\moe_gemm_kernels_fp16_fp16.cu"" exited with code 2. [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
moe_gemm_kernels_fp16_uint4.cu
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 12.5.targets(799,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64" -x cu
-I"E:\git\onnxruntime\build\Windows\Release_deps\utf8_range-src" -IE:\git\onnxruntime\include\onnxruntime -IE:\git\onnxruntime\include\onnxruntime\core\session -I"E:\git\onnxruntime\build\Windows\Release_deps\pytorch_cpuinfo-src\include" -IE:\git\onnxruntime\build\Windows\Release -IE:\git\onnxruntime\onnxruntime -I"E:\git\onnxruntime\build\Windows\Releas
e_deps\abseil_cpp-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\safeint-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\gsl-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\date-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\onnx-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\onnx-build" -I"E:\git\onnxru
ntime\build\Windows\Release_deps\protobuf-src\src" -I"E:\git\onnxruntime\build\Windows\Release_deps\flatbuffers-src\include" -IC:\CuDNN\9.1.1.17_cuda12\include -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\examples" -I"E:\git\onnxruntime\build\Windows\Release_deps\cutla
ss-src\tools\util\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\eigen-src" -I"C:\TensorRT\10.0.1.6.cuda-12.4\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\mp11-src\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" --keep-dir onnxrunt
.2968DD78\x64\Release -maxrregcount=0 --machine 64 --compile -cudart shared --expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe --diag_suppress=bad_friend_decl -Xcudafe --diag_suppress=unsigned_compare_with_zero -Xcudafe --diag_suppress=expr_has_no_effect -include algorithm -std=c++17 --generate-code=arch=compute_61,code=[compute_61,sm_61] --
generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcudafe --diag_suppress=conversion_function_not_usable --threads 1 -Werror all-warnings -Xcompiler="/EHsc -Ob2 -Zi /utf-8 /sdl /experimen
tal:external /external:W0 /external:templates- /external:IE:/git/onnxruntime/cmake /external:IE:/git/onnxruntime/build/Windows/Release /wd4251 /wd4201 /wd4324 /wd5054 /w15038 /wd4251 /wd4201 /wd4324 /wd5054 /w15038 /wd4834 /wd4127" -D_WINDOWS -DNDEBUG -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_USE_THREADS -DDISABLE_CUSPARSE_DEPRECATED -DPLATFORM_WINDOWS -DNOGD
I -DNOMINMAX -D_USE_MATH_DEFINES -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -DUSE_CUDA=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_TENSORRT=1 -DONLY_C_LOCALE=0 -DONNX_NAMESPACE=onnx -DONNX_ML=1 -DONNX_USE_LITE_PROTO=1 -D__ONNX_NO_DOC_STRINGS -DWIN32_LEAN_AND_MEAN -DORT_ENABLE_STREAM -DEIGEN_MPL2_ONLY -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_VARIADIC_TEMPLATES -DEIGEN
_HAS_CXX11_MATH -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_STRONG_INLINE=inline -D_SILENCE_EXPERIMENTAL_FILESYSTEM_DEPRECATION_WARNING=1 -D"CMAKE_INTDIR="Release"" -Donnxruntime_providers_cuda_EXPORTS -D_WINDLL -D_MBCS -DEIGEN_HAS_C99_MATH -DCPUINFO_SUPPORTED -DNDEBUG -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_USE_THREADS -DDISABLE_CUSPARSE_DEPRECATED -DPLATFORM_WINDOWS
-DNOGDI -DNOMINMAX -D_USE_MATH_DEFINES -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -DUSE_CUDA=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_TENSORRT=1 -DONLY_C_LOCALE=0 -DONNX_NAMESPACE=onnx -DONNX_ML=1 -DONNX_USE_LITE_PROTO=1 -D__ONNX_NO_DOC_STRINGS -DWIN32_LEAN_AND_MEAN -DORT_ENABLE_STREAM -DEIGEN_MPL2_ONLY -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_VARIADIC_TEMPLATES
-DEIGEN_HAS_CXX11_MATH -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_STRONG_INLINE=inline -D_SILENCE_EXPERIMENTAL_FILESYSTEM_DEPRECATION_WARNING=1 -D"CMAKE_INTDIR="Release"" -Donnxruntime_providers_cuda_EXPORTS -Xcompiler "/EHsc /W4 /nologo /O2 /FS /MD /GR" -Xcompiler "/Fdonnxruntime_providers_cuda.dir\Release\vc143.pdb" -o onnxruntime_providers_cuda.dir\Release\moe
_gemm_kernels_fp16_uint4.obj "E:\git\onnxruntime\onnxruntime\contrib_ops\cuda\moe\ft_moe\moe_gemm_kernels_fp16_uint4.cu"" exited with code 2. [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
2 errors detected in the compilation of "E:/git/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_fp32_fp32.cu".
moe_gemm_kernels_fp32_fp32.cu
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 12.5.targets(799,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64" -x cu
-I"E:\git\onnxruntime\build\Windows\Release_deps\utf8_range-src" -IE:\git\onnxruntime\include\onnxruntime -IE:\git\onnxruntime\include\onnxruntime\core\session -I"E:\git\onnxruntime\build\Windows\Release_deps\pytorch_cpuinfo-src\include" -IE:\git\onnxruntime\build\Windows\Release -IE:\git\onnxruntime\onnxruntime -I"E:\git\onnxruntime\build\Windows\Releas
e_deps\abseil_cpp-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\safeint-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\gsl-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\date-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\onnx-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\onnx-build" -I"E:\git\onnxru
ntime\build\Windows\Release_deps\protobuf-src\src" -I"E:\git\onnxruntime\build\Windows\Release_deps\flatbuffers-src\include" -IC:\CuDNN\9.1.1.17_cuda12\include -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\examples" -I"E:\git\onnxruntime\build\Windows\Release_deps\cutla
ss-src\tools\util\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\eigen-src" -I"C:\TensorRT\10.0.1.6.cuda-12.4\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\mp11-src\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" --keep-dir onnxrunt
.2968DD78\x64\Release -maxrregcount=0 --machine 64 --compile -cudart shared --expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe --diag_suppress=bad_friend_decl -Xcudafe --diag_suppress=unsigned_compare_with_zero -Xcudafe --diag_suppress=expr_has_no_effect -include algorithm -std=c++17 --generate-code=arch=compute_61,code=[compute_61,sm_61] --
generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcudafe --diag_suppress=conversion_function_not_usable --threads 1 -Werror all-warnings -Xcompiler="/EHsc -Ob2 -Zi /utf-8 /sdl /experimen
tal:external /external:W0 /external:templates- /external:IE:/git/onnxruntime/cmake /external:IE:/git/onnxruntime/build/Windows/Release /wd4251 /wd4201 /wd4324 /wd5054 /w15038 /wd4251 /wd4201 /wd4324 /wd5054 /w15038 /wd4834 /wd4127" -D_WINDOWS -DNDEBUG -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_USE_THREADS -DDISABLE_CUSPARSE_DEPRECATED -DPLATFORM_WINDOWS -DNOGD
I -DNOMINMAX -D_USE_MATH_DEFINES -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -DUSE_CUDA=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_TENSORRT=1 -DONLY_C_LOCALE=0 -DONNX_NAMESPACE=onnx -DONNX_ML=1 -DONNX_USE_LITE_PROTO=1 -D__ONNX_NO_DOC_STRINGS -DWIN32_LEAN_AND_MEAN -DORT_ENABLE_STREAM -DEIGEN_MPL2_ONLY -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_VARIADIC_TEMPLATES -DEIGEN
_HAS_CXX11_MATH -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_STRONG_INLINE=inline -D_SILENCE_EXPERIMENTAL_FILESYSTEM_DEPRECATION_WARNING=1 -D"CMAKE_INTDIR="Release"" -Donnxruntime_providers_cuda_EXPORTS -D_WINDLL -D_MBCS -DEIGEN_HAS_C99_MATH -DCPUINFO_SUPPORTED -DNDEBUG -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_USE_THREADS -DDISABLE_CUSPARSE_DEPRECATED -DPLATFORM_WINDOWS
-DNOGDI -DNOMINMAX -D_USE_MATH_DEFINES -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -DUSE_CUDA=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_TENSORRT=1 -DONLY_C_LOCALE=0 -DONNX_NAMESPACE=onnx -DONNX_ML=1 -DONNX_USE_LITE_PROTO=1 -D__ONNX_NO_DOC_STRINGS -DWIN32_LEAN_AND_MEAN -DORT_ENABLE_STREAM -DEIGEN_MPL2_ONLY -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_VARIADIC_TEMPLATES
-DEIGEN_HAS_CXX11_MATH -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_STRONG_INLINE=inline -D_SILENCE_EXPERIMENTAL_FILESYSTEM_DEPRECATION_WARNING=1 -D"CMAKE_INTDIR="Release"" -Donnxruntime_providers_cuda_EXPORTS -Xcompiler "/EHsc /W4 /nologo /O2 /FS /MD /GR" -Xcompiler "/Fdonnxruntime_providers_cuda.dir\Release\vc143.pdb" -o onnxruntime_providers_cuda.dir\Release\moe
_gemm_kernels_fp32_fp32.obj "E:\git\onnxruntime\onnxruntime\contrib_ops\cuda\moe\ft_moe\moe_gemm_kernels_fp32_fp32.cu"" exited with code 2. [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
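
For readers unfamiliar with diagnostic #940-D, here is a minimal host-only C++ sketch of the kind of pattern that can produce it: a non-void function whose body is chosen by preprocessor branches, where one branch never returns. This is not the cutlass source. `Dim3`, `SKETCH_DEVICE_PASS`, `grid_dims_sketch`, and `check_host_fallback` are hypothetical names used only for illustration. Since the nvcc command above passes `-Werror all-warnings`, a warning of this kind is reported as an error.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a three-component dimension type (not CUDA's dim3).
struct Dim3 {
  std::uint32_t x, y, z;
};

// Problematic shape, kept inside a comment so this file compiles cleanly:
//
//   Dim3 grid_dims_sketch() {
//   #if defined(SKETCH_DEVICE_PASS)
//     return Dim3{1, 1, 1};
//   #else
//     assert(false);   // no return on this path
//   #endif
//   }                  // control can reach here without returning a value
//
// One way to avoid the diagnostic: make every preprocessor branch end in a return.
inline Dim3 grid_dims_sketch() {
#if defined(SKETCH_DEVICE_PASS)
  // Assumption: stands in for reading real grid dimensions on the device.
  return Dim3{1, 1, 1};
#else
  // Host fallback: return a well-defined placeholder value.
  return Dim3{0, 0, 0};
#endif
}

// Small self-check for the host fallback branch.
inline void check_host_fallback() {
  const Dim3 d = grid_dims_sketch();
  (void)d;  // avoids an unused-variable warning when NDEBUG disables assert
  assert(d.x == 0 && d.y == 0 && d.z == 0);
}
```

Compiled without `SKETCH_DEVICE_PASS` defined, `grid_dims_sketch()` takes the fallback branch and returns all zeros, so a compiler has no path through the function that lacks a return value.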

Visual Studio Version

Enterprise 2022 (64-bit) 17.9.7

GCC / Compiler Version

No response

tianleiwu added the build label Jun 4, 2024
github-actions bot added the ep:CUDA, ep:TensorRT, and platform:windows labels Jun 4, 2024
tianleiwu added a commit that referenced this issue Jun 11, 2024
### Description
Upgrade cutlass to 3.5 to fix build errors when using CUDA 12.4 or 12.5
on Windows.
- [x] Upgrade cutlass to 3.5.0.
- [x] Fix flash attention build error with latest cutlass header files
and APIs. This fix is provided by @wangyems.
- [x] Update efficient attention to use new cutlass fmha interface.
- [x] Patch cutlass to fix `hrsqrt` not found error for sm < 53.
- [x] Disable TF32 Staged Accumulation to fix blkq4_fp16_gemm_sm80_test
build error for CUDA 11.8 to 12.3.
- [x] Disable TRT 10 deprecation warnings.

The following are not included in this PR:
* Replacing the deprecated APIs in the TRT provider.
* Fixing the blkq4_fp16_gemm_sm80_test build error for CUDA 12.4 or 12.5. This
test is built only if you add `--cmake_extra_defines
onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON` to the build command.

To integrate into rel-1.18.1, either bring in the other changes (like onnx
1.16.1), or generate a manifest and upload a new ONNX Runtime Build Time
Deps artifact based on rel-1.18.1.

### Motivation and Context
#19891
#20924
#20953