[Build] CUDA 12.5 Build ERROR in MOE/cutlass for sm=90 #20924
Labels
build
build issues; typically submitted using template
ep:CUDA
issues related to the CUDA execution provider
ep:TensorRT
issues related to TensorRT execution provider
platform:windows
issues related to the Windows platform
Describe the issue
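I tried to build on Windows with CUDA 12.5, with 90 included in CMAKE_CUDA_ARCHITECTURES (sm_90). The build fails with error #940-D (missing return statement at end of non-void function) in the CUTLASS header cute/arch/cluster_sm90.hpp while compiling the MoE GEMM kernels (moe_gemm_kernels_*.cu).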
Urgency
None
Target platform
Windows 11
Build script
build.bat --cmake_generator "Visual Studio 17 2022" --config Release --build_wheel --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=61;70;75;80;90" --parallel --build_shared_lib --use_cuda --cuda_version "12.5" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5" --cudnn_home "C:\CuDNN\9.1.1.17_cuda12" --use_tensorrt --tensorrt_home "C:\TensorRT\10.0.1.6.cuda-12.4"
Error / output
E:\git\onnxruntime\build\Windows\Release>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64" -x cu -I"E:\git\onnxruntime\build\Windows\Release_deps\utf8_range-src" -IE:\git\onnxruntime\include\onnxruntime -IE:
\git\onnxruntime\include\onnxruntime\core\session -I"E:\git\onnxruntime\build\Windows\Release_deps\pytorch_cpuinfo-src\include" -IE:\git\onnxruntime\build\Windows\Release -IE:\git\onnxruntime\onnxruntime -I"E:\git\onnxruntime\build\Windows\Release_deps\abseil_cpp-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\safeint-src" -I"E:\git\onnxruntime\bui
ld\Windows\Release_deps\gsl-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\date-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\onnx-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\onnx-build" -I"E:\git\onnxruntime\build\Windows\Release_deps\protobuf-src\src" -I"E:\git\onnxruntime\build\Windows\Release_deps\flatbuff
ers-src\include" -IC:\CuDNN\9.1.1.17_cuda12\include -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\examples" -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\tools\util\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\eigen-src" -I"C:\TensorRT\10.0.
1.6.cuda-12.4\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\mp11-src\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" --keep-dir onnxrunt.2968DD78\x64\Release -maxrregcount=0 --machine 64 --compile -cudart shared --expt-relaxed-constexpr --
Werror default-stream-launch -Xcudafe --diag_suppress=bad_friend_decl -Xcudafe --diag_suppress=unsigned_compare_with_zero -Xcudafe --diag_suppress=expr_has_no_effect -include algorithm -std=c++17 --generate-code=arch=compute_61,code=[compute_61,sm_61] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,s
m_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcudafe --diag_suppress=conversion_function_not_usable --threads 1 -Werror all-warnings -Xcompiler="/EHsc -Ob2 -Zi /utf-8 /sdl /experimental:external /external:W0 /external:templates- /external:IE:/git/onnxruntime/cmake /external:IE:/git/o
nnxruntime/build/Windows/Release /wd4251 /wd4201 /wd4324 /wd5054 /w15038 /wd4251 /wd4201 /wd4324 /wd5054 /w15038 /wd4834 /wd4127" -D_WINDOWS -DNDEBUG -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_USE_THREADS -DDISABLE_CUSPARSE_DEPRECATED -DPLATFORM_WINDOWS -DNOGDI -DNOMINMAX -D_USE_MATH_DEFINES -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -DUSE_CUDA=1 -DUSE_MEMORY
_EFFICIENT_ATTENTION=1 -DUSE_TENSORRT=1 -DONLY_C_LOCALE=0 -DONNX_NAMESPACE=onnx -DONNX_ML=1 -DONNX_USE_LITE_PROTO=1 -D__ONNX_NO_DOC_STRINGS -DWIN32_LEAN_AND_MEAN -DORT_ENABLE_STREAM -DEIGEN_MPL2_ONLY -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_VARIADIC_TEMPLATES -DEIGEN_HAS_CXX11_MATH -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_STRONG_INLINE=inline -D_SILENCE_EXPERIMENTAL_FILE
SYSTEM_DEPRECATION_WARNING=1 -D"CMAKE_INTDIR="Release"" -Donnxruntime_providers_cuda_EXPORTS -D_WINDLL -D_MBCS -DEIGEN_HAS_C99_MATH -DCPUINFO_SUPPORTED -DNDEBUG -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_USE_THREADS -DDISABLE_CUSPARSE_DEPRECATED -DPLATFORM_WINDOWS -DNOGDI -DNOMINMAX -D_USE_MATH_DEFINES -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -DUSE_CUDA=1 -
DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_TENSORRT=1 -DONLY_C_LOCALE=0 -DONNX_NAMESPACE=onnx -DONNX_ML=1 -DONNX_USE_LITE_PROTO=1 -D__ONNX_NO_DOC_STRINGS -DWIN32_LEAN_AND_MEAN -DORT_ENABLE_STREAM -DEIGEN_MPL2_ONLY -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_VARIADIC_TEMPLATES -DEIGEN_HAS_CXX11_MATH -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_STRONG_INLINE=inline -D_SILENCE_EXPERI
MENTAL_FILESYSTEM_DEPRECATION_WARNING=1 -D"CMAKE_INTDIR="Release"" -Donnxruntime_providers_cuda_EXPORTS -Xcompiler "/EHsc /W4 /nologo /O2 /FS /MD /GR" -Xcompiler "/Fdonnxruntime_providers_cuda.dir\Release\vc143.pdb" -o onnxruntime_providers_cuda.dir\Release\moe_gemm_kernels_fp32_fp32.obj "E:\git\onnxruntime\onnxruntime\contrib_ops\cuda\moe\ft_moe\moe_g
emm_kernels_fp32_fp32.cu"
E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include\cute/arch/cluster_sm90.hpp(101): error #940-D: missing return statement at end of non-void function "cute::cluster_grid_dims" [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
}
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include\cute/arch/cluster_sm90.hpp(120): error #940-D: missing return statement at end of non-void function "cute::cluster_id_in_grid" [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
}
^
E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include\cute/arch/cluster_sm90.hpp(101): error #940-D: missing return statement at end of non-void function "cute::cluster_grid_dims" [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
}
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include\cute/arch/cluster_sm90.hpp(120): error #940-D: missing return statement at end of non-void function "cute::cluster_id_in_grid" [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
}
^
E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include\cute/arch/cluster_sm90.hpp(101): error #940-D: missing return statement at end of non-void function "cute::cluster_grid_dims" [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
}
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include\cute/arch/cluster_sm90.hpp(120): error #940-D: missing return statement at end of non-void function "cute::cluster_id_in_grid" [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
}
^
2 errors detected in the compilation of "E:/git/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_fp16_fp16.cu".
moe_gemm_kernels_fp16_fp16.cu
2 errors detected in the compilation of "E:/git/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_fp16_uint4.cu".
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 12.5.targets(799,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64" -x cu
-I"E:\git\onnxruntime\build\Windows\Release_deps\utf8_range-src" -IE:\git\onnxruntime\include\onnxruntime -IE:\git\onnxruntime\include\onnxruntime\core\session -I"E:\git\onnxruntime\build\Windows\Release_deps\pytorch_cpuinfo-src\include" -IE:\git\onnxruntime\build\Windows\Release -IE:\git\onnxruntime\onnxruntime -I"E:\git\onnxruntime\build\Windows\Releas
e_deps\abseil_cpp-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\safeint-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\gsl-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\date-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\onnx-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\onnx-build" -I"E:\git\onnxru
ntime\build\Windows\Release_deps\protobuf-src\src" -I"E:\git\onnxruntime\build\Windows\Release_deps\flatbuffers-src\include" -IC:\CuDNN\9.1.1.17_cuda12\include -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\examples" -I"E:\git\onnxruntime\build\Windows\Release_deps\cutla
ss-src\tools\util\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\eigen-src" -I"C:\TensorRT\10.0.1.6.cuda-12.4\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\mp11-src\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" --keep-dir onnxrunt
.2968DD78\x64\Release -maxrregcount=0 --machine 64 --compile -cudart shared --expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe --diag_suppress=bad_friend_decl -Xcudafe --diag_suppress=unsigned_compare_with_zero -Xcudafe --diag_suppress=expr_has_no_effect -include algorithm -std=c++17 --generate-code=arch=compute_61,code=[compute_61,sm_61] --
generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcudafe --diag_suppress=conversion_function_not_usable --threads 1 -Werror all-warnings -Xcompiler="/EHsc -Ob2 -Zi /utf-8 /sdl /experimen
tal:external /external:W0 /external:templates- /external:IE:/git/onnxruntime/cmake /external:IE:/git/onnxruntime/build/Windows/Release /wd4251 /wd4201 /wd4324 /wd5054 /w15038 /wd4251 /wd4201 /wd4324 /wd5054 /w15038 /wd4834 /wd4127" -D_WINDOWS -DNDEBUG -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_USE_THREADS -DDISABLE_CUSPARSE_DEPRECATED -DPLATFORM_WINDOWS -DNOGD
I -DNOMINMAX -D_USE_MATH_DEFINES -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -DUSE_CUDA=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_TENSORRT=1 -DONLY_C_LOCALE=0 -DONNX_NAMESPACE=onnx -DONNX_ML=1 -DONNX_USE_LITE_PROTO=1 -D__ONNX_NO_DOC_STRINGS -DWIN32_LEAN_AND_MEAN -DORT_ENABLE_STREAM -DEIGEN_MPL2_ONLY -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_VARIADIC_TEMPLATES -DEIGEN
_HAS_CXX11_MATH -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_STRONG_INLINE=inline -D_SILENCE_EXPERIMENTAL_FILESYSTEM_DEPRECATION_WARNING=1 -D"CMAKE_INTDIR="Release"" -Donnxruntime_providers_cuda_EXPORTS -D_WINDLL -D_MBCS -DEIGEN_HAS_C99_MATH -DCPUINFO_SUPPORTED -DNDEBUG -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_USE_THREADS -DDISABLE_CUSPARSE_DEPRECATED -DPLATFORM_WINDOWS
-DNOGDI -DNOMINMAX -D_USE_MATH_DEFINES -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -DUSE_CUDA=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_TENSORRT=1 -DONLY_C_LOCALE=0 -DONNX_NAMESPACE=onnx -DONNX_ML=1 -DONNX_USE_LITE_PROTO=1 -D__ONNX_NO_DOC_STRINGS -DWIN32_LEAN_AND_MEAN -DORT_ENABLE_STREAM -DEIGEN_MPL2_ONLY -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_VARIADIC_TEMPLATES
-DEIGEN_HAS_CXX11_MATH -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_STRONG_INLINE=inline -D_SILENCE_EXPERIMENTAL_FILESYSTEM_DEPRECATION_WARNING=1 -D"CMAKE_INTDIR="Release"" -Donnxruntime_providers_cuda_EXPORTS -Xcompiler "/EHsc /W4 /nologo /O2 /FS /MD /GR" -Xcompiler "/Fdonnxruntime_providers_cuda.dir\Release\vc143.pdb" -o onnxruntime_providers_cuda.dir\Release\moe
_gemm_kernels_fp16_fp16.obj "E:\git\onnxruntime\onnxruntime\contrib_ops\cuda\moe\ft_moe\moe_gemm_kernels_fp16_fp16.cu"" exited with code 2. [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
moe_gemm_kernels_fp16_uint4.cu
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 12.5.targets(799,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64" -x cu
-I"E:\git\onnxruntime\build\Windows\Release_deps\utf8_range-src" -IE:\git\onnxruntime\include\onnxruntime -IE:\git\onnxruntime\include\onnxruntime\core\session -I"E:\git\onnxruntime\build\Windows\Release_deps\pytorch_cpuinfo-src\include" -IE:\git\onnxruntime\build\Windows\Release -IE:\git\onnxruntime\onnxruntime -I"E:\git\onnxruntime\build\Windows\Releas
e_deps\abseil_cpp-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\safeint-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\gsl-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\date-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\onnx-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\onnx-build" -I"E:\git\onnxru
ntime\build\Windows\Release_deps\protobuf-src\src" -I"E:\git\onnxruntime\build\Windows\Release_deps\flatbuffers-src\include" -IC:\CuDNN\9.1.1.17_cuda12\include -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\examples" -I"E:\git\onnxruntime\build\Windows\Release_deps\cutla
ss-src\tools\util\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\eigen-src" -I"C:\TensorRT\10.0.1.6.cuda-12.4\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\mp11-src\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" --keep-dir onnxrunt
.2968DD78\x64\Release -maxrregcount=0 --machine 64 --compile -cudart shared --expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe --diag_suppress=bad_friend_decl -Xcudafe --diag_suppress=unsigned_compare_with_zero -Xcudafe --diag_suppress=expr_has_no_effect -include algorithm -std=c++17 --generate-code=arch=compute_61,code=[compute_61,sm_61] --
generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcudafe --diag_suppress=conversion_function_not_usable --threads 1 -Werror all-warnings -Xcompiler="/EHsc -Ob2 -Zi /utf-8 /sdl /experimen
tal:external /external:W0 /external:templates- /external:IE:/git/onnxruntime/cmake /external:IE:/git/onnxruntime/build/Windows/Release /wd4251 /wd4201 /wd4324 /wd5054 /w15038 /wd4251 /wd4201 /wd4324 /wd5054 /w15038 /wd4834 /wd4127" -D_WINDOWS -DNDEBUG -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_USE_THREADS -DDISABLE_CUSPARSE_DEPRECATED -DPLATFORM_WINDOWS -DNOGD
I -DNOMINMAX -D_USE_MATH_DEFINES -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -DUSE_CUDA=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_TENSORRT=1 -DONLY_C_LOCALE=0 -DONNX_NAMESPACE=onnx -DONNX_ML=1 -DONNX_USE_LITE_PROTO=1 -D__ONNX_NO_DOC_STRINGS -DWIN32_LEAN_AND_MEAN -DORT_ENABLE_STREAM -DEIGEN_MPL2_ONLY -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_VARIADIC_TEMPLATES -DEIGEN
_HAS_CXX11_MATH -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_STRONG_INLINE=inline -D_SILENCE_EXPERIMENTAL_FILESYSTEM_DEPRECATION_WARNING=1 -D"CMAKE_INTDIR="Release"" -Donnxruntime_providers_cuda_EXPORTS -D_WINDLL -D_MBCS -DEIGEN_HAS_C99_MATH -DCPUINFO_SUPPORTED -DNDEBUG -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_USE_THREADS -DDISABLE_CUSPARSE_DEPRECATED -DPLATFORM_WINDOWS
-DNOGDI -DNOMINMAX -D_USE_MATH_DEFINES -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -DUSE_CUDA=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_TENSORRT=1 -DONLY_C_LOCALE=0 -DONNX_NAMESPACE=onnx -DONNX_ML=1 -DONNX_USE_LITE_PROTO=1 -D__ONNX_NO_DOC_STRINGS -DWIN32_LEAN_AND_MEAN -DORT_ENABLE_STREAM -DEIGEN_MPL2_ONLY -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_VARIADIC_TEMPLATES
-DEIGEN_HAS_CXX11_MATH -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_STRONG_INLINE=inline -D_SILENCE_EXPERIMENTAL_FILESYSTEM_DEPRECATION_WARNING=1 -D"CMAKE_INTDIR="Release"" -Donnxruntime_providers_cuda_EXPORTS -Xcompiler "/EHsc /W4 /nologo /O2 /FS /MD /GR" -Xcompiler "/Fdonnxruntime_providers_cuda.dir\Release\vc143.pdb" -o onnxruntime_providers_cuda.dir\Release\moe
_gemm_kernels_fp16_uint4.obj "E:\git\onnxruntime\onnxruntime\contrib_ops\cuda\moe\ft_moe\moe_gemm_kernels_fp16_uint4.cu"" exited with code 2. [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
2 errors detected in the compilation of "E:/git/onnxruntime/onnxruntime/contrib_ops/cuda/moe/ft_moe/moe_gemm_kernels_fp32_fp32.cu".
moe_gemm_kernels_fp32_fp32.cu
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 12.5.targets(799,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64" -x cu
-I"E:\git\onnxruntime\build\Windows\Release_deps\utf8_range-src" -IE:\git\onnxruntime\include\onnxruntime -IE:\git\onnxruntime\include\onnxruntime\core\session -I"E:\git\onnxruntime\build\Windows\Release_deps\pytorch_cpuinfo-src\include" -IE:\git\onnxruntime\build\Windows\Release -IE:\git\onnxruntime\onnxruntime -I"E:\git\onnxruntime\build\Windows\Releas
e_deps\abseil_cpp-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\safeint-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\gsl-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\date-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\onnx-src" -I"E:\git\onnxruntime\build\Windows\Release_deps\onnx-build" -I"E:\git\onnxru
ntime\build\Windows\Release_deps\protobuf-src\src" -I"E:\git\onnxruntime\build\Windows\Release_deps\flatbuffers-src\include" -IC:\CuDNN\9.1.1.17_cuda12\include -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\cutlass-src\examples" -I"E:\git\onnxruntime\build\Windows\Release_deps\cutla
ss-src\tools\util\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\eigen-src" -I"C:\TensorRT\10.0.1.6.cuda-12.4\include" -I"E:\git\onnxruntime\build\Windows\Release_deps\mp11-src\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include" --keep-dir onnxrunt
.2968DD78\x64\Release -maxrregcount=0 --machine 64 --compile -cudart shared --expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe --diag_suppress=bad_friend_decl -Xcudafe --diag_suppress=unsigned_compare_with_zero -Xcudafe --diag_suppress=expr_has_no_effect -include algorithm -std=c++17 --generate-code=arch=compute_61,code=[compute_61,sm_61] --
generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcudafe --diag_suppress=conversion_function_not_usable --threads 1 -Werror all-warnings -Xcompiler="/EHsc -Ob2 -Zi /utf-8 /sdl /experimen
tal:external /external:W0 /external:templates- /external:IE:/git/onnxruntime/cmake /external:IE:/git/onnxruntime/build/Windows/Release /wd4251 /wd4201 /wd4324 /wd5054 /w15038 /wd4251 /wd4201 /wd4324 /wd5054 /w15038 /wd4834 /wd4127" -D_WINDOWS -DNDEBUG -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_USE_THREADS -DDISABLE_CUSPARSE_DEPRECATED -DPLATFORM_WINDOWS -DNOGD
I -DNOMINMAX -D_USE_MATH_DEFINES -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -DUSE_CUDA=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_TENSORRT=1 -DONLY_C_LOCALE=0 -DONNX_NAMESPACE=onnx -DONNX_ML=1 -DONNX_USE_LITE_PROTO=1 -D__ONNX_NO_DOC_STRINGS -DWIN32_LEAN_AND_MEAN -DORT_ENABLE_STREAM -DEIGEN_MPL2_ONLY -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_VARIADIC_TEMPLATES -DEIGEN
_HAS_CXX11_MATH -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_STRONG_INLINE=inline -D_SILENCE_EXPERIMENTAL_FILESYSTEM_DEPRECATION_WARNING=1 -D"CMAKE_INTDIR="Release"" -Donnxruntime_providers_cuda_EXPORTS -D_WINDLL -D_MBCS -DEIGEN_HAS_C99_MATH -DCPUINFO_SUPPORTED -DNDEBUG -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_USE_THREADS -DDISABLE_CUSPARSE_DEPRECATED -DPLATFORM_WINDOWS
-DNOGDI -DNOMINMAX -D_USE_MATH_DEFINES -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -DUSE_CUDA=1 -DUSE_MEMORY_EFFICIENT_ATTENTION=1 -DUSE_TENSORRT=1 -DONLY_C_LOCALE=0 -DONNX_NAMESPACE=onnx -DONNX_ML=1 -DONNX_USE_LITE_PROTO=1 -D__ONNX_NO_DOC_STRINGS -DWIN32_LEAN_AND_MEAN -DORT_ENABLE_STREAM -DEIGEN_MPL2_ONLY -DEIGEN_HAS_CONSTEXPR -DEIGEN_HAS_VARIADIC_TEMPLATES
-DEIGEN_HAS_CXX11_MATH -DEIGEN_HAS_CXX11_ATOMIC -DEIGEN_STRONG_INLINE=inline -D_SILENCE_EXPERIMENTAL_FILESYSTEM_DEPRECATION_WARNING=1 -D"CMAKE_INTDIR="Release"" -Donnxruntime_providers_cuda_EXPORTS -Xcompiler "/EHsc /W4 /nologo /O2 /FS /MD /GR" -Xcompiler "/Fdonnxruntime_providers_cuda.dir\Release\vc143.pdb" -o onnxruntime_providers_cuda.dir\Release\moe
_gemm_kernels_fp32_fp32.obj "E:\git\onnxruntime\onnxruntime\contrib_ops\cuda\moe\ft_moe\moe_gemm_kernels_fp32_fp32.cu"" exited with code 2. [E:\git\onnxruntime\build\Windows\Release\onnxruntime_providers_cuda.vcxproj]
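Note on the diagnostic: the nvcc command line in the log passes -Werror all-warnings, which is presumably why #940-D is reported as an error rather than a warning. The sketch below is only an illustration of the general pattern that produces a "missing return statement at end of non-void function" diagnostic, and one possible way to avoid it. It is not the CUTLASS source; Dim3, SKETCH_DEVICE_PATH, grid_dims_fixed and self_check are hypothetical names made up for this example.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3 (assumption, not the real type).
struct Dim3 { unsigned x, y, z; };

// Hypothetical macro standing in for "the device code path is being compiled".
// It is left undefined here to mimic a compilation pass that skips that path.
// #define SKETCH_DEVICE_PATH 1

// Problem pattern, kept as a comment because a warnings-as-errors build
// would reject it:
//
//   Dim3 grid_dims_bad() {
//   #if defined(SKETCH_DEVICE_PATH)
//     return Dim3{1, 1, 1};
//   #endif
//   }  // control reaches the closing brace without a return
//      // whenever SKETCH_DEVICE_PATH is undefined

// One possible fix: make every preprocessor branch end in a return.
inline Dim3 grid_dims_fixed() {
#if defined(SKETCH_DEVICE_PATH)
  return Dim3{1, 1, 1};  // placeholder for a real device-side query
#else
  return Dim3{0, 0, 0};  // fallback so this pass also sees a return
#endif
}

// Small self-check of the fallback branch (the macro above is undefined).
inline void self_check() {
  Dim3 d = grid_dims_fixed();
  assert(d.x == 0 && d.y == 0 && d.z == 0);
}
```

Whether the two functions named in the log (cute::cluster_grid_dims and cute::cluster_id_in_grid) follow exactly this shape is an assumption; the log only shows that the compiler sees a path through each of them that ends without a return.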
Visual Studio Version
Enterprise 2022 (64-bit) 17.9.7
GCC / Compiler Version
No response