[Speed compiling]: Refine CMake for CUDA to automatically detect the GPU arch by default #5713
Conversation
1. Automatically detect the GPU arch by default. 2. Specify `-DCUDA_ARCH_NAME=All` when releasing a new PaddlePaddle version.
If the code can be compiled for one arch, is it guaranteed to compile for another arch? If not, we should still at least compile it for all archs in TeamCity. We can compile for the local arch on a local machine to save dev time, though.
Before running cmake, we can get the CUDA architecture of the TeamCity machines.
As @hedaoyuan said, this CMake function can automatically detect the installed GPUs and pick the local arch on the local machine with the following code:

```cmake
#############################################################
# A function for automatic detection of GPUs installed (if autodetection is enabled)
# Usage:
#   detect_installed_gpus(out_variable)
function(detect_installed_gpus out_variable)
  if(NOT CUDA_gpu_detect_output)
    set(cufile ${PROJECT_BINARY_DIR}/detect_cuda_archs.cu)
    file(WRITE ${cufile} ""
      "#include <cstdio>\n"
      "int main() {\n"
      "  int count = 0;\n"
      "  if (cudaSuccess != cudaGetDeviceCount(&count)) return -1;\n"
      "  if (count == 0) return -1;\n"
      "  for (int device = 0; device < count; ++device) {\n"
      "    cudaDeviceProp prop;\n"
      "    if (cudaSuccess == cudaGetDeviceProperties(&prop, device))\n"
      "      std::printf(\"%d.%d \", prop.major, prop.minor);\n"
      "  }\n"
      "  return 0;\n"
      "}\n")
    execute_process(COMMAND "${CUDA_NVCC_EXECUTABLE}" "-ccbin=${CUDA_HOST_COMPILER}"
                    "--run" "${cufile}"
                    WORKING_DIRECTORY "${PROJECT_BINARY_DIR}/CMakeFiles/"
                    RESULT_VARIABLE nvcc_res OUTPUT_VARIABLE nvcc_out
                    ERROR_QUIET OUTPUT_STRIP_TRAILING_WHITESPACE)
    if(nvcc_res EQUAL 0)
      # Only keep the last line of nvcc_out.
      string(REGEX REPLACE ";" "\\\\;" nvcc_out "${nvcc_out}")
      string(REGEX REPLACE "\n" ";" nvcc_out "${nvcc_out}")
      list(GET nvcc_out -1 nvcc_out)
      string(REPLACE "2.1" "2.1(2.0)" nvcc_out "${nvcc_out}")
      set(CUDA_gpu_detect_output ${nvcc_out} CACHE INTERNAL "Returned GPU architectures from detect_installed_gpus tool" FORCE)
    endif()
  endif()
  if(NOT CUDA_gpu_detect_output)
    message(STATUS "Automatic GPU detection failed. Building for all known architectures.")
    set(${out_variable} ${paddle_known_gpu_archs} PARENT_SCOPE)
  else()
    set(${out_variable} ${CUDA_gpu_detect_output} PARENT_SCOPE)
  endif()
endfunction()
```

And if specify
As this PR reduces the TeamCity build time from 30min~33min to 24min, can we merge it ASAP?
What happens if there are both a Tesla K40 and a GTX 1080 Ti in the machine that compiles the code?
@chengduoZH The code to detect CUDA capability is as follows; it detects all GPUs on one machine. Even with mixed GPU types on one machine, it still gets the CUDA archs for all of them. (Compiled as a `.cu` file, nvcc includes the CUDA runtime header automatically.)

```cuda
#include <cstdio>

int main() {
  int count = 0;
  if (cudaSuccess != cudaGetDeviceCount(&count)) return -1;
  if (count == 0) return -1;
  for (int device = 0; device < count; ++device) {
    cudaDeviceProp prop;
    if (cudaSuccess == cudaGetDeviceProperties(&prop, device))
      std::printf("%d.%d ", prop.major, prop.minor);
  }
  return 0;
}
```
I changed it to `cmake -DCUDA_ARCH_NAME=Auto ..`
@luotao1 It is strange that users compile a Paddle binary and cannot use that binary on another machine by default. So, the default
No, it is not. If developers use some features that were introduced in a higher arch, Paddle cannot be compiled for a lower arch. However, we can specify two archs in our CI tests: the lowest arch Paddle supports (
Not all; sm_30 only supports sm_3x.
We also need to document this somewhere.
How about documenting it in #4382?
Fix #5712

1. Automatically detect the GPU arch and only specify the detected arch by default. For example, on a Tesla K40m, it automatically gets and specifies the `sm_35` arch. This corresponds to `-DCUDA_ARCH_NAME=Auto` being the default.
2. Specify `-DCUDA_ARCH_NAME=All` in TeamCity.

Speed:
- TeamCity:
- local machine: env: CentOS, CUDA 7.5, `make -j8`, `WITH_GPU=ON`
- Compile time interval: