Compile error: fatal error: nccl.h: No such file or directory #5035

Closed
chengduoZH opened this issue Oct 24, 2017 · 7 comments · Fixed by #5036 or #9833

Comments

@chengduoZH
Contributor

chengduoZH commented Oct 24, 2017

Error:

In file included from /home/zhaochengduo/program/temp_master_paddle/Paddle/paddle/platform/enforce.h:39:0,
                 from /home/zhaochengduo/program/temp_master_paddle/Paddle/paddle/platform/gpu_info.cc:19:
/home/zhaochengduo/program/temp_master_paddle/Paddle/paddle/platform/dynload/nccl.h:18:18: fatal error: nccl.h: No such file or directory
 #include <nccl.h>
                  ^
compilation terminated.
make64[2]: *** [paddle/platform/CMakeFiles/gpu_info.dir/gpu_info.cc.o] Error 1
make64[1]: *** [paddle/platform/CMakeFiles/gpu_info.dir/all] Error 2
make64[1]: *** Waiting for unfinished jobs....

Environment:

Host system: CentOS 6.3
CUDA 7.5 with libcudnn.so.5.1.3, or CUDA 8.0 with libcudnn.so.6.0.21

@luotao1
Contributor

luotao1 commented Oct 24, 2017

I get the same error.

@chengduoZH
Contributor Author

Administrators should install the NCCL library on the server.

@luotao1
Contributor

luotao1 commented Oct 24, 2017

I changed the include:

-#include <nccl.h>
 #include <mutex>
+#include "nccl.h"

but then another error appears:

In file included from /home/luotao02/Paddle/paddle/platform/gpu_info.cc:19:0:
/home/luotao02/Paddle/paddle/platform/enforce.h:180:5: error: ‘paddle::platform::throw_on_error’ declared as an ‘inline’ variable
     ncclResult_t stat, const Args&... args) {
     ^
/home/luotao02/Paddle/paddle/platform/enforce.h:180:5: error: template declaration of ‘typename std::enable_if<(sizeof (Args ...) != 0), void>::type paddle::platform::throw_on_error’
/home/luotao02/Paddle/paddle/platform/enforce.h:180:5: error: ‘ncclResult_t’ was not declared in this scope
/home/luotao02/Paddle/paddle/platform/enforce.h:180:24: error: expected primary-expression before ‘const’
     ncclResult_t stat, const Args&... args) {
                        ^
make64[2]: *** [paddle/platform/CMakeFiles/gpu_info.dir/gpu_info.cc.o] Error 1
make64[1]: *** [paddle/platform/CMakeFiles/gpu_info.dir/all] Error 2
make64[1]: *** Waiting for unfinished jobs....

@luotao1
Contributor

luotao1 commented Oct 24, 2017

> Administrators should install the NCCL library on the server.

NCCL is installed using third_party

@chengduoZH
Contributor Author

> NCCL is installed using third_party

I see.

@yu239-zz

yu239-zz commented Feb 23, 2018

The error has shown up again. The NCCL install step has been removed from the CMake build, so if I download the source and compile it myself, NCCL is not found. The Docker build is fine, however.

@wangkuiyi
Collaborator

I see that in #8540 we install NCCL in the Dockerfile, so the Docker-based build is alright.

I understand that we have to install NCCL manually or via the Dockerfile, and no longer via cmake/external/nccl.cmake, because that approach only works with open-source dependencies.

So, the correct way to build PaddlePaddle is to use the Docker image. A secondary option is to install NCCL manually. However, a manual installation must ensure that the build environment and the runtime environment use the same version of NCCL.
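
For a manual install, one way to see what the build environment actually picks up is a small compile-time probe. The following is only a sketch, not part of PaddlePaddle: the names `SKETCH_HAVE_NCCL_HEADER`, `nccl_header_visible`, and `nccl_build_version` are hypothetical, and it assumes a compiler that supports `__has_include` (standard since C++17, available earlier as a GCC/Clang extension) and an NCCL 2.x header that defines `NCCL_MAJOR`, `NCCL_MINOR`, and `NCCL_PATCH`. It reports only the version of the header seen at compile time; comparing that against the library loaded at runtime is left out.

```cpp
#include <string>

// Include <nccl.h> only if the compiler can find it on the include path,
// so this file still compiles on a machine without NCCL installed.
#ifdef __has_include
#  if __has_include(<nccl.h>)
#    include <nccl.h>
#    define SKETCH_HAVE_NCCL_HEADER 1  // hypothetical macro, for this sketch only
#  endif
#endif
#ifndef SKETCH_HAVE_NCCL_HEADER
#  define SKETCH_HAVE_NCCL_HEADER 0
#endif

// True if <nccl.h> was found when this file was compiled.
bool nccl_header_visible() { return SKETCH_HAVE_NCCL_HEADER == 1; }

// Describes the NCCL version of the header used at build time.
// Assumes the NCCL 2.x version macros; older headers may not define them.
std::string nccl_build_version() {
#if SKETCH_HAVE_NCCL_HEADER && defined(NCCL_MAJOR) && defined(NCCL_MINOR) && defined(NCCL_PATCH)
  return std::to_string(NCCL_MAJOR) + "." + std::to_string(NCCL_MINOR) + "." +
         std::to_string(NCCL_PATCH);
#elif SKETCH_HAVE_NCCL_HEADER
  return "unknown (header defines no version macros)";
#else
  return "nccl.h not found on the include path";
#endif
}
```

Printing `nccl_build_version()` from a small `main` on the build machine, and comparing the result with the NCCL package installed on the machine that runs the binary, gives a quick sanity check. Note that `nccl.h` itself includes CUDA headers, so the CUDA include directory needs to be on the include path as well.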
