Latest daily build error when building nccl test using manylinux docker image #8635

Closed
typhoonzero opened this issue Feb 28, 2018 · 6 comments

@typhoonzero
Contributor

See: https://paddleci.ngrok.io/viewLog.html?buildId=29125&buildTypeId=Manylinux1_Cuda75cudnn5cp27cp27mu&tab=buildResultsDiv

@QiJune
Member

QiJune commented Feb 28, 2018

It seems that nccl2 requires at least CUDA 8.0. From the NVIDIA install guide:

Ensure your environment meets the following software requirements:
glibc 2.19 or higher
CUDA 8.0 or higher

Read more at: http://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html#ixzz58NdnOeoU
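
To make the glibc requirement quoted above concrete, here is a minimal sketch of a version check. The helper names `VersionAtLeast` and `BuildGlibcVersion` are hypothetical and are not part of PaddlePaddle or NCCL; the sketch only assumes that glibc exposes the `__GLIBC__` and `__GLIBC_MINOR__` macros once a standard header has been included.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: returns true when a "major.minor" version string is
// at least req_major.req_minor. Unparseable input is treated as "too old".
bool VersionAtLeast(const std::string& version, int req_major, int req_minor) {
  int major = 0;
  int minor = 0;
  if (std::sscanf(version.c_str(), "%d.%d", &major, &minor) != 2) {
    return false;
  }
  if (major != req_major) {
    return major > req_major;
  }
  return minor >= req_minor;
}

// Hypothetical helper: glibc version of the build environment as
// "major.minor", or an empty string when not building against glibc.
std::string BuildGlibcVersion() {
#if defined(__GLIBC__) && defined(__GLIBC_MINOR__)
  return std::to_string(__GLIBC__) + "." + std::to_string(__GLIBC_MINOR__);
#else
  return "";
#endif
}
```

Calling `VersionAtLeast(BuildGlibcVersion(), 2, 19)` inside the build image would show whether the image satisfies the glibc part of the requirement; the CUDA part needs a separate check.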

@typhoonzero
Contributor Author

The CUDA 8.0 build also fails; the important logs are below. It seems that ncclGroupStart needs the nccl2 header to compile, while the manylinux docker image currently installs nccl1. Since manylinux is based on CentOS 6, nccl2 does not provide a yum package for it either.

[02:44:17] :	 [Step 4/4] [ 55%] Building NVCC (Device) object paddle/fluid/operators/CMakeFiles/cross_entropy_op.dir/cross_entropy_op_generated_cross_entropy_op.cu.o
[02:44:18]W:	 [Step 4/4] /paddle/paddle/fluid/platform/nccl_test.cu(92): error: no instance of function template "paddle::platform::dynload::DynLoad__ncclGroupStart::operator()" matches the argument list
[02:44:18]W:	 [Step 4/4]             object type is: paddle::platform::dynload::DynLoad__ncclGroupStart
[02:44:18]W:	 [Step 4/4] 
[02:44:18]W:	 [Step 4/4] /paddle/paddle/fluid/platform/nccl_test.cu(101): error: no instance of function template "paddle::platform::dynload::DynLoad__ncclGroupEnd::operator()" matches the argument list
[02:44:18]W:	 [Step 4/4]             object type is: paddle::platform::dynload::DynLoad__ncclGroupEnd
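
One possible way to keep such code compiling against both header versions is a preprocessor guard around the group calls. The following is only a sketch under stated assumptions, not the fix adopted in PaddlePaddle: it calls NCCL directly rather than through the `paddle::platform::dynload` wrappers, the names `SKETCH_HAS_NCCL_GROUP_API`, `kHasNcclGroupApi`, and `RunGrouped` are hypothetical, and it treats a header that does not define `NCCL_MAJOR` as older than 2.x.

```cpp
#include <utility>

// Include nccl.h only when the compiler can find it, so this sketch also
// compiles on machines without NCCL installed.
#if defined(__has_include)
#if __has_include(<nccl.h>)
#include <nccl.h>
#endif
#endif

// NCCL 2.x headers define NCCL_MAJOR. A missing macro is treated here as
// "no group API available" (assumption for this sketch).
#if defined(NCCL_MAJOR) && (NCCL_MAJOR >= 2)
#define SKETCH_HAS_NCCL_GROUP_API 1
#else
#define SKETCH_HAS_NCCL_GROUP_API 0
#endif

constexpr bool kHasNcclGroupApi = SKETCH_HAS_NCCL_GROUP_API != 0;

// Hypothetical helper: runs `body` between ncclGroupStart/ncclGroupEnd when
// the group API exists, and runs it unwrapped otherwise. Return codes are
// ignored to keep the sketch short; real code should check them.
template <typename Body>
void RunGrouped(Body&& body) {
#if SKETCH_HAS_NCCL_GROUP_API
  ncclGroupStart();
  std::forward<Body>(body)();
  ncclGroupEnd();
#else
  std::forward<Body>(body)();
#endif
}
```

With a guard like this, a test could wrap its per-device calls in `RunGrouped([&] { ... })` and skip group-only assertions when `kHasNcclGroupApi` is false.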

@Yancey1989
Contributor

So shall we stop supporting CUDA 7.5 and above version with the wheel package?

@luotao1
Contributor

luotao1 commented Mar 1, 2018

Many servers in the company support only CUDA 7.5.

@Yancey1989
Contributor

We have already upgraded nccl to nccl2 in #8540, which means users can't use the latest PaddlePaddle with CUDA 7.5.

@luotao1
Contributor

luotao1 commented Mar 1, 2018

In #8504 (comment), @dzhwinter said: "To make our Multi-GPU supported in more platform, we still need the nccl1 to compatible with older CUDA version."


4 participants