Latest daily build error when building nccl test using manylinux docker image #8635

typhoonzero · 2018-02-28T05:59:39Z

See: https://paddleci.ngrok.io/viewLog.html?buildId=29125&buildTypeId=Manylinux1_Cuda75cudnn5cp27cp27mu&tab=buildResultsDiv

QiJune · 2018-02-28T06:07:42Z

It seems that nccl2 requires at least CUDA 8.0.

Ensure your environment meets the following software requirements:
glibc 2.19 or higher
CUDA 8.0 or higher

Read more at: http://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html#ixzz58NdnOeoU
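The two requirements quoted above can be turned into a quick pre-build check. This is only a minimal sketch: `parse_version` and `meets_nccl2_requirements` are hypothetical helper names, not part of PaddlePaddle or NCCL, and the version strings in the usage lines are illustrative values rather than output from any real machine.

```python
# Minimum versions quoted from the NCCL install guide excerpt above.
NCCL2_MIN_CUDA = (8, 0)
NCCL2_MIN_GLIBC = (2, 19)


def parse_version(text):
    """Turn a dotted version string such as "7.5" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_nccl2_requirements(cuda_version, glibc_version):
    """Return True only if both versions reach the quoted minimums."""
    return (parse_version(cuda_version) >= NCCL2_MIN_CUDA
            and parse_version(glibc_version) >= NCCL2_MIN_GLIBC)


if __name__ == "__main__":
    # Illustrative inputs only; substitute the versions of the real build image.
    print(meets_nccl2_requirements("7.5", "2.19"))
    print(meets_nccl2_requirements("8.0", "2.19"))
```

Comparing tuples of integers avoids the string-comparison trap where "10.0" would sort before "8.0".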

typhoonzero · 2018-02-28T06:14:46Z

It seems the CUDA 8.0 build also fails; the important logs are below. ncclGroupStart apparently needs the nccl2 header to compile, while the manylinux Docker image currently installs nccl1. Since manylinux is based on CentOS 6, nccl2 does not provide a yum package for it either.

[02:44:17] :	 [Step 4/4] [ 55%] Building NVCC (Device) object paddle/fluid/operators/CMakeFiles/cross_entropy_op.dir/cross_entropy_op_generated_cross_entropy_op.cu.o
[02:44:18]W:	 [Step 4/4] /paddle/paddle/fluid/platform/nccl_test.cu(92): error: no instance of function template "paddle::platform::dynload::DynLoad__ncclGroupStart::operator()" matches the argument list
[02:44:18]W:	 [Step 4/4]             object type is: paddle::platform::dynload::DynLoad__ncclGroupStart
[02:44:18]W:	 [Step 4/4] 
[02:44:18]W:	 [Step 4/4] /paddle/paddle/fluid/platform/nccl_test.cu(101): error: no instance of function template "paddle::platform::dynload::DynLoad__ncclGroupEnd::operator()" matches the argument list
[02:44:18]W:	 [Step 4/4]             object type is: paddle::platform::dynload::DynLoad__ncclGroupEnd
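One way to catch this mismatch before compiling is to inspect the installed `nccl.h` at configure time. The sketch below rests on two assumptions: nccl2 headers carry a `#define NCCL_MAJOR` line, and a header without that define is treated as nccl1. The function names are hypothetical, and the header strings in the usage lines are made-up fragments, not copies of real NCCL headers.

```python
import re

# Matches a line such as "#define NCCL_MAJOR 2" and captures the number.
_NCCL_MAJOR_RE = re.compile(r"^\s*#\s*define\s+NCCL_MAJOR\s+(\d+)", re.MULTILINE)


def nccl_major_from_header(header_text):
    """Return the NCCL major version declared in the text of nccl.h.

    Assumption: a header with no NCCL_MAJOR define is treated as nccl1.
    """
    match = _NCCL_MAJOR_RE.search(header_text)
    return int(match.group(1)) if match else 1


def has_group_api(header_text):
    """Per the comment above, ncclGroupStart/ncclGroupEnd need an nccl2 header."""
    return nccl_major_from_header(header_text) >= 2


if __name__ == "__main__":
    # Made-up header fragments for illustration only.
    print(has_group_api("#define NCCL_MAJOR 2\n#define NCCL_MINOR 1\n"))
    print(has_group_api("/* header without a version define */\n"))
```

A build script could use the result to skip `nccl_test.cu`, or to fail early with a clear message, when only nccl1 is installed.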

Yancey1989 · 2018-03-01T09:23:00Z

So shall we stop supporting CUDA 7.5 (and older versions) in the wheel package?

luotao1 · 2018-03-01T09:31:06Z

Many servers in the company only support CUDA 7.5.

Yancey1989 · 2018-03-01T11:41:40Z

We already upgraded nccl to nccl2 in #8540, which means users can't use the latest PaddlePaddle with CUDA 7.5.

luotao1 · 2018-03-01T11:51:44Z

In #8504 (comment), @dzhwinter said: "To make our Multi-GPU supported in more platform, we still need the nccl1 to compatible with older CUDA version."

typhoonzero assigned Yancey1989 and QiJune Feb 28, 2018

Yancey1989 mentioned this issue Mar 2, 2018

Fix nccl version in manylinux develop Docker image #8708

Merged

Yancey1989 closed this as completed in #8708 Mar 8, 2018
