nccl.h not found when compiling from source #8504

yu239-zz · 2018-02-23T00:11:08Z

See the reopened #5035.

helinwang · 2018-02-23T00:16:54Z

CC: @tonyyang-svail @dzhwinter, could you take a look? We should probably support compiling without Docker.

helinwang · 2018-02-23T19:11:12Z

Hi @yu239, we switched to nccl2. Please install the dependency using sudo apt-get install libnccl2 libnccl-dev (more info is here: http://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html).

Unlike nccl1, nccl2 is closed source, so we cannot use cmake to download the source and compile it. Installing it manually with apt-get is probably the best solution.
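
For anyone hitting the same error outside Docker, a quick way to confirm whether the header is visible after installing is to look for nccl.h in the usual include directories. This is only a sketch: the candidate directories and the helper name find_nccl_header are assumptions for illustration, not something the Paddle build uses.

```python
import os

# Hypothetical default search locations; adjust for your system.
DEFAULT_INCLUDE_DIRS = [
    "/usr/include",
    "/usr/local/include",
    "/usr/local/cuda/include",
]


def find_nccl_header(include_dirs=None):
    """Return the path of the first nccl.h found, or None if absent."""
    if include_dirs is None:
        include_dirs = DEFAULT_INCLUDE_DIRS
    for directory in include_dirs:
        candidate = os.path.join(directory, "nccl.h")
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = find_nccl_header()
    print(path if path else "nccl.h not found in the default include dirs")
```

If the header lives somewhere else, pass that directory explicitly, e.g. find_nccl_header(["/opt/nccl/include"]) (a made-up path).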

typhoonzero · 2018-02-24T01:36:34Z

@helinwang Is it necessary to add a build option to switch off the nccl dependency?

dzhwinter · 2018-02-24T02:24:35Z

nccl2 is no longer open source. NVIDIA provides the CUDA Docker image with the PPA (the apt source URL) already included, so inside Docker we can install it with an apt install command. @helinwang
To build Paddle outside of Docker, you need to install nccl2 manually, as with cudnn. Here is the nccl2 download link: https://developer.nvidia.com/nccl

luotao1 · 2018-02-24T02:42:16Z

Should we keep nccl1?

If we don't keep nccl1, can we remove cmake/external/nccl.cmake? Leaving it in will confuse users.
I found that nccl2 needs CUDA 8.0: http://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html#softreq

If we don't keep nccl1, what about the pip install version cuda7.5_cudnn5_avx_mkl?

wangkuiyi · 2018-02-25T01:24:01Z

Does our codebase (develop branch) still depend on NCCL 1? If not, let us remove nccl.cmake. @luotao1

luotao1 · 2018-02-25T04:23:19Z

Our codebase still works well with NCCL 1, and @dzhwinter will update nccl.cmake later to make it compatible with both NCCL 1 and 2.

dzhwinter · 2018-02-25T07:44:49Z

In our codebase, we load NCCL as a DSO (dynamic shared object). This means we only need nccl.h to compile and no longer depend on a static library.
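
To illustrate what loading the library at runtime looks like, here is a rough Python sketch using ctypes. It is not Paddle's loader (which is C++); the library file names tried and the helper try_load_nccl are assumptions for illustration.

```python
import ctypes


def try_load_nccl(names=("libnccl.so", "libnccl.so.2", "libnccl.so.1")):
    """Try to open the NCCL shared library at runtime; return None on failure."""
    for name in names:
        try:
            # CDLL resolves the name through the system's dynamic loader.
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


if __name__ == "__main__":
    lib = try_load_nccl()
    print("NCCL loaded" if lib is not None else "NCCL shared library not found")
```

The point is that a missing library only shows up when the load is attempted, not at compile time, which is why the header alone is enough to build.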

According to the NCCL installation guide https://docs.nvidia.com/deeplearning/sdk/pdf/NCCL-Installation-Guide.pdf , we have the following dependency relations.

nccl2.1.4(latest) -> cuda9.0 or higher
nccl2.1.2 -> cuda8.0
nccl1.x -> cuda7.0 or higher

To support multi-GPU on more platforms, we still need nccl1 for compatibility with older CUDA versions.
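
The dependency list above can be written down as a small lookup, which may help when deciding which NCCL to install for a given CUDA. This is a sketch only: the function name nccl_for_cuda is made up, it assumes a "major.minor" version string, and it encodes just the three rows listed above.

```python
def nccl_for_cuda(cuda_version):
    """Map a CUDA version string such as "8.0" to the NCCL release listed above."""
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    cuda = (major, minor)
    if cuda >= (9, 0):
        return "2.1.4"
    if cuda == (8, 0):
        return "2.1.2"
    if cuda >= (7, 0):
        return "1.x"
    return None  # older than anything in the list


if __name__ == "__main__":
    for version in ("9.0", "8.0", "7.5"):
        print(version, "->", nccl_for_cuda(version))
```

With this mapping, CUDA 7.5 (as in the cuda7.5_cudnn5_avx_mkl build) falls back to nccl1, which matches the reason given for keeping it.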

yu239-zz assigned yu239-zz and helinwang Feb 23, 2018

luotao1 mentioned this issue Mar 1, 2018

Latest daily build error when building nccl test using manylinux docker image #8635

Closed

luotao1 mentioned this issue Apr 11, 2018

remove unused nccl.cmake #9833

Merged

luotao1 closed this as completed in #9833 Apr 11, 2018
