Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable Intel MKLDNN library and Intel MKL small package #2940

Merged
merged 12 commits into from
Jul 21, 2017

Conversation

tensor-tang
Copy link
Contributor

Enable Intel MKLDNN library, compile with an cmake option WITH_MKLDNN.

And enable Intel MKL small package when WITH_MKLDNN.

CMakeLists.txt Outdated
@@ -37,6 +37,7 @@ include(simd)
################################ Configurations #######################################
option(WITH_GPU "Compile PaddlePaddle with NVIDIA GPU" ${CUDA_FOUND})
option(WITH_AVX "Compile PaddlePaddle with AVX intrinsics" ${AVX_FOUND})
option(WITH_MKLDNN "Compile PaddlePaddle with mkl-dnn support." ON)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default OFF is better, not all machines have MKL environment(Android + ARM).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Xreki @hedaoyuan I think I can use ${AVX_FOUND} instead.

Copy link
Contributor

@Xreki Xreki Jul 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It maybe the simplest way to enable the test of mkldnn related PRs in Travis CI and TeamCity.

GIT_REPOSITORY "https://github.com/01org/mkl-dnn.git"
GIT_TAG "${MKLDNN_TAG}"
PREFIX ${MKLDNN_SOURCES_DIR}
PATCH_COMMAND cd <SOURCE_DIR>/scripts && ./prepare_mkl.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my server, I met the following when downloading mklml_lnx_2018.0.20170425.tgz. I see it succeeded in Travis CI. I'm not sure why.

$ wget -P ../external/ https://github.com/01org/mkl-dnn/releases/download/v0.9/mklml_lnx_2018.0.20170425.tgz
--2017-07-18 10:20:14--  https://github.com/01org/mkl-dnn/releases/download/v0.9/mklml_lnx_2018.0.20170425.tgz
Resolving agent.baidu.com... 172.22.15.16, 172.22.15.17
Connecting to agent.baidu.com|172.22.15.16|:8128... connected.
Proxy request sent, awaiting response... 302 Found
Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/58414589/54f3715e-3c9a-11e7-97f0-b402ecc00a80?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20170718%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170718T102015Z&X-Amz-Expires=300&X-Amz-Signature=54ec18736c48ae3fcf97dea15db01075d8ef50171c54ab396325494eaba57a9f&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dmklml_lnx_2018.0.20170425.tgz&response-content-type=application%2Foctet-stream [following]
--2017-07-18 10:20:15--  https://github-production-release-asset-2e65be.s3.amazonaws.com/58414589/54f3715e-3c9a-11e7-97f0-b402ecc00a80?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20170718%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170718T102015Z&X-Amz-Expires=300&X-Amz-Signature=54ec18736c48ae3fcf97dea15db01075d8ef50171c54ab396325494eaba57a9f&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dmklml_lnx_2018.0.20170425.tgz&response-content-type=application%2Foctet-stream
Connecting to agent.baidu.com|172.22.15.16|:8128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 67379285 (64M) [application/octet-stream]
../external/54f3715e-3c9a-11e7-97f0-b402ecc00a80?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20170718%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170718T102015Z&X-Amz-Expires=300&X-Amz-Signature=54ec18736c48ae3fcf97dea15db01075d8ef50171c54ab396325494eaba57a9f&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment; filename=mklml_lnx_2018.0.20170425.tgz&response-content-type=application%2Foctet-stream: File name too long

Cannot write to `../external/54f3715e-3c9a-11e7-97f0-b402ecc00a80?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20170718%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170718T102015Z&X-Amz-Expires=300&X-Amz-Signature=54ec18736c48ae3fcf97dea15db01075d8ef50171c54ab396325494eaba57a9f&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment; filename=mklml_lnx_2018.0.20170425.tgz&response-content-type=application%2Foctet-stream' (No such file or directory).

However, I can successfully download using the following command:

$ wget -O ../external/mklml_lnx_2018.0.20170425.tgz https://github.com/01org/mkl-dnn/releases/download/v0.9/mklml_lnx_2018.0.20170425.tgz 
--2017-07-18 10:20:45--  https://github.com/01org/mkl-dnn/releases/download/v0.9/mklml_lnx_2018.0.20170425.tgz
Resolving agent.baidu.com... 172.22.15.17, 172.22.15.16
Connecting to agent.baidu.com|172.22.15.17|:8128... connected.
Proxy request sent, awaiting response... 302 Found
Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/58414589/54f3715e-3c9a-11e7-97f0-b402ecc00a80?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20170718%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170718T102046Z&X-Amz-Expires=300&X-Amz-Signature=32ff43f10569d40b3fb22e5237de554764056e53b7397393aeb993bf92754f9d&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dmklml_lnx_2018.0.20170425.tgz&response-content-type=application%2Foctet-stream [following]
--2017-07-18 10:20:46--  https://github-production-release-asset-2e65be.s3.amazonaws.com/58414589/54f3715e-3c9a-11e7-97f0-b402ecc00a80?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20170718%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170718T102046Z&X-Amz-Expires=300&X-Amz-Signature=32ff43f10569d40b3fb22e5237de554764056e53b7397393aeb993bf92754f9d&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dmklml_lnx_2018.0.20170425.tgz&response-content-type=application%2Foctet-stream
Connecting to agent.baidu.com|172.22.15.17|:8128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 67379285 (64M) [application/octet-stream]
Saving to: `../external/mklml_lnx_2018.0.20170425.tgz'

100%[===================================================================================================================================================>] 67,379,285  1.03M/s   in 67s     

2017-07-18 10:21:54 (977 KB/s) - `../external/mklml_lnx_2018.0.20170425.tgz' saved [67379285/67379285]

Besides, I suggest adding another cmake file, such as cmake/external/mkl-lite.cmake, and using another external command to download mklml_lnx_2018.0.20170425.tgz, as it can be accessed independently. @hedaoyuan @luotao1 , what's your opinion?

Copy link
Contributor

@hedaoyuan hedaoyuan Jul 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding another cmake file, such as cmake/external/mkl-lite.cmake

I agree with this point. MKLDNN and MKL_LITE related macros are defined in a file, that looks a bit confusing.

Copy link
Contributor Author

@tensor-tang tensor-tang Jul 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, MKL_LITE(optional) and OpenMP(required) would be needed by MKLDNN to get better performance. So we always download MKL_LITE before compiling MKLDNN.

The request can be done~ Update later.

About the issue of downloading, run as root?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still can not reproduce the downloading issue, Both commands can work on my system.
Can you try the downloading not in paddle?
And how about removing the dst folder wget -P . https://github.com/01org/mkl-dnn/releases/download/v0.9/mklml_lnx_2018.0.20170425.tgz?

Copy link
Contributor

@Xreki Xreki Jul 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And how about removing the dst folder wget -P . https://github.com/01org/mkl-dnn/releases/download/v0.9/mklml_lnx_2018.0.20170425.tgz?

The same error. And I am not running as root. I think it is related to the server's environment. Overall, I suggest using wget with -O option to rename it.

Since we will use ExternalProject_Add instead, we may temporarily skip this issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The similar error when I sh prepare_mkl.sh:

[luotao02@yq01-idl-gpu-jpaas-let01: scripts] (tags/v0.9) -> $ outnet_exec sh prepare_mkl.sh 
--2017-07-19 18:37:52--  https://github.com/01org/mkl-dnn/releases/download/v0.9/mklml_lnx_2018.0.20170425.tgz
Resolving agent.baidu.com... 172.22.15.16, 172.22.15.17
Connecting to agent.baidu.com|172.22.15.16|:8118... connected.
Proxy request sent, awaiting response... 302 Found
Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/58414589/54f3715e-3c9a-11e7-97f0-b402ecc00a80?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20170719%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170719T103753Z&X-Amz-Expires=300&X-Amz-Signature=7a6e4b2a2bf0f14ff8916d77c9244e67c339220e21e9970d0f0cb20af016f3d5&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dmklml_lnx_2018.0.20170425.tgz&response-content-type=application%2Foctet-stream [following]
--2017-07-19 18:37:53--  https://github-production-release-asset-2e65be.s3.amazonaws.com/58414589/54f3715e-3c9a-11e7-97f0-b402ecc00a80?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20170719%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20170719T103753Z&X-Amz-Expires=300&X-Amz-Signature=7a6e4b2a2bf0f14ff8916d77c9244e67c339220e21e9970d0f0cb20af016f3d5&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dmklml_lnx_2018.0.20170425.tgz&response-content-type=application%2Foctet-stream
Connecting to agent.baidu.com|172.22.15.16|:8118... connected.
Proxy tunneling failed: ForbiddenUnable to establish SSL connection.
tar (child): /home/luotao02/Paddle/third_party/mkldnn/src/extern_mkldnn/external/mklml_lnx*.tgz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
Downloaded and unpacked Intel(R) MKL small libraries to /home/luotao02/Paddle/third_party/mkldnn/src/extern_mkldnn/external

As the scripts not return False when download fails, thus make continuous, and will encounter:

In file included from /home/luotao02/Paddle/paddle/function/GemmFunctor.h:17:0,
                 from /home/luotao02/Paddle/paddle/function/GemmConvOp.cpp:16:
/home/luotao02/Paddle/paddle/math/MathFunctions.h:19:23: fatal error: mkl_cblas.h: No such file or directory
 #include <mkl_cblas.h>
                       ^
compilation terminated.

if(WITH_MKLDNN)
add_definitions(-DPADDLE_USE_MKLDNN)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I printed out the values of ${OpenMP_C_FLAGS} and ${OpenMP_CXX_FLAGS}, and found there are None. Maybe find_package(OpenMP REQUIRED) is needed.

Copy link
Contributor Author

@tensor-tang tensor-tang Jul 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the reason is that find_package(OpenMP) should be called before using OpenMP_CXX_FLAGS and OpenMP_C_FLAGS

CMAKE_ARGS -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
CMAKE_ARGS -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}
CMAKE_ARGS -DCMAKE_CXX_FLAGS=${MKLDNN_CMAKE_CXX_FLAGS}
CMAKE_ARGS -DCMAKE_C_FLAGS=${MKLDNN_CMAKE_C_FLAGS}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see kernels implemented with different simd intrinsics in the source codes of mkl-dnn. I just want to know how mkl-dnn switch. Is it determined in compiling time or runtime? Just remind here that when setting WITH_AVX=ON, -mavx will be delivered to mkl-dnn through CMAKE_C_FLAGS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime choosing by mkldnn. It would choose the best instructions on local machine.
Yes, I saw the WITH_AVX flag, it wouldn't bother.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime choosing by mkldnn.

I see. Good.

when setting WITH_AVX=ON, -mavx will be delivered to mkl-dnn through CMAKE_C_FLAGS.

I must correct this. Because in native compiling case, the CMAKE_C_FLAGS will not be set until include(configure). Thus, it is None here.

CMakeLists.txt Outdated
@@ -136,6 +138,11 @@ if(WITH_GPU)
endif(NOT WITH_DSO)
endif(WITH_GPU)

if(WITH_MKLDNN)
message(STATUS "MKLDNN_LIBRARY: ${MKLDNN_LIBRARY}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message can be moved to cmake/external/mkldnn.cmake, to make the main CMakeLists.txt as concise as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Overall, can you compile with mkldnn successfully?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can compile mkldnn. And after I manually download mklml_lnx_2018.0.20170425.tgz, I can compile Paddle with mkldnn too.

@Xreki
Copy link
Contributor

Xreki commented Jul 19, 2017

@hedaoyuan, as the default value of WITH_MKLDNN is changed to OFF, do we need to change the cmake command of Travis and teamcity to enable the test of mkldnn?

@tensor-tang
Copy link
Contributor Author

Updated~
Please have a check @Xreki @hedaoyuan @reyoung @luotao1

${EXTERNAL_PROJECT_LOG_ARGS}
PREFIX ${MKL_LITE_DOWNLOAD_DIR}
DOWNLOAD_DIR ${MKL_LITE_DOWNLOAD_DIR}
DOWNLOAD_COMMAND wget --no-check-certificate ${MKL_LITE_URL}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not GIT_REPOSITORY and GIT_TAG?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance,

ExternalProject_Add(
    zlib
    ${EXTERNAL_PROJECT_LOG_ARGS}
    GIT_REPOSITORY  "https://github.com/madler/zlib.git"
    GIT_TAG         "v1.2.8"
    PREFIX          ${ZLIB_SOURCES_DIR}
    UPDATE_COMMAND  ""
    CMAKE_ARGS      -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
    CMAKE_ARGS      -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}
    CMAKE_ARGS      -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
    CMAKE_ARGS      -DCMAKE_C_FLAGS=${CMAKE_C_FLAGS}
    CMAKE_ARGS      -DCMAKE_INSTALL_PREFIX=${ZLIB_INSTALL_DIR}
    CMAKE_ARGS      -DBUILD_SHARED_LIBS=OFF
    CMAKE_ARGS      -DCMAKE_POSITION_INDEPENDENT_CODE=ON
    CMAKE_ARGS      -DCMAKE_MACOSX_RPATH=ON
    CMAKE_ARGS      -DCMAKE_BUILD_TYPE=Release
    CMAKE_CACHE_ARGS -DCMAKE_INSTALL_PREFIX:PATH=${ZLIB_INSTALL_DIR}
                     -DCMAKE_POSITION_INDEPENDENT_CODE:BOOL=ON
                     -DCMAKE_BUILD_TYPE:STRING=Release
)

hard-code might not works in our internal system

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a git, just a tar package. GIT_REPOSITORY and GIT_TAG not applicable here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I found swig used this way too

ExternalProject_Add(swig
GIT_REPOSITORY https://github.com/swig/swig.git
GIT_TAG rel-3.0.10
PREFIX ${SWIG_SOURCES_DIR}
CONFIGURE_COMMAND cd <SOURCE_DIR> && ./autogen.sh && ./configure
--prefix=${SWIG_INSTALL_DIR} --without-pcre
BUILD_COMMAND cd <SOURCE_DIR> && make
INSTALL_COMMAND cd <SOURCE_DIR> && make install
UPDATE_COMMAND ""
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gangliao I tried to use the following command instead and met some error. It is the same as #2666, because the downloading source is a https source, and the pre-built cmake use their built-in libcurl which does not support https. To enable https, we need to build a new cmake from source.

ExternalProject_Add(
    ${MKL_LITE_PROJECT}
    ${EXTERNAL_PROJECT_LOG_ARGS}
    URL                   ${MKL_LITE_URL}
    PREFIX                ${MKL_LITE_DOWNLOAD_DIR}
    UPDATE_COMMAND        ""
    CONFIGURE_COMMAND     ""
    BUILD_COMMAND         ""
    INSTALL_COMMAND       ""
    TEST_COMMAND          ""
)

However, @tensor-tang the DOWNLOAD_COMMAND need to be changed to wget --no-check-certificate -O ${MKL_LITE_DOWNLOAD_DIR}/${MKL_LITE_VER}.tgz ${MKL_LITE_URL}, or I'll meet the same error I commented before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tensor-tang swig uses autoconf tools instead of cmake, so we need to specify the CONFIGURE_COMMAND, BUILD_COMMAND and INSTALL_COMMAND. For cmake project, such as gflags, it can be simplified.

GIT_REPOSITORY "https://github.com/01org/mkl-dnn.git"
GIT_TAG "v0.9"
PREFIX ${MKLDNN_SOURCES_DIR}
CONFIGURE_COMMAND mkdir -p <SOURCE_DIR>/build
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why you specify mkdir, cmake and make install? In ExternalProject_Add, its BUILD_COMMAND and INSTALL_COMMAND are real CMake commands. You don't need to hard-code.

For instance

ExternalProject_Add(
    zlib
    ${EXTERNAL_PROJECT_LOG_ARGS}
    GIT_REPOSITORY  "https://github.com/madler/zlib.git"
    GIT_TAG         "v1.2.8"
    PREFIX          ${ZLIB_SOURCES_DIR}
    UPDATE_COMMAND  ""
    CMAKE_ARGS      -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
    CMAKE_ARGS      -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}
    CMAKE_ARGS      -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
    CMAKE_ARGS      -DCMAKE_C_FLAGS=${CMAKE_C_FLAGS}
    CMAKE_ARGS      -DCMAKE_INSTALL_PREFIX=${ZLIB_INSTALL_DIR}
    CMAKE_ARGS      -DBUILD_SHARED_LIBS=OFF
    CMAKE_ARGS      -DCMAKE_POSITION_INDEPENDENT_CODE=ON
    CMAKE_ARGS      -DCMAKE_MACOSX_RPATH=ON
    CMAKE_ARGS      -DCMAKE_BUILD_TYPE=Release
    CMAKE_CACHE_ARGS -DCMAKE_INSTALL_PREFIX:PATH=${ZLIB_INSTALL_DIR}
                     -DCMAKE_POSITION_INDEPENDENT_CODE:BOOL=ON
                     -DCMAKE_BUILD_TYPE:STRING=Release
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last time used the way you prefer, but since MKL_LITE has been separated from it, it's prefer to let MKLDNN to choose what env it needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @gangliao , the ExternalProject_Add can be simplified because there are some default values and default command to do the building process.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tensor-tang it's ok to use the way you like, as long as Travis CI and TeamCity can successfully build. Thanks for your work.

Copy link
Contributor

@Xreki Xreki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check why Travis CI failed.

ExternalProject_Add(
extern_gtest
${EXTERNAL_PROJECT_LOG_ARGS}
DEPENDS ${GTEST_DEPENDS}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the build of gtest depend on mkl-lite?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really depends, but if not wait for the download will get error can not find liomp5.

SET(MKLDNN_DEPENDS ${MKL_LITE_PROJECT})
SET(MKLDNN_MKLROOT ${MKL_LITE_ROOT})
SET(MKLDNN_IOMP_DIR ${MKL_LITE_LIB_DIR})
ENDIF()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no mkl-lite, should we abort the building process of mkl-dnn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will not bother by mkl_lite if WITH_MKL_LITE=OFF

GIT_REPOSITORY "https://github.com/01org/mkl-dnn.git"
GIT_TAG "v0.9"
PREFIX ${MKLDNN_SOURCES_DIR}
CONFIGURE_COMMAND mkdir -p <SOURCE_DIR>/build
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @gangliao , the ExternalProject_Add can be simplified because there are some default values and default command to do the building process.

${EXTERNAL_PROJECT_LOG_ARGS}
PREFIX ${MKL_LITE_DOWNLOAD_DIR}
DOWNLOAD_DIR ${MKL_LITE_DOWNLOAD_DIR}
DOWNLOAD_COMMAND wget --no-check-certificate ${MKL_LITE_URL}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gangliao I tried to use the following command instead and met some error. It is the same as #2666, because the downloading source is a https source, and the pre-built cmake use their built-in libcurl which does not support https. To enable https, we need to build a new cmake from source.

ExternalProject_Add(
    ${MKL_LITE_PROJECT}
    ${EXTERNAL_PROJECT_LOG_ARGS}
    URL                   ${MKL_LITE_URL}
    PREFIX                ${MKL_LITE_DOWNLOAD_DIR}
    UPDATE_COMMAND        ""
    CONFIGURE_COMMAND     ""
    BUILD_COMMAND         ""
    INSTALL_COMMAND       ""
    TEST_COMMAND          ""
)

However, @tensor-tang the DOWNLOAD_COMMAND need to be changed to wget --no-check-certificate -O ${MKL_LITE_DOWNLOAD_DIR}/${MKL_LITE_VER}.tgz ${MKL_LITE_URL}, or I'll meet the same error I commented before.

IF (${CMAKE_VERSION} VERSION_LESS "3.3.0")
SET(dummyfile ${CMAKE_CURRENT_BINARY_DIR}/mkllite_dummy.c)
FILE(WRITE ${dummyfile} "const char * dummy_mkllite = \"${dummyfile}\";")
ADD_LIBRARY(mkllite STATIC ${dummyfile})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dummyfile is needed for those header-only libraries, libany, eigen and pybind11. Since mkl-lite is a library, it should be treated as an imported library.

set(CMAKE_C_CREATE_SHARED_LIBRARY_FORBIDDEN_FLAGS ${OPENMP_FLAGS})
set(CMAKE_CXX_CREATE_SHARED_LIBRARY_FORBIDDEN_FLAGS ${OPENMP_FLAGS})
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -L${MKLDNN_IOMP_DIR} -liomp5 -Wl,--as-needed")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -L${MKLDNN_IOMP_DIR} -liomp5 -Wl,--as-needed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When compiling Paddle with make -j12, I met the following error:

/home/liuyiqun01/.jumbo/opt/binutils/bin/ld: cannot find -liomp5
collect2: error: ld returned 1 exit status
make[2]: *** [paddle/pybind/libpaddle_pybind.so] Error 1
make[1]: *** [paddle/pybind/CMakeFiles/paddle_pybind.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
100%[===========================================================================================>] 67,379,285  1.02M/s   in 67s     

2017-07-20 02:58:31 (984 KB/s) - `/home/liuyiqun01/github/Paddle/build/third_party/mkllite/mklml_lnx_2018.0.20170425.tgz' saved [67379285/67379285]

[ 26%] No update step for 'extern_mkllite'
[ 26%] No patch step for 'extern_mkllite'
[ 26%] No configure step for 'extern_mkllite'
[ 26%] No build step for 'extern_mkllite'
[ 27%] No install step for 'extern_mkllite'
[ 28%] No test step for 'extern_mkllite'
[ 28%] Completed 'extern_mkllite'
[ 28%] Built target extern_mkllite
make: *** [all] Error 2

Copy link
Contributor Author

@tensor-tang tensor-tang Jul 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did you set make -j12 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same error, but I use make. I can't find liomp5 too.

Copy link
Contributor Author

@tensor-tang tensor-tang Jul 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give the details error message? @luotao1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tensor-tang I think we need use make -j12 to speed up the building process.

message(STATUS "Found cblas and lapack in MKL Lite "
"(include: ${MKL_LITE_INC_DIR}, library: ${CBLAS_LIBRARIES})")
return()
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 19 - 31 is duplicated with line 46 - 58. Besides, I think the setting of CBLAS_* can be moved to cmake/external/mkllite.cmake.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed, I found that, thanks.
By the way, I will rename MKL_LITE to MKLML later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is OK. Please be free to change anything you think reasonable. 😊

Copy link
Contributor Author

@tensor-tang tensor-tang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change name MKL_LITE to MKLML and use wget -O instead.
Also remove the duplicate code in cblas.

And about the failing in Travis CI, it's failed in build_doc, any suggestions?

CONFIGURE_COMMAND mkdir -p <SOURCE_DIR>/build
BUILD_COMMAND cd <SOURCE_DIR>/build
&& cmake .. -DCMAKE_INSTALL_PREFIX=${MKLDNN_INSTALL_DIR} -DMKLROOT=${MKLDNN_MKLROOT}
&& make all -j${CPU_CORES}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not clear with your point about make j12, since I have used make all -j${CPU_CORES} @Xreki

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh... I see, what is the total cores number in you system? more than 12 right?

Copy link
Contributor

@Xreki Xreki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. Thanks for @tensor-tang 's work.

@luotao1
Copy link
Contributor

luotao1 commented Jul 20, 2017

travis-ci的build_docs.sh过不了,原因是生成出来的_swig_paddle.so找不到mkl相关的so:

[luotao02@yq01-idl-gpu-jpaas-let01: py_paddle] develop -> $ ldd _swig_paddle.so 
        linux-vdso.so.1 =>  (0x00007ffd079fa000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fcd597b8000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fcd595b4000)
        libmklml_intel.so => not found
        libpython2.7.so.1.0 => /home/luotao02/.jumbo/lib/libpython2.7.so.1.0 (0x00007fcd591f5000)
        libmkldnn.so.0 => not found
        libiomp5.so => not found

build_docs.sh#L14rm -rf *会把生成的动态库全部删掉,导致_swig_paddle.so找不到了。

@Xreki 的建议是在build_docs.sh#L16

cmake .. -DCMAKE_BUILD_TYPE=Debug -DWITH_GPU=OFF -DWITH_DOC=ON

改成

cmake .. -DCMAKE_BUILD_TYPE=Debug -DWITH_GPU=OFF -DWITH_DOC=ON -DWITH_MKLML=OFF -DWITH_MKLDNN=OFF

让这个PR先过,再下一个PR中再解决?

@hedaoyuan @gangliao @reyoung 有什么更好的办法么?

@tensor-tang
Copy link
Contributor Author

Give another solution to bypass the issue.
Seems no better way (I tried many ways: paddle/setup.py.in, paddle/api/CMakeLists.txt, etc.), in case of finding a way to link these *.so into _swig_paddle.so.

By the way, since it is a rpath, I think it's reasonable for me yet.
I just got that when you use opt/intel/mkl to build paddle, you would still get this issue if rm mkl.
Only if cblas is link as static may not have this issue. Right?

Feel free to correct me if I am wrong.

@tensor-tang
Copy link
Contributor Author

tensor-tang commented Jul 20, 2017

[ 8%] Completed 'zlib'
[ 8%] Built target zlib
make: *** [all] Error 2
The command "timeout 2580 paddle/scripts/travis/${JOB}.sh # 43min timeout
RESULT=$?; if [ $RESULT -eq 0 ] || [ $RESULT -eq 142 ]; then true; else false; fi;
" exited with 1.

Still failed.. ,this issue? Any suggestion?

Pass on my local machine.

@Xreki
Copy link
Contributor

Xreki commented Jul 21, 2017

@tensor-tang Parallel compiling is used in Travis CI. I found the real error:

mkdir: cannot create directory ‘/usr/local/opt’: Permission denied
make[2]: *** [third_party/mklml/src/extern_mklml-stamp/extern_mklml-download] Error 1
make[1]: *** [CMakeFiles/extern_mklml.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

Do you try to install mkldnn to /usr/local/opt? It needs sudo permission.

@gangliao
Copy link
Contributor

@tensor-tang Can we submit/repair code in this PR?

@Xreki
Copy link
Contributor

Xreki commented Jul 21, 2017

@gangliao we haven't found a way to solve this problem. Thus we decided to set the default value of WITH_MKLDNN and WITH_MKLML to OFF and merge the PR first. Do you have any suggestions about the shared library problem?

@tensor-tang
Copy link
Contributor Author

tensor-tang commented Jul 21, 2017

still failed...

CMake Error at cmake_install.cmake:36 (file):
file INSTALL cannot make directory

SET(MKLDNN_INSTALL_ROOT "$ENV{HOME}")
ENDIF()

SET(MKLDNN_INSTALL_DIR "${MKLDNN_INSTALL_ROOT}/opt/paddle/third_party/mkldnn")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure whether it is good to install to ${HOME}. Users may feel confused about the directory.

Copy link
Contributor Author

@tensor-tang tensor-tang Jul 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can get your point~
But as those *.so needed should be placed in the appropriate folder under permission, ${HOME} is the way I found. If we have better solution, we can change it.
And refer to gpu( /usr/lib) and mkl(opt/intel/mkl), their so files are all in special folders that would not be removed easily.

@luotao1 luotao1 merged commit ef28f66 into PaddlePaddle:develop Jul 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants