use -rpath to fix libmklml_intel.so not found #11806

Merged
merged 10 commits into from
Jul 5, 2018

Conversation

luotao1
Contributor

@luotao1 luotao1 commented Jun 28, 2018

@luotao1 luotao1 requested a review from tensor-tang June 28, 2018 07:20
@@ -0,0 +1,15 @@
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved
Contributor

Why do we need to add this file? As far as I can see, numpy's lib directory contains only .so files.

$ pwd
xxx/python2.7/site-packages/numpy/.libs
$ ll
-rwxrwxr-x 1 tangjian tangjian  1023960 Jun 15 01:01 libgfortran-ed201abd.so.3.0.0
-rwxrwxr-x 1 tangjian tangjian 38513408 Jun 15 01:01 libopenblasp-r0-39a31c03.2.18.so

Contributor Author

Without an __init__.py, this directory is not found when the whl is packaged. We could try deleting the file after packaging, though.

# The reason is that libwarpctc.so, libiomp5.so, etc. are in paddle.libs and
# core.so is in paddle.fluid, so paddle/fluid/../libs points to the libraries above.
# This operation will fix https://github.com/PaddlePaddle/Paddle/issues/3213
os.environ['core_rpath']=str(os.popen("patchelf --print-rpath \
Contributor

Why use environ? Could you just store the value in a string?

Contributor Author

I never managed to get the string version working, so I switched to environ.

# This operation will fix https://github.com/PaddlePaddle/Paddle/issues/3213
os.environ['core_rpath']=str(os.popen("patchelf --print-rpath \
${PADDLE_BINARY_DIR}/python/paddle/fluid/core.so").read().strip('\n'))
os.environ['core_rpath']=os.environ['core_rpath']+":'$ORIGIN/../libs/'"
Contributor

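Actually, I wonder whether the other paths are needed at all. The $ORIGIN entry alone should be enough, because the other paths only exist at build time and are very unlikely to be used after installation.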

Library runpath: [/home/tangjian/.jumbo/opt/gcc48/lib64:/home/tangjian/.jumbo/lib:$ORIGIN/../libs/]
ldd core.so
libiomp5.so => /usr/local/lib/python2.7/dist-packages/paddle/fluid/./../libs/libiomp5.so (0x00007ff7b16c6000)

Contributor Author

Makes sense.
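The thread above relies on how `$ORIGIN` is expanded: the dynamic loader substitutes the directory of the library being loaded, so an entry of `$ORIGIN/../libs/` in core.so resolves to the sibling `libs` directory wherever the package is installed. The following is a minimal sketch of that expansion for illustration only. The function name `resolve_runpath` is hypothetical, and the real loader also handles tokens such as `$LIB` and `$PLATFORM` that are ignored here.

```python
import posixpath


def resolve_runpath(library_path, runpath):
    """Expand $ORIGIN in a colon-separated RUNPATH string (illustrative only).

    $ORIGIN stands for the directory containing the library being loaded.
    """
    origin = posixpath.dirname(library_path)
    resolved = []
    for entry in runpath.split(":"):
        if not entry:
            continue  # skip empty entries such as a trailing colon
        expanded = entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
        resolved.append(posixpath.normpath(expanded))
    return resolved


# The install location below is taken from the ldd output quoted in the thread.
print(resolve_runpath(
    "/usr/local/lib/python2.7/dist-packages/paddle/fluid/core.so",
    "$ORIGIN/../libs/",
))
```

Because the expansion depends only on where core.so itself lives, the single `$ORIGIN/../libs/` entry keeps working after the wheel is installed into a different prefix, which is why the build-time paths can be dropped.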

@luotao1
Contributor Author

luotao1 commented Jul 4, 2018

The rpath is set correctly: https://paddleci.ngrok.io/viewLog.html?buildId=982&buildTypeId=Paddle_PrCi&tab=buildLog&_focus=10506

[12:51:48]	 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/../libs/]

ldd resolves the libraries as well: https://paddleci.ngrok.io/viewLog.html?buildId=982&buildTypeId=Paddle_PrCi&tab=buildLog&_focus=10534

[12:51:48]		libiomp5.so => /paddle/build/.check_api_workspace/../python/paddle/fluid/../libs/libiomp5.so (0x00007f3cbf9fb000)
[12:51:48]		libmkldnn.so.0 => /paddle/build/.check_api_workspace/../python/paddle/fluid/../libs/libmkldnn.so.0 (0x00007f3cbf2fa000)

But the unit test fails: https://paddleci.ngrok.io/viewLog.html?buildId=982&buildTypeId=Paddle_PrCi&tab=buildLog&_focus=10593

[12:53:10]	  File "/paddle/tools/print_signatures.py", line 64, in <module>
[12:53:10]	    visit_all_module(importlib.import_module(sys.argv[1]))
[12:53:10]	  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
[12:53:10]	    __import__(name)
[12:53:10]	  File "/paddle/build/.check_api_workspace/.env/local/lib/python2.7/site-packages/paddle/fluid/__init__.py", line 17, in <module>
[12:53:10]	    import framework
[12:53:10]	  File "/paddle/build/.check_api_workspace/.env/local/lib/python2.7/site-packages/paddle/fluid/framework.py", line 29, in <module>
[12:53:10]	    directory. The original error is: \n""" + e.message)
[12:53:10]	ImportError: NOTE: You may need to run "export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH"
[12:53:10]	    if you encounters "libmkldnn.so not found" errors. If you have python
[12:53:10]	    installed in other directory, replace "/usr/local/lib" with your own
[12:53:10]	    directory. The original error is: 
[12:53:10]	libmklml_intel.so: cannot open shared object file: No such file or directory

The unit test is:

function assert_api_not_changed() {
    mkdir -p ${PADDLE_ROOT}/build/.check_api_workspace
    cd ${PADDLE_ROOT}/build/.check_api_workspace
    virtualenv .env
    source .env/bin/activate
    pip install ${PADDLE_ROOT}/build/python/dist/*whl
    curl ${PADDLE_API_SPEC_URL:-https://raw.githubusercontent.com/reyoung/FluidAPISpec/master/API.spec} \
        > origin.spec
    python ${PADDLE_ROOT}/tools/print_signatures.py paddle.fluid > new.spec
    python ${PADDLE_ROOT}/tools/diff_api.py origin.spec new.spec
    deactivate
}
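The RUNPATH line quoted from the CI log in this comment has the shape printed by `readelf -d`. As a rough sketch, assuming that output shape, the entries could be extracted like this so a build script can assert on them. The helper name `parse_runpath` is hypothetical and not part of the PR.

```python
import re

# Matches both "Library rpath: [...]" and "Library runpath: [...]" lines.
_LIBRARY_PATH = re.compile(r"Library (?:rpath|runpath): \[(.*?)\]")


def parse_runpath(readelf_output):
    """Return the RPATH/RUNPATH entries found in `readelf -d` style text."""
    entries = []
    for line in readelf_output.splitlines():
        match = _LIBRARY_PATH.search(line)
        if match:
            entries.extend(part for part in match.group(1).split(":") if part)
    return entries


# Sample line copied from the CI log quoted above (timestamp removed).
sample = " 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/../libs/]"
print(parse_runpath(sample))
```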

@luotao1
Contributor Author

luotao1 commented Jul 4, 2018

I reproduced the error. When building with -DWITH_MKLDNN=ON, libmkldnn.so.0 cannot find libmklml_intel.so and libiomp5.so:

λ 95529df39701 /Paddle/docker_build {mklml_rpath} ldd /Paddle/docker_build/python/paddle/fluid/../libs/libmkldnn.so.0
	linux-vdso.so.1 =>  (0x00007ffd441c3000)
	libmklml_intel.so => not found
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff0a75ea000)
	libiomp5.so => not found
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff0a7267000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff0a6f5e000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff0a6d48000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff0a697d000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ff0a7fa0000)
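The unresolved entries in the ldd output above can also be picked out mechanically. This is a small sketch that assumes the usual `name => target` layout of ldd output. The function name `missing_dependencies` is made up for illustration.

```python
def missing_dependencies(ldd_output):
    """Return the library names that ldd reports as "not found"."""
    missing = []
    for line in ldd_output.splitlines():
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip() == "not found":
            missing.append(name)
    return missing


# Excerpt of the ldd output for libmkldnn.so.0 quoted above.
sample = (
    "\tlinux-vdso.so.1 =>  (0x00007ffd441c3000)\n"
    "\tlibmklml_intel.so => not found\n"
    "\tlibdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff0a75ea000)\n"
    "\tlibiomp5.so => not found\n"
    "\t/lib64/ld-linux-x86-64.so.2 (0x00007ff0a7fa0000)\n"
)
print(missing_dependencies(sample))
```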

# All third-party libraries are in the same directory, so with an rpath of
# $ORIGIN/, libmkldnn.so.0 will find libmklml_intel.so and libiomp5.so.
command = "patchelf --set-rpath '$ORIGIN/' ${MKLDNN_SHARED_LIB}"
os.system(command)
Contributor

May need to check the return value?

# core.so is in paddle.fluid, so paddle/fluid/../libs points to the libraries above.
# This operation will fix https://github.com/PaddlePaddle/Paddle/issues/3213
command = "patchelf --set-rpath '$ORIGIN/../libs/' ${PADDLE_BINARY_DIR}/python/paddle/fluid/core.so"
os.system(command)
Contributor

Same as above, need some checks.
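One possible way to act on these two review comments is to run the command through `subprocess` and raise when the exit code is non-zero. The sketch below is an assumption about how such a check could look, not the code that was merged. `run_checked` is a hypothetical helper, and the patchelf invocation appears only in a comment because it needs patchelf and a real core.so.

```python
import subprocess
import sys


def run_checked(argv, what):
    """Run a command without a shell and raise if it exits with a non-zero code."""
    returncode = subprocess.call(argv)
    if returncode != 0:
        raise RuntimeError("%s fails (exit code %d)" % (what, returncode))


# Hypothetical use in setup.py (not executed here; requires patchelf):
#   run_checked(["patchelf", "--set-rpath", "$ORIGIN/../libs/", core_so_path],
#               "patchelf --set-rpath for core.so")
# With an argument list there is no shell, so $ORIGIN needs no extra quoting.

# Harmless demonstration with the current Python interpreter.
run_checked([sys.executable, "-c", "pass"], "no-op demo command")
```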

@luotao1
Contributor Author

luotao1 commented Jul 5, 2018

The Travis CI build fails at https://travis-ci.org/PaddlePaddle/Paddle/jobs/400331461#L2880

sh: 1: patchelf: not found
Traceback (most recent call last):
  File "setup.py", line 147, in <module>
    raise Exception("patchelf --set-rpath for core.so fails")
Exception: patchelf --set-rpath for core.so fails

The reason is that, although the log says "Status: Downloaded newer image for paddlepaddle/paddle:latest-dev", the latest-dev image itself has not been updated yet, so patchelf is not installed in it.
https://travis-ci.org/PaddlePaddle/Paddle/jobs/400331461#L595

@typhoonzero Could I add the check in the next PR?

@luotao1
Contributor Author

luotao1 commented Jul 5, 2018

@typhoonzero I removed the check in this PR. Once the nightly latest-dev image has been updated to install patchelf, I will add the check in the next PR.
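When the check is added back, a clearer failure than `sh: 1: patchelf: not found` could come from looking for the tool before invoking it. This is a sketch under assumptions: `find_tool` is a hypothetical helper, and `shutil.which` requires Python 3.3 or later, whereas the logs in this PR show Python 2.7.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable on PATH, or raise a clear error."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            "%s not found on PATH; install it before building the wheel" % name
        )
    return path


# Demonstration: report whether patchelf is available in this environment.
try:
    print("patchelf found at", find_tool("patchelf"))
except RuntimeError as exc:
    print(exc)
```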

Contributor

@typhoonzero typhoonzero left a comment

LGTM++

@tensor-tang tensor-tang merged commit 7a735f2 into PaddlePaddle:develop Jul 5, 2018
@luotao1 luotao1 deleted the mklml_rpath branch July 6, 2018 01:46
@luotao1
Contributor Author

luotao1 commented Jul 9, 2018

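Comments from @jianhang-liu:

I have always felt that patching the RPATH by hand with patchelf is inelegant, but earlier attempts to change the RPATH by editing the CMake files never worked. After some investigation, the reason is now clear.

Conclusions first:

  1. For now, patching by hand with patchelf is the most reasonable choice.
  2. Editing the CMake files (adding CMAKE_INSTALL_RPATH and the related CMake variables) had no effect: the RPATH in the ELF of core.so and libpaddle_fluid.so did not change. The reason is that these targets (core.so, libpaddle_fluid.so) are not install targets. The CMake files contain no install(xxx) for them, so they are not installed through make install. Instead, they are only built during make and then packaged directly into the .whl. CMake rewrites the RPATH of installed files during make install (search the generated cmake_install.cmake files for RPATH_CHANGE).

For a normal installation (rather than packaging into a .whl), the correct change would be: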

SET(CMAKE_SKIP_BUILD_RPATH  FALSE)
SET(CMAKE_BUILD_WITH_INSTALL_RPATH FALSE)
SET(CMAKE_INSTALL_RPATH "$ORIGIN/../lib")
SET(CMAKE_INSTALL_RPATH_USE_LINK_PATH FALSE)

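These settings ensure that the libraries are found through RPATH both in the build tree and at the install location, and that they are still found through RPATH after the whole install directory is moved elsewhere.

If you are interested, see this document, which explains the topic thoroughly.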

@helinwang
Contributor

helinwang commented Jul 25, 2018

pip install paddlepaddle still seems to have this problem. Do we need to update the version on PyPI? (The latest release is from Jul 3, 2018.)

ImportError: libmkldnn.so.0: cannot open shared object file: No such file or directory

@typhoonzero
Contributor

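The rpath change landed after that release. PyPI does not allow overwriting an uploaded version, so the fix has to ship in the next release, or in a bug-fix release 0.14.1.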

kuke pushed a commit to kuke/Paddle that referenced this pull request Aug 25, 2018
* use -rpath to fix libmklml_intel.so not found

* only remain $ORIGIN/../libs in rpath of core.so

* test

* test

* add rpath of libmkldnn.so.0

* check return value of os.system

* remove check return value of patchelf
Successfully merging this pull request may close these issues.

pip install error in official docker image The whl package downloaded from "Install using pip" can not run
4 participants