
enable dynamic load mklml lib on fluid #11596

Merged: luotao1 merged 6 commits into develop from refine/mklml/dyload on Jun 26, 2018


tensor-tang (Contributor) commented Jun 20, 2018

Thanks @jianhang-liu for your proposal (develop...jianhang-liu:dynamic_load_mklml).

I refined it, and this should also fix #11452, since /usr/local/lib has been removed.

We should also add mkldnn and iomp later:

  • fluid
    • mklml
    • iomp
    • mkldnn
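The bus-error stack trace later in this thread shows how the loading works: the first call to MKL_Set_Num_Threads goes through std::call_once into paddle::platform::dynload::GetMKLMLDsoHandle(), which calls dlopen. Here is a rough Python analogue using ctypes. It is not Paddle's C++ code. The library is opened only on first use, and each symbol is resolved once. The helper names get_dso_handle and dynload are hypothetical.

```python
import ctypes
import ctypes.util
import functools
import threading
from typing import Any, Callable, Optional, Sequence


@functools.lru_cache(maxsize=None)
def get_dso_handle(libname: str) -> ctypes.CDLL:
    """Open a shared library and cache the handle (hypothetical helper).

    A failed load raises OSError and is not cached, so later calls retry,
    much like dlopen() returning NULL each time it is called.
    """
    path = ctypes.util.find_library(libname) or libname
    return ctypes.CDLL(path)


def dynload(libname: str, symbol: str, restype: Any,
            argtypes: Sequence[Any]) -> Callable[..., Any]:
    """Return a wrapper that opens `libname` and resolves `symbol` on first call."""
    lock = threading.Lock()
    fn: Optional[Any] = None

    def call(*args: Any) -> Any:
        nonlocal fn
        if fn is None:
            with lock:  # double-checked so concurrent first calls resolve once
                if fn is None:
                    f = getattr(get_dso_handle(libname), symbol)
                    f.restype = restype
                    f.argtypes = list(argtypes)
                    fn = f
        return fn(*args)

    return call
```

Consider a machine with MKLML installed. There, `dynload("mklml_intel", "MKL_Set_Num_Threads", None, [ctypes.c_int])` would return a callable that opens libmklml_intel.so only when it is first invoked. Nothing is loaded at import time. That deferral is the point of loading mklml dynamically.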

tensor-tang (Contributor, Author) commented Jun 20, 2018

[Step 1/1] The following tests FAILED:
[08:55:25][Step 1/1] 239 - test_parallel_executor_mnist (Failed)
[08:55:25][Step 1/1] 241 - test_parallel_executor_test_while_train (Failed)

Will rerun later.

tensor-tang (Contributor, Author) commented

The next step will focus on this error:

 py -c "import paddle.fluid"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/__init__.py", line 17, in <module>
    import framework
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 22, in <module>
    from . import core
ImportError: /usr/local/lib/python2.7/dist-packages/paddle/fluid/core.so: undefined symbol: GOMP_parallel_start
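`undefined symbol: GOMP_parallel_start` means core.so references a GNU OpenMP (libgomp) entry point, but none of its loaded dependencies provides it. You can list such references without importing the module by running `nm -D --undefined-only core.so`. The sketch below filters that output for `GOMP_` symbols. It assumes binutils `nm` is on PATH, and the function names are hypothetical.

```python
import subprocess
from typing import List


def undefined_symbols(nm_output: str) -> List[str]:
    """Parse `nm -D --undefined-only` output into bare symbol names."""
    names = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries look like "U name" (or "w"/"v" for weak ones);
        # newer binutils may append a version suffix such as "@GLIBC_2.14".
        if len(parts) >= 2 and parts[-2] in ("U", "w", "v"):
            names.append(parts[-1].split("@", 1)[0])
    return names


def gomp_references(so_path: str) -> List[str]:
    """Return libgomp symbols that `so_path` expects another library to define."""
    out = subprocess.run(
        ["nm", "-D", "--undefined-only", so_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return [s for s in undefined_symbols(out) if s.startswith("GOMP_")]
```

On a build that still fails this way, `gomp_references("core.so")` would be expected to include `GOMP_parallel_start`. An empty list means core.so references no libgomp symbols.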

tensor-tang (Contributor, Author) commented

After linking with --as-needed, gomp is no longer a dependency of core.so:

ldd build/python/paddle/fluid/core.so
linux-vdso.so.1 => (0x00007ffd4b3b0000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb3e0941000)
libpython2.7.so.1.0 => /home/tangjian/.jumbo/lib/libpython2.7.so.1.0 (0x00007fb3e0583000)
librt.so.1 => /lib64/librt.so.1 (0x00007fb3e037b000)
libiomp5.so => /home/tangjian/.jumbo/lib/libiomp5.so (0x00007fb3dffd6000)
libstdc++.so.6 => /home/tangjian/.jumbo/opt/gcc48/lib64/libstdc++.so.6 (0x00007fb3dfcd5000)
libm.so.6 => /lib64/libm.so.6 (0x00007fb3dfa51000)
libgcc_s.so.1 => /home/tangjian/.jumbo/opt/gcc48/lib64/libgcc_s.so.1 (0x00007fb3df83a000)
libc.so.6 => /lib64/libc.so.6 (0x00007fb3df4a8000)
/lib64/ld-linux-x86-64.so.2 (0x000000318a600000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fb3df2a4000)
libutil.so.1 => /lib64/libutil.so.1 (0x00007fb3df0a0000)
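An ldd listing like the one above can also be checked automatically, for example in CI, to confirm that libgomp stays out and that libiomp5.so resolves. This is a minimal sketch. It assumes the standard glibc ldd line formats, `name => path (address)` and `name => not found`, and the function names are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Matches "libfoo.so => /path/libfoo.so (0x...)" and "libfoo.so => not found".
_LINE = re.compile(r"^\s*(\S+)\s+=>\s*(.*)$")


def parse_ldd(output: str) -> Dict[str, Optional[str]]:
    """Map each dependency to its resolved path, or None if 'not found'."""
    deps: Dict[str, Optional[str]] = {}
    for line in output.splitlines():
        m = _LINE.match(line)
        if not m:
            continue  # loader line such as "/lib64/ld-linux-x86-64.so.2 (0x...)"
        name, rest = m.group(1), m.group(2).strip()
        if rest.startswith("not found"):
            deps[name] = None
        elif rest and not rest.startswith("("):
            deps[name] = rest.split()[0]
        # otherwise there is no backing file (e.g. linux-vdso.so.1): skip it
    return deps


def check_openmp(deps: Dict[str, Optional[str]]) -> List[str]:
    """Report OpenMP linkage problems; an empty list means all is well."""
    problems = []
    if any(name.startswith("libgomp") for name in deps):
        problems.append("libgomp is still linked")
    if "libiomp5.so" not in deps:
        problems.append("libiomp5.so is not a dependency")
    elif deps["libiomp5.so"] is None:
        problems.append("libiomp5.so not found; check LD_LIBRARY_PATH")
    return problems
```

Feed it the output of `ldd build/python/paddle/fluid/core.so`. For the listing above, the check should return an empty list.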

luotao1 (Contributor) commented Jun 22, 2018

Image: latest-dev
After installing the .whl package with pip:

λ 5699a4228da1 /Paddle {for_test} python
Python 2.7.12 (default, Dec  4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle.fluid
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/__init__.py", line 17, in <module>
    import framework
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py", line 22, in <module>
    from . import core
ImportError: libiomp5.so: cannot open shared object file: No such file or directory
>>>

After I export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH, the import crashes with a bus error:

Python 2.7.12 (default, Dec  4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle.fluid
*** Aborted at 1529669343 (unix time) try "date -d @1529669343" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGBUS (@0x7f7c502157e0) received by PID 240 (TID 0x7f7c87df2700) from PID 1344362464; stack trace: ***
    @     0x7f7c879cf390 (unknown)
    @     0x7f7c87bf8c94 (unknown)
    @     0x7f7c87be2168 (unknown)
    @     0x7f7c87be3c27 (unknown)
    @     0x7f7c87bf0577 (unknown)
    @     0x7f7c87beb564 (unknown)
    @     0x7f7c87befda9 (unknown)
    @     0x7f7c873f0f09 (unknown)
    @     0x7f7c87beb564 (unknown)
    @     0x7f7c873f1571 (unknown)
    @     0x7f7c873f0fa1 dlopen
    @     0x7f7c52572ad9 paddle::platform::dynload::GetMKLMLDsoHandle()
    @     0x7f7c51761e79 _ZSt16__once_call_implISt12_Bind_simpleIFZN6paddle8platform7dynload28DynLoad__MKL_Set_Num_ThreadsclIJiEEEDTcl19MKL_Set_Num_Threadsspfp_EEDpT_EUlvE_vEEEvv
    @     0x7f7c879cca99 __pthread_once_slow
    @     0x7f7c517616a5 paddle::framework::InitDevices()
    @     0x7f7c517619a9 paddle::framework::InitDevices()
    @     0x7f7c5167acaa _ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL13pybind11_initEvEUlbE62_vIbEINS_4nameENS_5scopeENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNESL_
    @     0x7f7c516a703a pybind11::cpp_function::dispatcher()
    @           0x4bc3fa PyEval_EvalFrameEx
    @           0x4b9ab6 PyEval_EvalCodeEx
    @           0x4c1e6f PyEval_EvalFrameEx
    @           0x4b9ab6 PyEval_EvalCodeEx
    @           0x4b97a6 PyEval_EvalCode
    @           0x4b96df PyImport_ExecCodeModuleEx
    @           0x4b2b06 (unknown)
    @           0x4b402c (unknown)
    @           0x4a4ae1 (unknown)
    @           0x4a4513 PyImport_ImportModuleLevel
    @           0x4a59e4 (unknown)
    @           0x4a577e PyObject_Call
    @           0x4c5e10 PyEval_CallObjectWithKeywords
    @           0x4be6d7 PyEval_EvalFrameEx
Bus error

tensor-tang (Contributor, Author) commented Jun 25, 2018

Thanks~

ImportError: libiomp5.so: cannot open shared object file: No such file or directory

This PR does not fix this issue; it only fixes the mklml version issue.

The iomp issue should be solved via LD_LIBRARY_PATH.
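Setting LD_LIBRARY_PATH from inside a running Python process does not help. glibc's dynamic loader reads the variable once, at process startup. It therefore has to be set before the interpreter starts, which is what the `export` earlier in this thread does. Below is a minimal Python sketch of the same idea for a launcher script. The /usr/local/lib directory comes from this thread, and the function name is hypothetical. The shell form `/usr/local/lib:$LD_LIBRARY_PATH` leaves a trailing colon when the variable was unset. This version does not, because the loader treats a zero-length entry as the current working directory.

```python
from typing import Dict, Mapping


def with_lib_dir(env: Mapping[str, str], lib_dir: str) -> Dict[str, str]:
    """Return a copy of `env` with `lib_dir` first in LD_LIBRARY_PATH.

    Empty entries are dropped on purpose: the loader would treat a
    zero-length entry as the current working directory.
    """
    new_env = dict(env)
    old = new_env.get("LD_LIBRARY_PATH", "")
    rest = [p for p in old.split(":") if p and p != lib_dir]
    new_env["LD_LIBRARY_PATH"] = ":".join([lib_dir] + rest)
    return new_env
```

For example, `subprocess.run([sys.executable, "-c", "import paddle.fluid"], env=with_lib_dir(os.environ, "/usr/local/lib"))` would start a fresh interpreter whose loader can find /usr/local/lib/libiomp5.so.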

Bus error

I cannot reproduce it on the latest image, but I will try latest-dev later.
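The SIGBUS in the trace above happened inside dlopen and killed the whole interpreter. When reproducing a crash like that, it helps to run the import in a child process, so the caller sees the signal instead of dying. On POSIX, `subprocess` reports a child killed by a signal as a negative return code. This is a sketch, and the helper name is hypothetical.

```python
import signal
import subprocess
import sys


def run_child(code: str) -> str:
    """Run `code` in a fresh interpreter and summarize how the child ended."""
    rc = subprocess.run([sys.executable, "-c", code],
                        capture_output=True).returncode
    if rc == 0:
        return "ok"
    if rc < 0:
        # POSIX: a negative return code means the child was killed by signal -rc.
        return f"killed by {signal.Signals(-rc).name}"
    return f"exit code {rc}"
```

On an image that shows this bug, `run_child("import paddle.fluid")` would be expected to report `killed by SIGBUS`. A missing libiomp5.so shows up as an ImportError, so the call would return `exit code 1`.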

tensor-tang (Contributor, Author) commented

I tried it on latest-dev, and it works fine:

λ 7a134d7495fe /home export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
λ 7a134d7495fe /home python
Python 2.7.12 (default, Dec  4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle.fluid
>>>

Double-checking core.so:

 ldd /usr/local/lib/python2.7/dist-packages/paddle/fluid/core.so
	linux-vdso.so.1 =>  (0x00007fff7b763000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f9b6c339000)
	libpython2.7.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0 (0x00007f9b6bdab000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f9b6bba3000)
	libiomp5.so => /usr/local/lib/libiomp5.so (0x00007f9b6b7fe000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f9b6b47c000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f9b6b173000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f9b6af5c000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9b6ab92000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f9b70cb1000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f9b6a978000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f9b6a773000)
	libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f9b6a570000)

luotao1 (Contributor) left a comment


After deleting the build directory and recompiling, it works.

@luotao1 luotao1 merged commit 2dae8a4 into PaddlePaddle:develop Jun 26, 2018
@tensor-tang tensor-tang deleted the refine/mklml/dyload branch June 26, 2018 07:39
Development: merging this pull request may close the issue "pip install error in official docker image".

2 participants