
Deploying multiple models at the same time will raise MKLDNN error #31992

Closed
juncaipeng opened this issue Mar 31, 2021 · 12 comments

@juncaipeng (Contributor) commented Mar 31, 2021

  • Info

C++ API
Ubuntu 16.04
CPU MKLDNN
GCC 8.2.0

  • Prepare

Please contact danqing to download the demo. The demo contains only two groups of models for testing.

Download model_test.cc.zip, unzip it, and replace the old model_test.cc file in the demo with the new one.

Build and install Paddle release/2.0 (commit id: c7a6a1f9610a9ee018c19d89950d76b38f33aed1):

```shell
cmake -DCMAKE_BUILD_TYPE=Release -DWITH_PYTHON=OFF -DWITH_MKL=ON -DWITH_GPU=OFF -DON_INFER=ON .. && make -j && make inference_lib_dist -j
```

Set `LIB_DIR` to the path of the Paddle Inference library in `build.sh`.

Run `sh build.sh`.

Run `ulimit -c unlimited` to enable saving core files.
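Collected into a single script, the preparation steps above look roughly like the build fragment below. This is a sketch, not part of the original demo: the `/path/to/...` locations are placeholders you must adjust to your checkout.

```shell
# Sketch of the preparation steps; adjust the placeholder paths first.
set -e

# Build Paddle and its inference library (run inside Paddle's build directory).
cd /path/to/Paddle/build
cmake -DCMAKE_BUILD_TYPE=Release -DWITH_PYTHON=OFF -DWITH_MKL=ON \
      -DWITH_GPU=OFF -DON_INFER=ON ..
make -j && make inference_lib_dist -j

# Build the demo; LIB_DIR in build.sh must point at the installed inference library.
cd /path/to/demo
sh build.sh

# Allow core dumps so crashes can be inspected with gdb afterwards.
ulimit -c unlimited
```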

  • The first test

Run `./build/model_test --test_groups=0 --single_instance=true`; it does not raise an error.
If `single_instance` is set to true, each model has only one predictor. Otherwise, some models create several predictors by calling `predictor.clone()`.

Run `./build/model_test --test_groups=0 --single_instance=false`; it raises a segmentation fault.
Run `gdb ./build/model_test core_file` to see the following error.

[screenshot: gdb backtrace of the segmentation fault]

  • The second test

Run `./build/model_test --test_groups=1 --single_instance=true`; it does not raise an error.

Run `./build/model_test --test_groups=1 --single_instance=false`; it also raises a segmentation fault.

[screenshot: error backtrace]

Sometimes, the above demo raises a different error, such as:

[screenshot: a different error backtrace]

  • The third test

Run `./build/model_test --test_groups="4 5 6" --single_instance=true`. The demo loads several groups of models, each model with one predictor, and it also raises the following error. The demo only includes two groups of models for now; we will provide the other models later.

[screenshot: error output]

[screenshots: additional error output]
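The three tests above exercise a matrix of group ids and instance modes. The loop below is a small driver sketch (not part of the demo) that enumerates the same combinations; it echoes each invocation, and dropping the `echo` would actually run the binary. Group ids beyond 0 and 1 require the extra models mentioned above.

```shell
# Enumerate the command matrix exercised in the tests above.
# Echoes each invocation; remove `echo` to run the demo binary for real.
for groups in "0" "1" "0 1"; do
  for single in true false; do
    echo ./build/model_test --test_groups=\"${groups}\" --single_instance=${single}
  done
done
```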

@paddle-bot-old commented

Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your questions as soon as possible. Please double-check that you have provided a clear problem description, reproduction code, environment & version, and error messages. You may also look for an answer in the official API docs, the FAQ, historical issues, and the AI community. Have a nice day!

@jczaja (Contributor) commented Apr 7, 2021

@lidanqing-intel, @juncaipeng We are investigating this issue. We have reproduced it on the develop branch. A candidate fix has been made (#32136) and we are testing it now.

@jczaja (Contributor) commented Apr 8, 2021

@juncaipeng

1. Could you please test PR #32136 ([oneDNN] candidate fix to #31992, on develop) to see whether it solves the problem for you?
2. We are missing some multi-instance unit tests (#32087); would it be possible to turn this issue's test into a UT?

@juncaipeng (Contributor, Author) commented Apr 8, 2021

@jczaja
On the latest develop branch, the demo still raises an error when running the following command:
`./build/model_test --test_groups="0 1" --single_instance=false`
Sometimes, the demo also raises an error when running the following commands:
`./build/model_test --test_groups="0" --single_instance=false`
`./build/model_test --test_groups="1" --single_instance=false`

Should I use the inference library of release/2.0?

These models cannot be used in a UT, so you will need to find some other models.

jczaja added a commit to jczaja/Paddle that referenced this issue Apr 8, 2021
luotao1 pushed a commit that referenced this issue Apr 9, 2021
@jczaja (Contributor) commented Apr 9, 2021

@juncaipeng This is the cherry-pick for release/2.0: #32163. It works fine on my setup, but @lidanqing-intel reports that every other run crashes on her setup, so I will test it further.

Superjomn pushed a commit that referenced this issue Apr 13, 2021
* - Candidate fix to #31992

- Fix to #31992 for 2.0
@jczaja (Contributor) commented Apr 21, 2021

@juncaipeng I have made some more changes (develop PR: #32309). Could you please test them and report any problems?

@juncaipeng (Contributor, Author) commented

@jczaja I have tested all the models in the demo and found no problems. The customer will test the new inference library in their project. If there is any news, I will report back.

@juncaipeng (Contributor, Author) commented

@jczaja The customers report that there is no problem with the new inference library (develop PR: #32309) so far.

@jczaja (Contributor) commented Apr 26, 2021

@juncaipeng I have implemented an alternative fix: #32499. That is the one I would like to merge. Could you please test it against this issue?

@juncaipeng (Contributor, Author) commented

@jczaja The inference library (develop PR: #32499) also passed all tests.

@lidanqing-intel (Contributor) commented

@juncaipeng Could this issue be closed?
