
Deploying multiple models at the same time will raise MKLDNN error #31992

Closed
juncaipeng opened this issue Mar 31, 2021 · 12 comments

@juncaipeng (Contributor) commented Mar 31, 2021

  • Info

C++ API
Ubuntu 16.04
CPU MKLDNN
GCC 8.2.0

  • Prepare

Please contact danqing to download the demo. The demo contains only two groups of models for testing.

Download model_test.cc.zip, unzip it, and replace the old model_test.cc file in the demo with the new one.

Build and install Paddle release/2.0 (commit id: c7a6a1f9610a9ee018c19d89950d76b38f33aed1):

```shell
cmake -DCMAKE_BUILD_TYPE=Release -DWITH_PYTHON=OFF -DWITH_MKL=ON -DWITH_GPU=OFF -DON_INFER=ON .. && make -j && make inference_lib_dist -j
```

Set `LIB_DIR` to the path of the Paddle Inference library in `build.sh`.

Run `sh build.sh`.

Run `ulimit -c unlimited` to enable saving core files.
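Collected into a single script, the preparation steps above look roughly like the build fragment below. This is a sketch, not part of the original demo: the `/path/to/...` locations are placeholders you must adjust to your checkout.

```shell
# Sketch of the preparation steps; adjust the placeholder paths first.
set -e

# Build Paddle and its inference library (run inside Paddle's build directory).
cd /path/to/Paddle/build
cmake -DCMAKE_BUILD_TYPE=Release -DWITH_PYTHON=OFF -DWITH_MKL=ON \
      -DWITH_GPU=OFF -DON_INFER=ON ..
make -j && make inference_lib_dist -j

# Build the demo; LIB_DIR in build.sh must point at the installed inference library.
cd /path/to/demo
sh build.sh

# Allow core dumps so crashes can be inspected with gdb afterwards.
ulimit -c unlimited
```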

  • The first test

Run `./build/model_test --test_groups=0 --single_instance=true`; it does not raise an error.
If `single_instance` is set to true, each model has only one predictor. Otherwise, some models create several predictors by calling `predictor.clone()`.

Run `./build/model_test --test_groups=0 --single_instance=false`; it raises a segmentation fault.
Run `gdb ./build/model_test core_file` to see the following error.

[screenshot: gdb backtrace of the segmentation fault]

  • The second test

Run `./build/model_test --test_groups=1 --single_instance=true`; it does not raise an error.

Run `./build/model_test --test_groups=1 --single_instance=false`; it also raises a segmentation fault.

[screenshot: error backtrace]

Sometimes, the above demo raises a different error, such as:

[screenshot: a different error backtrace]

  • The third test

Run `./build/model_test --test_groups="4 5 6" --single_instance=true`. The demo loads several groups of models, each model with one predictor, and it also raises the following error. The demo only includes two groups of models for now; we will provide the other models later.

[screenshot: error output]

[screenshots: additional error output]
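The three tests above exercise a matrix of group ids and instance modes. The loop below is a small driver sketch (not part of the demo) that enumerates the same combinations; it echoes each invocation, and dropping the `echo` would actually run the binary. Group ids beyond 0 and 1 require the extra models mentioned above.

```shell
# Enumerate the command matrix exercised in the tests above.
# Echoes each invocation; remove `echo` to run the demo binary for real.
for groups in "0" "1" "0 1"; do
  for single in true false; do
    echo ./build/model_test --test_groups=\"${groups}\" --single_instance=${single}
  done
done
```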

@paddle-bot-old commented

Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your questions as soon as possible. Please double-check that you have provided a clear problem description, reproduction code, environment & version, and error messages. You may also look for an answer in the official API docs, the FAQ, historical issues, and the AI community. Have a nice day!

@jczaja (Contributor) commented Apr 7, 2021

@lidanqing-intel, @juncaipeng We are investigating this issue. We have reproduced it on the develop branch. A candidate fix has been made (#32136) and we are testing it now.

@jczaja (Contributor) commented Apr 8, 2021

@juncaipeng

1. Could you please test PR #32136 ([oneDNN] candidate fix to #31992, on develop) to see whether it solves the problem for you?
2. We are missing some multi-instance unit tests (#32087); would it be possible to turn this issue's test into a UT?

@juncaipeng (Contributor, Author) commented Apr 8, 2021

@jczaja
On the latest develop branch, the demo still raises an error when running the following command:
`./build/model_test --test_groups="0 1" --single_instance=false`
Sometimes, the demo also raises an error when running the following commands:
`./build/model_test --test_groups="0" --single_instance=false`
`./build/model_test --test_groups="1" --single_instance=false`

Should I use the inference library of release/2.0?

These models cannot be used in a UT, so you will need to find some other models.

jczaja added a commit to jczaja/Paddle that referenced this issue Apr 8, 2021
luotao1 pushed a commit that referenced this issue Apr 9, 2021
@jczaja (Contributor) commented Apr 9, 2021

@juncaipeng This is the cherry-pick for release/2.0: #32163. It works fine on my setup, but @lidanqing-intel reports that every other run crashes on her setup, so I will test it further.

Superjomn pushed a commit that referenced this issue Apr 13, 2021
* - Candidate fix to #31992

- Fix to #31992 for 2.0
@jczaja (Contributor) commented Apr 21, 2021

@juncaipeng I have made some more changes (develop PR: #32309). Could you please test them and report any problems?

@juncaipeng (Contributor, Author) commented

@jczaja I have tested all the models in the demo and found no problems. The customer will test the new inference library in their project. If there is any news, I will report back.

@juncaipeng (Contributor, Author) commented

@jczaja The customers report that there is no problem with the new inference library (develop PR: #32309) so far.

@jczaja (Contributor) commented Apr 26, 2021

@juncaipeng I have implemented an alternative fix: #32499. That is the one I would like to merge. Could you please test it against this issue?

@juncaipeng (Contributor, Author) commented

@jczaja The inference library (develop PR: #32499) also passed all tests.

@lidanqing-intel (Contributor) commented

@juncaipeng Could this issue be closed?
