don't clear mkldnn cache in block_op executor dtor #25735

sfraczek · 2020-07-27T08:57:28Z

PR types

Bug fixes

PR changes

Others

Describe

The MKLDNN cache is removed in the Executor's destructor. This should not happen at the end of RunImpl of conditional_block_op, where a locally created Executor is destroyed. While working on enabling MKLDNN for dygraph resnet, I found that test_resnet.py crashes with MKLDNN enabled because it cannot find the forward MKLDNN primitive in the cache.
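The failure mode described above can be sketched with a toy model. This is not Paddle's code: `ToyExecutor`, `g_primitive_cache`, `RunNestedBlock`, `ForwardPrimitiveSurvives`, and the `"conv2d_fwd"` key are hypothetical names for illustration. Only the method name `KeepMKLDNNCache` comes from the diff in this PR, and its body here is an assumption.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a process-wide MKLDNN primitive cache.
static std::map<std::string, int> g_primitive_cache;

// Hypothetical stand-in for framework::Executor (not Paddle's class).
class ToyExecutor {
 public:
  // Method name taken from the diff in this PR; the body is an assumption.
  void KeepMKLDNNCache(bool keep) { keep_cache_ = keep; }

  ~ToyExecutor() {
    // Models the described behavior: the destructor clears the shared
    // cache unless the caller asked to keep it.
    if (!keep_cache_) g_primitive_cache.clear();
  }

 private:
  bool keep_cache_ = false;
};

// Models an op like conditional_block_op whose RunImpl creates a local
// executor to run a sub-block.
void RunNestedBlock(bool keep_cache) {
  ToyExecutor executor;
  executor.KeepMKLDNNCache(keep_cache);
  // ... the sub-block would run here ...
}  // The local executor is destroyed here.

// Returns true if a primitive cached by the outer run is still present
// after the nested block has finished.
bool ForwardPrimitiveSurvives(bool keep_cache) {
  g_primitive_cache.clear();
  g_primitive_cache["conv2d_fwd"] = 1;  // Outer run caches a forward primitive.
  RunNestedBlock(keep_cache);
  return g_primitive_cache.count("conv2d_fwd") == 1;
}
```

In this model, `ForwardPrimitiveSurvives(false)` reports that the entry is gone once the nested executor is destroyed, while `ForwardPrimitiveSurvives(true)` reports that it is still available to the outer run.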

paddle-bot-old · 2020-07-27T08:57:36Z

Thanks for your contribution!
Please wait for the CI result first. See Paddle CI Manual for details.

grygielski

LGTM

paddle/fluid/operators/distributed_ops/fl_listen_and_serv_op.cc

paddle/fluid/operators/collective/c_gen_nccl_id_op.cc

paddle/fluid/operators/distributed_ops/gen_nccl_id_op.cc

paddle/fluid/operators/distributed_ops/listen_and_serv_op.cc

paddle/fluid/operators/tensorrt/tensorrt_engine_op.h

grygielski · 2020-08-04T11:27:30Z

@luotao1 PR-CI-Coverage fails because of test coverage in fl_listen_and_serv_op and conditional_block_infer_op. However, these lines are not executed in tests only because the test build does not have the WITH_MKLDNN flag. Could you advise how we can pass this CI?
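A minimal sketch of why guarded lines never run in such a build, assuming the WITH_MKLDNN build flag is what defines the `PADDLE_WITH_MKLDNN` macro seen in the diff. `ConfigureExecutor` and the step strings are hypothetical names for illustration only.

```cpp
#include <string>
#include <vector>

// Hypothetical helper that records which setup steps were compiled in.
std::vector<std::string> ConfigureExecutor() {
  std::vector<std::string> steps;
  steps.push_back("create executor");
#ifdef PADDLE_WITH_MKLDNN
  // Present only when the macro is defined at compile time. In a build
  // without it, this line does not exist in the binary, so no test can
  // execute it.
  steps.push_back("keep MKLDNN cache");
#endif
  return steps;
}
```

Compiled without `-DPADDLE_WITH_MKLDNN`, the function records a single step; compiled with it, it records two.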

luotao1 · 2020-08-05T04:38:43Z

When working on dygraph resnet enablement of MKLDNN, I found that test_resnet.py with MKLDNN enabled will crash because it cannot find forward MKLDNN primitive in cache.

Is this related to not clearing the MKLDNN cache at the end of RunImpl of conditional_block_op?

MKLDNN cache is removed in Executor's destructor. This should not happen at the end of RunImpl of conditional_block_op, where locally created Executor is destroyed.

Do you have any other method to solve this problem? The current method is not graceful.

luotao1 · 2020-08-05T04:42:13Z

paddle/fluid/operators/collective/c_gen_nccl_id_op.cc

@@ -99,6 +99,9 @@ class CGenNCCLIdOp : public framework::OperatorBase {

    framework::ProgramDesc empty_program;
    framework::Executor executor(dev_ctx.GetPlace());
+#ifdef PADDLE_WITH_MKLDNN
+    executor.KeepMKLDNNCache(true);
+#endif


c_gen_nccl_id_op.cc is used only on GPU, so it doesn't need to be updated.

luotao1 · 2020-08-05T04:47:09Z

paddle/fluid/operators/distributed_ops/gen_nccl_id_op.cc

@@ -214,6 +214,9 @@ class GenNCCLIdOp : public framework::OperatorBase {

    framework::ProgramDesc empty_program;
    framework::Executor executor(dev_ctx.GetPlace());


gen_nccl_id_op.cc is used only on GPU, so it doesn't need to be updated.

luotao1 · 2020-08-05T04:47:35Z

paddle/fluid/operators/tensorrt/tensorrt_engine_op.h

@@ -134,6 +134,9 @@ class TensorRTEngineOp : public framework::OperatorBase {
  void RunNativeImpl(const framework::Scope &scope,
                     const platform::Place &dev_place) const {
    framework::Executor executor(dev_place);


tensorrt_engine_op.h is used only on GPU, so it doesn't need to be updated.

lidanqing-intel · 2020-08-05T07:30:11Z

Luotao thinks this is not elegant. Please consider submitting an issue so that they can change the executor.

grygielski · 2020-08-05T12:07:52Z

@luotao1 I've submitted an Issue about this problem: #25988

lidanqing-intel · 2020-08-07T11:03:07Z

Luotao said their team needs to discuss this PR and issue #25988 internally. We will wait for a while.

lidanqing-intel · 2020-08-19T07:18:09Z

@grygielski This PR seems to have compatibility issues. The resnet test does not pass on Windows, although it passes on Linux. We could ask luotao for help with deploying on Windows to see what differs between Windows and Linux, and why the test passes on Linux but not on Windows.

grygielski · 2020-08-24T07:31:11Z

Closing since #26502 is already merged

Commit 43be44c (sfraczek): don't clear mkldnn cache in block_op executor dtor

sfraczek requested a review from grygielski July 27, 2020 08:57

sfraczek added the Intel label Jul 27, 2020

grygielski previously approved these changes Jul 27, 2020

sfraczek requested review from wozna, wojtuss and grygielski July 27, 2020 09:05

Commit 1e3e531 (sfraczek): review fixes

sfraczek dismissed grygielski's stale review via 1e3e531 July 27, 2020 16:49

Commit a4f4ed2 (sfraczek): Merge branch 'develop' into block-op-executor-dnnl-cache

sfraczek added the dygraph (issues related to dygraph mode) label Jul 28, 2020

wojtuss reviewed Jul 29, 2020

Commit f31bb2d (sfraczek): review fix exec->executor

sfraczek mentioned this pull request Jul 31, 2020: Verification of dygraph MKLDNN accuracy convergence #25872 (Closed)

luotao1 reviewed Aug 5, 2020

grygielski mentioned this pull request Aug 6, 2020: Conditional clearing of MKL-DNN cache in Executor's destructor #25988 (Closed)

luotao1 requested a review from zhhsplendid August 11, 2020 09:14

grygielski closed this Aug 24, 2020

sfraczek deleted the block-op-executor-dnnl-cache branch September 3, 2020 15:29
