don't clear mkldnn cache in block_op executor dtor #25735
Thanks for your contribution!
LGTM
paddle/fluid/operators/distributed_ops/fl_listen_and_serv_op.cc
@luotao1 PR-CI-Coverage fails because of test coverage in |
Is this related to not clearing the MKLDNN cache at the end of RunImpl of conditional_block_op?
Is there another way to solve this problem? The current approach is not graceful.
@@ -99,6 +99,9 @@ class CGenNCCLIdOp : public framework::OperatorBase {

framework::ProgramDesc empty_program;
framework::Executor executor(dev_ctx.GetPlace());
#ifdef PADDLE_WITH_MKLDNN
executor.KeepMKLDNNCache(true);
#endif
c_gen_nccl_id_op.cc is used only on GPU, so it doesn't need to be updated.
@@ -214,6 +214,9 @@ class GenNCCLIdOp : public framework::OperatorBase {

framework::ProgramDesc empty_program;
framework::Executor executor(dev_ctx.GetPlace());
gen_nccl_id_op.cc is used only on GPU, so it doesn't need to be updated.
@@ -134,6 +134,9 @@ class TensorRTEngineOp : public framework::OperatorBase {
void RunNativeImpl(const framework::Scope &scope,
const platform::Place &dev_place) const {
framework::Executor executor(dev_place);
tensorrt_engine_op.h is used only on GPU, so it doesn't need to be updated.
Luotao thinks this is not elegant. Please consider submitting an issue so that they can change the executor.
Luotao said the team needs to discuss this PR and issue #25988 internally. We will wait for now.
@grygielski This PR seems to have compatibility issues. The resnet test passes on Linux but not on Windows. We could ask Luotao for help with deploying on Windows to see what differs between the two platforms and why the test passes on Linux but fails on Windows.
Closing since #26502 is already merged.
PR types
Bug fixes
PR changes
Others
Describe
The MKLDNN cache is cleared in Executor's destructor. This should not happen at the end of RunImpl of conditional_block_op, where a locally created Executor is destroyed. While enabling MKLDNN for dygraph resnet, I found that test_resnet.py with MKLDNN enabled crashes because it cannot find the forward MKLDNN primitive in the cache.
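The failure mode described above can be illustrated with a small standalone sketch. It is not Paddle code: `PrimitiveCache`, `GlobalCache`, `ToyExecutor`, `KeepCache`, and `RunNestedBlock` are hypothetical names invented for illustration, and the only assumption carried over from this PR is that a short-lived executor clears a shared cache in its destructor unless a keep flag (analogous to `KeepMKLDNNCache(true)` in the diff) is set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a process-wide primitive cache (not Paddle's type).
struct PrimitiveCache {
  std::unordered_map<std::string, int> entries;
};

PrimitiveCache& GlobalCache() {
  static PrimitiveCache cache;
  return cache;
}

// Toy executor: clears the shared cache on destruction unless told to keep it.
class ToyExecutor {
 public:
  void KeepCache(bool keep) { keep_cache_ = keep; }
  ~ToyExecutor() {
    if (!keep_cache_) GlobalCache().entries.clear();
  }

 private:
  bool keep_cache_ = false;
};

// Mimics an op whose RunImpl creates a local executor. The forward primitive
// is cached first; the function returns how many entries survive once the
// local executor has gone out of scope.
std::size_t RunNestedBlock(bool keep_cache) {
  GlobalCache().entries["conv_fwd"] = 1;  // cached forward primitive
  {
    ToyExecutor local;
    local.KeepCache(keep_cache);
  }  // destructor runs here
  return GlobalCache().entries.size();
}

// Small demonstration of both outcomes.
void Demo() {
  assert(RunNestedBlock(false) == 0);  // cache wiped, later lookup would miss
  assert(RunNestedBlock(true) == 1);   // cache preserved across the nested run
}
```

In this sketch, the backward pass would look up `conv_fwd` after the nested block returns. With the default behavior the entry is already gone, which mirrors the crash reported for test_resnet.py.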