enhance config.EnableMKLDNN api for mkldnn cache clear strategy #18549
Conversation
bool AnalysisPredictor::Run(const std::vector<PaddleTensor> &inputs,
                            std::vector<PaddleTensor> *output_data,
                            int batch_size) {
  paddle::platform::SetNumThreads(config_.cpu_math_library_num_threads());
#ifdef PADDLE_WITH_MKLDNN
  if (config_.use_mkldnn_) MkldnnPreRun(inputs);
#endif
Compared with #18372, the reason for not using MkldnnPostRun is: if the mkldnn_session_id is reset to 0, the unit test dev_ctx->GetShapeBlobSize() cannot get the correct shape_blob size.
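For illustration, a hypothetical MkldnnPostRun in the spirit of #18372 might look like the sketch below; the function body and the kMKLDNNSessionID_Default name are assumptions, not code from this PR:

// Hypothetical sketch, NOT part of this PR.
void AnalysisPredictor::MkldnnPostRun() {
  // Resetting the thread-local session id back to the default (0) would
  // make a later dev_ctx->GetShapeBlobSize() look up the default
  // session's blob map instead of the one populated during Run(), so the
  // unit test would read the wrong shape_blob size.
  platform::set_cur_mkldnn_session_id(platform::kMKLDNNSessionID_Default);
}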
There may be a corner case when threads are reused via a pool: in the last execution, instance X sets config.mkldnn_input_shape_cache_capacity_ > 0, so thread A sets the thread-local cache capacity, and that variable is not cleared after execution. When thread A is later reused by another instance B with config.mkldnn_input_shape_cache_capacity_ = 0, it will hit the wrong branch.
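A self-contained sketch of that corner case (cur_capacity stands in for the real thread-local state in paddle::platform; all names here are illustrative):

#include <iostream>

thread_local int cur_capacity = 0;  // survives across tasks on one pooled thread

void RunOnPooledThread(const char *instance, int configured_capacity) {
  if (configured_capacity > 0) cur_capacity = configured_capacity;
  // No PostRun-style reset happens here, so the value leaks to the next task.
  std::cout << instance << " sees capacity " << cur_capacity << "\n";
}

int main() {
  RunOnPooledThread("instance X", 10);  // prints 10, as configured
  RunOnPooledThread("instance B", 0);   // prints 10 -- stale value, wrong branch
  return 0;
}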
So I suggest doing something like the following in MkldnnPreRun:
if (config_.mkldnn_input_shape_cache_capacity_ > 0) {
VLOG(2) << "In mkldnn cache clear mode.";
platform::set_cur_mkldnn_session_id(
platform::kMKLDNNSessionID_CacheClearing);
platform::set_cur_input_shape_cache_capacity(
config_.mkldnn_input_shape_cache_capacity_);
}
// Set current_input_shape.
std::stringstream ss;
for (size_t i = 0; i < inputs.size(); ++i) {
for (size_t j = 0; j < inputs[i].shape.size(); ++j) {
ss << inputs[i].shape[j] << "-";
}
}
VLOG(2) << "Set input shape=" << ss.str();
platform::set_cur_input_shape_str(ss.str());
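As a concrete example of the key this loop builds, the standalone snippet below (the shapes are made up) prints the string that set_cur_input_shape_str would receive:

#include <iostream>
#include <sstream>
#include <vector>

int main() {
  // Mirrors the key-building loop in MkldnnPreRun above.
  std::vector<std::vector<int>> shapes = {{1, 3, 224, 224}, {1, 10}};
  std::stringstream ss;
  for (size_t i = 0; i < shapes.size(); ++i)
    for (size_t j = 0; j < shapes[i].size(); ++j) ss << shapes[i][j] << "-";
  std::cout << ss.str() << std::endl;  // prints "1-3-224-224-1-10-"
  return 0;
}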
@@ -462,7 +462,8 @@ void MKLDNNDeviceContext::SetBlob(const std::string& name,
     if (key_it == sBlob->end()) {
       // In cache clearing mode, cur_input_shape_cache_capacity defines
       // max pblob capacity
-      if ((sid == kMKLDNNSessionID_CacheClearing) &&
+      if ((static_cast<size_t>(sid) == kMKLDNNSessionID_CacheClearing) &&
           sBlob->size() &&
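To make the guarded branch concrete, here is a simplified sketch of the surrounding SetBlob logic; the map layout and the eviction line are assumptions inferred from the comment in the hunk, not verbatim source:

// sBlob is assumed to map input-shape string -> per-shape blob map.
auto key_it = sBlob->find(cur_input_shape_str);
if (key_it == sBlob->end()) {
  // In cache clearing mode, cur_input_shape_cache_capacity is the max
  // number of per-shape entries kept alive at once.
  if ((static_cast<size_t>(sid) == kMKLDNNSessionID_CacheClearing) &&
      sBlob->size() &&  // never evict from an empty map
      (sBlob->size() >=
       static_cast<size_t>(cur_input_shape_cache_capacity))) {
    sBlob->erase(sBlob->begin()->first);  // assumed: drop one stale shape entry
  }
}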
Enhance it for the case cur_input_shape_cache_capacity = 1 with sBlob.size() == 0: the added sBlob->size() check keeps an empty blob map from triggering eviction.
…#18532)
* Fix Mask RCNN predictor:
  1. refine the memory optimization algorithm to support models with the block op
  2. output diff: modify the affine channel fuse
  3. add condition_block_infer op
* add an interface for setting the TRT calib table dir
  test=develop
* add the missing files
  test=develop
test=develop
@LeoZhao-Intel @jczaja Please review!
…ddle into luotao1-enable_mkldnn_enhance
paddle/fluid/pybind/inference_api.cc
@@ -250,7 +250,8 @@ void BindAnalysisConfig(py::module *m) {
      .def("tensorrt_engine_enabled", &AnalysisConfig::tensorrt_engine_enabled)
      .def("switch_ir_debug", &AnalysisConfig::SwitchIrDebug,
           py::arg("x") = true)
-     .def("enable_mkldnn", &AnalysisConfig::EnableMKLDNN)
+     .def("enable_mkldnn", &AnalysisConfig::EnableMKLDNN,
+          py::arg("mkldnn_input_shape_cache_capacity") = 0)
      .def("mkldnn_enabled", &AnalysisConfig::mkldnn_enabled)
      .def("set_cpu_math_library_num_threads",
           &AnalysisConfig::SetCpuMathLibraryNumThreads)
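On the C++ side, the enhanced API from this PR could be used as sketched below; the model path is a placeholder and the surrounding calls are standard inference-API usage, not code from this diff:

#include "paddle/fluid/inference/api/paddle_inference_api.h"

int main() {
  // Minimal sketch, assuming the new default argument from this PR.
  paddle::AnalysisConfig config;
  config.SetModel("./model");  // placeholder path
  // A capacity > 0 switches MKL-DNN into the cache-clear strategy.
  config.EnableMKLDNN(/*mkldnn_input_shape_cache_capacity=*/10);
  auto predictor = paddle::CreatePaddlePredictor(config);
  return 0;
}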
There may be another failure in CI; see my PR #18081: https://github.com/PaddlePaddle/Paddle/pull/18081/files?file-filters%5B%5D=.py#diff-876ea1bc109973488c161a657f79812fR74. But it may be fixed in your PR.
guofei02 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Have you already signed the CLA but the status is still pending? Let us recheck it.
EnableMKLDNN(int mkldnn_input_shape_cache_capacity = 0)
Add TEST(Analyzer_MM_DNN, mkldnn_cache_clear) with the enhanced API, and compare outputs between the no-cache strategy and the cache-clear strategy.
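A minimal sketch of that comparison idea (SetConfig, SetInput, and CompareResult are assumed helpers in the analyzer-tester style, not necessarily the merged test code):

// Hypothetical sketch of TEST(Analyzer_MM_DNN, mkldnn_cache_clear).
TEST(Analyzer_MM_DNN, mkldnn_cache_clear) {
  AnalysisConfig config;
  SetConfig(&config);                   // assumed helper
  std::vector<PaddleTensor> inputs, out_no_cache, out_cache;
  SetInput(&inputs);                    // assumed helper

  config.EnableMKLDNN();                // capacity = 0: no cache clearing
  auto p1 = CreatePaddlePredictor(config);
  p1->Run(inputs, &out_no_cache);

  config.EnableMKLDNN(/*mkldnn_input_shape_cache_capacity=*/1);
  auto p2 = CreatePaddlePredictor(config);
  p2->Run(inputs, &out_cache);

  // Outputs must match regardless of the caching strategy.
  CompareResult(out_cache, out_no_cache);  // assumed helper
}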