
enhance config.EnableMKLDNN api for mkldnn cache clear strategy #18549

Closed
wants to merge 6 commits into from

Conversation

luotao1 (Contributor) commented Jul 8, 2019

  1. Enhance the config.EnableMKLDNN API for the MKL-DNN cache clear strategy: EnableMKLDNN(int mkldnn_input_shape_cache_capacity = 0); a usage sketch follows below.
  2. Simplify TEST(Analyzer_MM_DNN, mkldnn_cache_clear) with the enhanced API, and add an output comparison between the no-cache strategy and the cache strategy.
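For illustration, a minimal C++ usage sketch of the enhanced API (the model path and the capacity of 10 are hypothetical; CreatePaddlePredictor is assumed to be the usual factory of the Paddle inference API):

  #include <memory>
  #include "paddle/fluid/inference/api/paddle_inference_api.h"

  std::unique_ptr<paddle::PaddlePredictor> CreateMkldnnPredictor() {
    paddle::AnalysisConfig config;
    config.SetModel("./model_dir");  // hypothetical model directory
    // capacity = 0 (the default) keeps the old behavior: no cache clearing.
    // capacity > 0 enables cache clear mode and caps how many input shapes
    // keep cached MKL-DNN blobs at the same time.
    config.EnableMKLDNN(/*mkldnn_input_shape_cache_capacity=*/10);
    return paddle::CreatePaddlePredictor(config);
  }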

bool AnalysisPredictor::Run(const std::vector<PaddleTensor> &inputs,
                            std::vector<PaddleTensor> *output_data,
                            int batch_size) {
  paddle::platform::SetNumThreads(config_.cpu_math_library_num_threads());
#ifdef PADDLE_WITH_MKLDNN
  if (config_.use_mkldnn_) MkldnnPreRun(inputs);
#endif
luotao1 (Contributor, Author):

Compared with #18372, the reason for not using MkldnnPostRun is: if the mkldnn_session_id is reset to 0, the unit test's dev_ctx->GetShapeBlobSize() cannot get the correct shape_blob size.
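For context, a rough sketch of the kind of unit-test check this refers to (the test body and the capacity value are assumptions; GetShapeBlobSize is the accessor named above):

  // Hypothetical check from the cache-clear unit test: in cache clear mode,
  // the number of cached input shapes must stay within the capacity. If
  // MkldnnPostRun reset the session id back to 0, this lookup would read the
  // default session's blobs instead of the cache-clearing session's.
  const size_t capacity = 10;  // hypothetical capacity set via EnableMKLDNN
  auto *dev_ctx = static_cast<paddle::platform::MKLDNNDeviceContext *>(
      paddle::platform::DeviceContextPool::Instance().Get(
          paddle::platform::CPUPlace()));
  EXPECT_LE(dev_ctx->GetShapeBlobSize(), capacity);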

Contributor:

There may be a corner case, e.g. when threads are reused via a pool: in the last execution of instance X, config.mkldnn_input_shape_cache_capacity_ is set > 0, so thread A sets the thread-local cache capacity, and this variable is not cleared after execution. When thread A is then reused by another instance B with config.mkldnn_input_shape_cache_capacity_ = 0, it will hit the wrong branch (a sketch of clearing this state follows below).
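A minimal sketch of what clearing that thread-local state could look like (assuming platform::kMKLDNNSessionID_Default is the default-session constant and reusing the setters shown in the suggestion below; note the author explains above why this PR deliberately avoids a MkldnnPostRun):

  // Hypothetical MkldnnPostRun: reset the thread-local MKL-DNN session state
  // so a pooled thread reused by another predictor instance does not inherit
  // a stale cache-clearing configuration.
  void AnalysisPredictor::MkldnnPostRun() {
    if (config_.mkldnn_input_shape_cache_capacity_ > 0) {
      platform::set_cur_mkldnn_session_id(
          platform::kMKLDNNSessionID_Default);  // assumed constant name
      platform::set_cur_input_shape_cache_capacity(0);
    }
  }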

Contributor:

So I suggest doing something like the following in MkldnnPreRun:

  if (config_.mkldnn_input_shape_cache_capacity_ > 0) {
    VLOG(2) << "In mkldnn cache clear mode.";
    platform::set_cur_mkldnn_session_id(
        platform::kMKLDNNSessionID_CacheClearing);
    platform::set_cur_input_shape_cache_capacity(
        config_.mkldnn_input_shape_cache_capacity_);
  }

  // Set current_input_shape.
  std::stringstream ss;
  for (size_t i = 0; i < inputs.size(); ++i) {
    for (size_t j = 0; j < inputs[i].shape.size(); ++j) {
      ss << inputs[i].shape[j] << "-";
    }
  }
  VLOG(2) << "Set input shape=" << ss.str();
  platform::set_cur_input_shape_str(ss.str());

@@ -462,7 +462,8 @@ void MKLDNNDeviceContext::SetBlob(const std::string& name,
   if (key_it == sBlob->end()) {
     // In cache clearing mode, cur_input_shape_cache_capacity defines
     // max pblob capacity
-    if ((sid == kMKLDNNSessionID_CacheClearing) &&
+    if ((static_cast<size_t>(sid) == kMKLDNNSessionID_CacheClearing) &&
         sBlob->size() &&
luotao1 (Contributor, Author):

Enhanced it to handle cur_input_shape_cache_capacity = 1 and sBlob->size() == 0 (see the sketch below).
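For context, a sketch of the full eviction branch in MKLDNNDeviceContext::SetBlob as implied by the diff and this comment (the >= comparison and erasing the oldest shape entry are assumptions, not verbatim PR code):

  // In cache clearing mode, cur_input_shape_cache_capacity caps how many
  // input shapes may keep cached blobs. The sBlob->size() guard makes
  // capacity = 1 safe while the map is still empty: nothing is evicted
  // before the first shape has been cached.
  if ((static_cast<size_t>(sid) == kMKLDNNSessionID_CacheClearing) &&
      sBlob->size() &&
      (sBlob->size() >=
       static_cast<size_t>(cur_input_shape_cache_capacity))) {
    VLOG(2) << "Evict blobs of the oldest input shape: "
            << sBlob->begin()->first;
    sBlob->erase(sBlob->begin());
  }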

luotao1 and others added 3 commits July 8, 2019 17:47
…#18532)

* Fix Mask rcnn predictor
    1. refine memory optim algorithm to support the model with the block op.
    2. output diff: modify the affine channel fuse.
    3. add condition_block_infer op.
add interface for setting trt calib table dir
test=develop

* add the missing files.
test=develop
luotao1 (Contributor, Author) commented Jul 8, 2019

@LeoZhao-Intel @jczaja Please review!

@@ -250,7 +250,8 @@ void BindAnalysisConfig(py::module *m) {
       .def("tensorrt_engine_enabled", &AnalysisConfig::tensorrt_engine_enabled)
       .def("switch_ir_debug", &AnalysisConfig::SwitchIrDebug,
            py::arg("x") = true)
-      .def("enable_mkldnn", &AnalysisConfig::EnableMKLDNN)
+      .def("enable_mkldnn", &AnalysisConfig::EnableMKLDNN,
+           py::arg("mkldnn_input_shape_cache_capacity") = 0)
       .def("mkldnn_enabled", &AnalysisConfig::mkldnn_enabled)
       .def("set_cpu_math_library_num_threads",
            &AnalysisConfig::SetCpuMathLibraryNumThreads)
LeoZhao-Intel (Contributor) commented Jul 9, 2019:

There may be another failure in CI; see my PR #18081 (https://github.com/PaddlePaddle/Paddle/pull/18081/files?file-filters%5B%5D=.py#diff-876ea1bc109973488c161a657f79812fR74), but it may be fixed by your PR.

luotao1 closed this Jul 10, 2019

CLAassistant:

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ NHZlX
✅ luotao1
❌ guofei02


guofei02 does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.
