Mkldnn layout #11040
Conversation
Force-pushed from e476d5d to 021d595
Force-pushed from 021d595 to b67821a
paddle/fluid/framework/op_registry.h
Outdated
  return 0; \
}

#define REGISTER_OP_KERNEL_WITH_LAYOUT(op_type, LIBRARY_TYPE, LAYOUT, \
Why do you use REGISTER_OP_KERNEL_WITH_LAYOUT? I could not find any usage of it.
According to your earlier suggestion to split the pull request into a few smaller parts, this one contains only the changes required to support MKLDNN operators; it does not include an example of how this #define is used. Could you have a look at this code (#usage)?
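As a rough illustration of the registration pattern under discussion (the names below are hypothetical, not Paddle's actual code), such a macro can paste the op type, library, and layout together into a unique registrar symbol whose constructor populates a global registry:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical kernel registry keyed by "op/library/layout"; illustrative only.
static std::map<std::string, int>& KernelRegistry() {
  static std::map<std::string, int> registry;
  return registry;
}

// The registrar's constructor runs at static-initialization time and
// records the kernel key in the registry.
struct KernelRegistrar {
  explicit KernelRegistrar(const std::string& key) { KernelRegistry()[key] = 1; }
};

// Token pasting (##) builds a unique variable name per (op, library, layout);
// stringizing (#) builds the lookup key.
#define REGISTER_KERNEL_WITH_LAYOUT(op_type, library, layout)        \
  static KernelRegistrar registrar_##op_type##_##library##_##layout( \
      #op_type "/" #library "/" #layout);

REGISTER_KERNEL_WITH_LAYOUT(conv2d, MKLDNN, kMKLDNN)
REGISTER_KERNEL_WITH_LAYOUT(conv2d, PLAIN, kNCHW)
```

The two expansions above register `conv2d/MKLDNN/kMKLDNN` and `conv2d/PLAIN/kNCHW` without any name collision, which is the property the real macro relies on.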
paddle/fluid/framework/op_registry.h
Outdated
@@ -189,6 +206,15 @@ class OpKernelRegistrar : public Registrar {
    __attribute__((unused)) = \
        TouchOpKernelRegistrar_##op_type##_##LIBRARY_TYPE()

#define USE_OP_DEVICE_KERNEL_EXTEND(op_type, LIBRARY_TYPE, LAYOUT) \
Why do you use USE_OP_DEVICE_KERNEL_EXTEND? I could not find any usage of it.
Please have a look at this code
What do you mean by "EXTEND"? Is that specifically for MKLDNN? Or can you use some existing macro instead?
Do you have any comment about adding #define USE_OP_DEVICE_KERNEL_EXTEND? @jacquesqiao
Yes, that is specific to MKLDNN. It is the way to tell MKLDNN kernels apart from plain CPU ones. The functions involved differ slightly, so I cannot reuse any of the existing macros. It gives us the ability to register a new MKLDNN operator that supports the layout.
After discussing with @tensor-tang, we think that currently all MKLDNN kernels have only one layout, MKLDNN, so we can use LIBRARY to mark and distinguish them: mkldnn_kernels can be chosen by this library flag, and data transform can use tensor.layout_ to decide whether a transform is needed.
It seems that all our current work can be done without registering a layout in op_kernel_registry, so can we just use LIBRARY for now and move forward with our work?
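The scheme proposed here can be sketched minimally as follows (the type and function names are assumptions for illustration, not Paddle's actual API): the kernel is chosen by a LibraryType flag alone, and the tensor's layout_ field decides whether a layout transform is needed at the MKLDNN boundary.

```cpp
#include <cassert>

// Illustrative stand-ins for Paddle's enums; not the real definitions.
enum class LibraryType { kPlain, kMKLDNN, kCUDNN };
enum class DataLayout { kAnyLayout, kNCHW, kNHWC, kMKLDNN };

struct Tensor {
  DataLayout layout_ = DataLayout::kNCHW;
};

// Kernel selection keys only on the library flag; no layout key is needed.
LibraryType ChooseKernel(bool use_mkldnn) {
  return use_mkldnn ? LibraryType::kMKLDNN : LibraryType::kPlain;
}

// A transform is needed exactly when the tensor's layout and the kernel's
// expected layout disagree about being MKLDNN.
bool NeedTransform(const Tensor& in, DataLayout expected) {
  return (in.layout_ == DataLayout::kMKLDNN) !=
         (expected == DataLayout::kMKLDNN);
}
```

Under this sketch, an MKLDNN-laid-out tensor fed to a plain kernel triggers a transform, while MKLDNN-to-MKLDNN passes through untouched.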
@jacquesqiao done.
paddle/fluid/framework/tensor.h
Outdated
-class Tensor {
+class Tensor
+#ifdef PADDLE_WITH_MKLDNN
+    : public MKLDNNTensor
I think it's not a good idea to make Tensor : public MKLDNNTensor.
Maybe

class Tensor {
#ifdef PADDLE_WITH_MKLDNN
  mkl_fields;
  void mkldnn_method();
#endif
};

is a better way.
@jacquesqiao and @tensor-tang, the inheritance was removed from the code, and all the methods were moved from the mkldnn_tensor file to the tensor file.
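The composition approach adopted here can be sketched roughly as follows (field and method names are illustrative, not the actual Paddle ones): the MKLDNN-specific state lives directly inside Tensor, guarded by the build flag, instead of coming from an MKLDNNTensor base class.

```cpp
#include <cassert>

// In practice this macro is set by the build system; defined here so the
// sketch is self-contained.
#define PADDLE_WITH_MKLDNN

enum class DataLayout { kNCHW, kMKLDNN };

class Tensor {
 public:
  DataLayout layout() const { return layout_; }
  void set_layout(DataLayout l) { layout_ = l; }

#ifdef PADDLE_WITH_MKLDNN
  // Illustrative stand-in for an MKLDNN memory-format descriptor; the extra
  // members simply do not exist in a non-MKLDNN build.
  int mkldnn_format() const { return format_; }
  void set_mkldnn_format(int f) { format_ = f; }
#endif

 private:
  DataLayout layout_ = DataLayout::kNCHW;
#ifdef PADDLE_WITH_MKLDNN
  int format_ = 0;
#endif
};
```

Compared with inheritance, this keeps Tensor's type unchanged for every caller while still letting MKLDNN builds carry the extra state.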
const std::vector<int>& ksize,
const std::vector<int>& strides,
const std::vector<int>& paddings,
const std::string& suffix) {
Why do you modify the parameter order of strides and paddings? Could pool_mkldnn_op.cc remain as before?
I modified these lines of code because cpplint reported that they had to be changed. I did not intend to make these changes; it is likely related to the compiler version I use.
using mkldnn::primitive;
using mkldnn::reorder;

void* get_data_from_tensor(const Tensor& tensor,
How about renaming get_data_from_tensor to GetDataFromTensor? We use CamelCase in naming.
Done.
}
}

void TransDataLayoutMkldnn(const OpKernelType& kernel_type_for_var,
How about renaming TransDataLayoutMkldnn to TransDataLayoutMKLDNN? As @wangkuiyi suggested in #3337 (comment).
Done.
}
}

inline MKLDNNDataType to_mkldnn_data_type(const std::type_index type) {
How about renaming to_mkldnn_data_type to ToMKLDNNDataType? We use CamelCase in naming.
Done.
using mkldnn::primitive;
using mkldnn::reorder;

void* get_data_from_tensor(const Tensor& tensor,
GetDataFromTensor
Done.
PADDLE_ENFORCE(
    in_layout == DataLayout::kMKLDNN && out_layout != DataLayout::kMKLDNN,
    "TransDataLayoutMkldnn only supports transfrom from MKLDNN to "
transform
It seems the function name is not appropriate according to this comment.
Done. I changed the name of the function to TransDataLayoutFromMKLDNN.
        kernel_type_for_var.data_layout_)) {
  TransDataLayout(kernel_type_for_var, expected_kernel_type, in, &out);
if (NeedTransformLayout(lout, lin)) {
#ifdef PADDLE_WITH_MKLDNN
Is there a better approach that avoids #ifdef PADDLE_WITH_MKLDNN, since Paddle itself supports kMKLDNN? As you can see:

enum class LibraryType {
  kPlain = 0,
  kMKLDNN = 1,
  kCUDNN = 2,
};
That's a temporary solution. The reason I had to use this #ifdef flag is that I'm waiting for the mkldnn flag (i.e. a global flag). The PaddlePaddle team is working on making it a global flag for all mkldnn operators. Please have a look at the issue where this topic was discussed.
Thanks @mozga-intel, I understand that it's a temporary solution. But I am afraid that #10765 is not trying to deal with this one.
Maybe you can directly remove the #if here, since cudnn does not have a global flag either.
@tensor-tang done.
    (l != DataLayout::kAnyLayout && r != DataLayout::kAnyLayout && l != r);
#ifdef PADDLE_WITH_MKLDNN
// Layout transform needed for either non-MKLDNN to MKLDNN or vice versa
ret |= (l != DataLayout::kMKLDNN && r == DataLayout::kMKLDNN);
Any better solution? It seems to have some duplicated logic with:

if (NeedTransformLayout(lout, lin)) {
#ifdef PADDLE_WITH_MKLDNN
  if (lin == DataLayout::kMKLDNN || lout == DataLayout::kMKLDNN) {
...
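One way to remove the duplication pointed out here would be to fold the MKLDNN case entirely into the predicate, so the call site never re-tests the layouts. A rough sketch under assumed names (the symmetric check differs slightly from the one-directional test quoted above, and is not the actual Paddle code):

```cpp
#include <cassert>

// Defined here so the sketch is self-contained; set by the build in practice.
#define PADDLE_WITH_MKLDNN

enum class DataLayout { kAnyLayout, kNCHW, kNHWC, kMKLDNN };

// Single predicate covering both the generic mismatch case and the MKLDNN
// boundary case, so callers need not repeat the kMKLDNN comparisons.
bool NeedTransformLayout(DataLayout l, DataLayout r) {
  bool ret =
      (l != DataLayout::kAnyLayout && r != DataLayout::kAnyLayout && l != r);
#ifdef PADDLE_WITH_MKLDNN
  // Transform needed when exactly one side is MKLDNN, in either direction.
  ret |= (l == DataLayout::kMKLDNN) != (r == DataLayout::kMKLDNN);
#endif
  return ret;
}
```

With this shape, the dispatch code can simply branch on the predicate's result and leave the MKLDNN-specific comparison in one place.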
Force-pushed from f82e9d1 to ae23bb6
please merge the latest code to pass the
Force-pushed from ae23bb6 to 47d6bab
Add MKLDNN layout in Paddle so that an MKLDNN-friendly memory layout can be used in MKLDNN-enabled OP kernels. Before this commit, NCHW was hardcoded in all MKLDNN op kernels; as a result, a non-optimized execution path was selected inside the MKLDNN primitives, which brought worse performance. Besides the framework change, three MKLDNN OP kernels were updated to use the new MKLDNN layout: conv, pool2d, and batch_norm. Other MKLDNN OP kernels need to be updated in a similar way to achieve the best performance.
Force-pushed from 47d6bab to 568aff0
@luotao1 Done.
Thanks for your great work @mozga-intel, but there is one little question. @luotao1, please help check whether this is acceptable.
// Fix me: here just change the default layout to kNCHW
// it doesn't fix the real issue, i.e. feeder should set up tensor layout
// according to actual input data
DataLayout layout_ = DataLayout::kNCHW;
Is this acceptable? @luotao1
@mozga-intel @tensor-tang
Discussed with @qingqing01, it's acceptable to modify the default layout.
Thanks very much!
@luotao1 @tensor-tang @jacquesqiao Thanks very much, great work.
This PR contains the changes required to support MKLDNN memory layouts, improving the performance of MKLDNN computations.
This implementation differs from the one proposed in #10291. It is based mainly on Brian Liu's implementation, which is a simplified version of what I proposed previously.
The latest version of the pull request is split into a few parts of code (information).
This pull request contains:
Pull-request is related to: