
enable MKL Packed Recurrent Layer #6719

Merged: 12 commits from mkl_packed into PaddlePaddle:develop on Jan 3, 2018

Conversation

tensor-tang (Contributor):

fix #6512

luotao1 (Contributor) left a comment:

Looking at the implementation, why can't MKLPackedRecurrentLayer directly inherit from RecurrentLayer?

real* weightPacked_;
real* weightTPacked_;
size_t weightHeight_;
size_t weightWidth_;
Contributor:

Please add comments to these variables. What is the difference between weightTPacked_ and weightPacked_? Is the former the transposed version?

Contributor (Author):

OK, we will add the necessary comments.
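
For illustration only, a commented version of these members might read as follows (the wording is a guess at intent, not the PR's final comments):

    real* weightPacked_;   // weight matrix packed by cblas_sgemm_pack (forward)
    real* weightTPacked_;  // transposed weight, packed for the backward pass
    size_t weightHeight_;  // number of rows of the original weight matrix
    size_t weightWidth_;   // number of columns of the original weight matrix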

weightWidth_,
1,
batch2->getData(),
weightWidth_);
Contributor:

Could this be simplified to:

    cblas_sgemm_compute(CblasRowMajor,
                        CblasNoTrans,
                        CblasPacked,
                        batch2->getHeight(),
                        weightWidth_,
                        weightHeight_,
                        batch1->getData(),
                        weightHeight_,
                        transW ? weightTPacked_ : weightPacked_,
                        weightWidth_,
                        1,
                        batch2->getData(),
                        weightWidth_);

Contributor (Author):

Yes, thx.

1.0,
weight->getData(),
weightWidth_,
weightTPacked_);
Contributor:

Please simplify lines 33-47:

    auto pack = [&](real*& packed, CBLAS_TRANSPOSE trans, size_t n, size_t k) {
      packed = cblas_sgemm_alloc(CblasBMatrix, 1, n, k);
      // The pack arguments elided in the original suggestion are filled in
      // from context: the stored weight is row-major with ld = weightWidth_.
      cblas_sgemm_pack(CblasRowMajor, CblasBMatrix, trans, 1, n, k, 1.0f,
                       weight->getData(), weightWidth_, packed);
    };
    pack(weightPacked_, CblasNoTrans, weightWidth_, weightHeight_);
    pack(weightTPacked_, CblasTrans, weightHeight_, weightWidth_);

When initializing here, couldn't you initialize just one of them according to the trans value? Must both be initialized?

Contributor (Author):

OK, we will optimize this further.
Also, the optimized logic is clearer; only one of them needs to be transposed.


REGISTER_LAYER(mkl_packed_recurrent, MKLPackedRecurrentLayer);

bool MKLPackedRecurrentLayer::init(const LayerMap& layerMap,
Contributor:

Could MKLPackedRecurrentLayer inherit from RecurrentLayer? Much of the code in this .cpp is similar to that in RecurrentLayer.

Contributor (Author):

Yes. We will first extract Paddle's original recurrent layer into its own header file.


sgemm_packed_.reset(new MKLPackedGemm(weight_->getW()));

return true;
Contributor:

    bool MKLPackedRecurrentLayer::init(const LayerMap& layerMap,
                                       const ParameterMap& parameterMap) {
      if (!RecurrentLayer::init(layerMap, parameterMap)) return false;
      sgemm_packed_.reset(new MKLPackedGemm(weight_->getW()));
      return true;
    }

Contributor (Author):

Got it.

CpuMatrix outputGrad(inputGrad->getHeight(), inputGrad->getWidth());
outputGrad.randomizeUniform();

for (int i = 0; i < 2; i++) {
Contributor:

Lines 468-539 contain too much duplicated code; please simplify.

Contributor (Author):

OK, we will fix it. Thanks.

@@ -420,12 +420,167 @@ TEST(Layer, LstmLayer) {
}
}

#ifdef PADDLE_WITH_MKLML

LayerPtr initMKLPackedLayer(LayerConfig layerConfig,
Contributor:

Why can't initRecurrentLayer be used here?

tensor-tang (Contributor, Author) on Dec 21, 2017:

OK, this can be optimized further as well.

LayerPtr dataLayer =
creatDataLayer("layer_0", batchSize, layerSize, false);
ParameterPtr para =
creatParameter("para_0", 0, layerSize * layerSize, false);
Contributor:

Shouldn't the initialization of dataLayer and para be moved into checkMKLPackedLayer?

Contributor (Author):

OK, no problem.

luotao1 (Contributor) left a comment:

  1. A large amount of the code lacks comments.
  2. Although MKLPackedRecurrentLayer.cpp is modeled on RecurrentLayer.cpp, RecurrentLayer.cpp has many arbitrary names and much redundant code; please do not let those carry over into MKLPackedRecurrentLayer.cpp.
  3. The design doc uses the cblas_?gemm API, but the code only uses cblas_sgemm. Please explain why, or update the design doc.

MatrixPtr batch1 =
batchValue_->getBatchValue(n - 1, batch2->getHeight());

// batch2->mul(*batch1, *weight_->getW(), 1, 1);
Contributor:

The unused code on line 62 can be deleted.

Contributor:

OK, thanks.

REGISTER_TIMER_INFO("RecurrentFwBatch", getName().c_str());
/* forward one batch */
for (size_t n = 0; n < batchValue_->getNumBatch(); n++) {
MatrixPtr batch2 = batchValue_->getBatchValue(n);
Contributor:

The names batch2 and batch1 are poor:
batch2 -> batch
batch1 -> pre_batch
The same applies in backward.

Contributor:

OK, thanks.

for (size_t j = 0; j < batch2->getWidth(); j++) {
*(batch2->getData() + i * batch2->getWidth() + j) =
*(batch2->getData() + i * batch2->getWidth() + j) > 0
? *(batch2->getData() + i * batch2->getWidth() + j)
Contributor:

  • getHeight(), getWidth(), and getData() can each be stored in a temporary variable defined outside the loop, so the loop body does not keep dereferencing the pointer.
  • getWidth() should be the same for every batch, so it can be defined outside the outer loop. (A sketch follows below.)
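
For illustration, a minimal sketch of the hoisted loop, assuming batch2 is a CpuMatrix-like object with these accessors:

    const size_t height = batch2->getHeight();
    const size_t width = batch2->getWidth();  // identical for every batch
    real* data = batch2->getData();
    for (size_t i = 0; i < height; i++) {
      for (size_t j = 0; j < width; j++) {
        real& v = data[i * width + j];
        v = v > 0 ? v : 0;  // same clamp-at-zero as the original loop body
      }
    }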

Contributor:

OK, thanks.

for (size_t n = 0; n < batchValue_->getNumBatch(); n++) {
MatrixPtr batch2 = batchValue_->getBatchValue(n);

if (n != 0) {
Contributor:

Is this condition related to the condition on line 70?

Contributor:

They are not related.

for (size_t i = 0; i < batch2->getHeight(); i++) {
for (size_t j = 0; j < batch2->getWidth(); j++) {
*(batch2->getData() + i * batch2->getWidth() + j) =
*(batch2->getData() + i * batch2->getWidth() + j) > 0
Contributor:

Why is the value set directly to 0 when it is less than 0?
What is the purpose of lines 66-71?

Contributor:

This code has been removed.

dst->getWidth());
}

void compute(size_t M, real *A, size_t lda, real *C, size_t ldc) {
Contributor:

Is the function on line 60 used anywhere? If not, please delete it.

Contributor:

OK, thanks.

void pack() { pack_(weight_); }

void compute(MatrixPtr dst, MatrixPtr src) {
cblas_sgemm_compute(CblasRowMajor,
Contributor:

In the earlier design doc, https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/mkl/mkl_packed.md#background, cblas_?gemm is used. Why is only cblas_sgemm used here, and what about the other three data types? The same applies below.

Contributor:

We considered three data types when writing the design doc, but at present MKLPacked*Layer only supports the float data type (sgemm).

Contributor (Author):

On this point, the doc deliberately leaves room: otherwise, if double support is added later, the doc would have to be changed again.
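
For reference, a self-contained sketch of the MKL packed-GEMM call sequence (sgemm, float only) that the layer builds on; the sizes and data are illustrative, not taken from the PR:

    #include <mkl.h>

    void packedGemmDemo() {
      const int M = 4, N = 3, K = 5;
      float A[M * K], B[K * N], C[M * N] = {0.0f};
      for (int i = 0; i < M * K; ++i) A[i] = 1.0f;
      for (int i = 0; i < K * N; ++i) B[i] = 2.0f;

      // Pack B once; the packed buffer can then be reused across many
      // compute calls. Note that alpha is given at pack time and beta at
      // compute time; cblas_sgemm_compute has no alpha argument.
      float* packedB = cblas_sgemm_alloc(CblasBMatrix, M, N, K);
      cblas_sgemm_pack(CblasRowMajor, CblasBMatrix, CblasNoTrans,
                       M, N, K, 1.0f, B, N, packedB);
      cblas_sgemm_compute(CblasRowMajor, CblasNoTrans, CblasPacked,
                          M, N, K, A, K, packedB, N, 0.0f, C, N);
      cblas_sgemm_free(packedB);
    }

Packing pays off when the same weight matrix is multiplied against many small inputs, which is exactly the per-timestep pattern of a recurrent layer.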

class MKLPackedWeight {
protected:
real *weight_;
real *packedWeight_;
Contributor:

Please add a comment on line 26.

Contributor:

OK, thanks.

@@ -12,6 +12,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include "RecurrentLayer.h"
#include <gflags/gflags.h>
#include "Layer.h"
#include "SequenceToBatch.h"
Contributor:

Please simplify the includes: the headers on lines 16-18 are already pulled in by RecurrentLayer.h and do not need to be repeated here.

Contributor:

OK, thanks.

#include "MKLPackedWeight.h"
#include "RecurrentLayer.h"
#include "SequenceToBatch.h"
#include "paddle/utils/Stat.h"
Contributor:

Please simplify the includes: lines 17, 18, and 21 are already included by RecurrentLayer.h and do not need to be repeated.

Contributor:

OK, thanks.

zhaify (Contributor) left a comment:

@luotao1 Please review the new comments and code.

MatrixPtr batch1 =
batchValue_->getBatchValue(n - 1, batch2->getHeight());

// batch2->mul(*batch1, *weight_->getW(), 1, 1);
Contributor:

OK, thanks.

for (size_t j = 0; j < batch2->getWidth(); j++) {
*(batch2->getData() + i * batch2->getWidth() + j) =
*(batch2->getData() + i * batch2->getWidth() + j) > 0
? *(batch2->getData() + i * batch2->getWidth() + j)
Contributor:

OK, thanks.

arg.grad = batch2;
activation_->backward(arg).check();

if (n != 0) {
Contributor:

This check cannot be hoisted out: it tests whether the loop is at state = 0. At state = 0 only the activation is applied; for every other state both the matrix multiplication and the activation are performed. (A sketch follows below.)
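
For illustration, a minimal sketch of the loop shape being described, reusing names from the surrounding snippets (schematic, not the PR's exact code):

    for (size_t n = 0; n < batchValue_->getNumBatch(); n++) {
      MatrixPtr batch = batchValue_->getBatchValue(n);
      if (n != 0) {
        // Every state after the first multiplies by the packed weight first.
        MatrixPtr preBatch =
            batchValue_->getBatchValue(n - 1, batch->getHeight());
        packed_weight_->compute(batch, preBatch);
      }
      // The activation runs for every state, including state 0.
      Argument arg;
      arg.value = batch;
      activation_->forward(arg).check();
    }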


if (n != 0) {
batch1 = batchGrad_->getBatchValue(n - 1, batch2->getHeight());
// batch1->mul(*batch2, *weightT, 1, 1);
Contributor:

OK, thanks.

void pack() { pack_(weight_); }

void compute(MatrixPtr dst, MatrixPtr src) {
cblas_sgemm_compute(CblasRowMajor,
Contributor:

We considered three data types when writing the design doc, but at present MKLPacked*Layer only supports the float data type (sgemm).

namespace paddle {

/**
* @brief MKLPackedRecurrentLayer takes 1 input layer. The output size is the
Contributor:

OK, thanks.

* them by rnn_use_batch flag.
*/

class MKLPackedRecurrentLayer : public RecurrentLayer {
Contributor:

Do the overridden functions also need comments?


protected:
std::unique_ptr<MKLPackedWeight> packed_weight_;
std::unique_ptr<MKLPackedWeight> packed_weightT_;
Contributor:

OK, thanks.


void pack() { pack_(weight_); }

void compute(MatrixPtr dst, MatrixPtr src) {
Contributor:

OK, thanks.

*output_.grad->subMatrix(starts[seq], len - 1),
1,
1);
}
Contributor:

The simplified code had a functional problem, which has now been corrected.
Could the simplified code make later maintenance and debugging more difficult?

luotao1 (Contributor) left a comment:

Thanks for updating, some minor comments.

* sequence by one sequence. The other way is to reorganize the input
* into batches, then compute rnn one batch by one batch. Users can select
* them by rnn_use_batch flag.
* @brief MKLPackedRecurrentLayer is same with RecurrentLayer but is optimized
Contributor:

is the same with

Contributor:

OK, thanks.

@@ -66,7 +48,10 @@ class MKLPackedRecurrentLayer : public RecurrentLayer {
const int* starts) override;

protected:
/// packed_weight_ is contains same data with
Contributor:

packed_weight_ contains the same data with

Contributor:

OK, thanks.

@@ -22,7 +22,9 @@ namespace paddle {

class MKLPackedWeight {
protected:
/// The pointor of weight
Contributor:

pointer; the same on line 27.

Contributor:

OK, thanks.

@@ -41,7 +43,7 @@ class MKLPackedWeight {

void pack() { pack_(weight_); }

void compute(MatrixPtr dst, MatrixPtr src) {
void compute(MatrixPtr dst, const MatrixPtr src) {
luotao1 (Contributor) on Jan 3, 2018:

  • Consider renaming the compute function to gemm_compute or something similar; the current name is too generic.
  • The usual parameter order is: const MatrixPtr src, MatrixPtr dst.

Contributor:

OK, thanks.

zhaify (Contributor) left a comment:

@luotao1 code updated

* sequence by one sequence. The other way is to reorganize the input
* into batches, then compute rnn one batch by one batch. Users can select
* them by rnn_use_batch flag.
* @brief MKLPackedRecurrentLayer is same with RecurrentLayer but is optimized
Contributor:

OK, thanks.

@@ -66,7 +48,10 @@ class MKLPackedRecurrentLayer : public RecurrentLayer {
const int* starts) override;

protected:
/// packed_weight_ is contains same data with
Contributor:

OK, thanks.

@@ -22,7 +22,9 @@ namespace paddle {

class MKLPackedWeight {
protected:
/// The pointor of weight
Contributor:

OK, thanks.

@@ -41,7 +43,7 @@ class MKLPackedWeight {

void pack() { pack_(weight_); }

void compute(MatrixPtr dst, MatrixPtr src) {
void compute(MatrixPtr dst, const MatrixPtr src) {
Contributor:

OK, thanks.

luotao1 (Contributor) left a comment:

LGTM

@luotao1 luotao1 merged commit c100230 into PaddlePaddle:develop Jan 3, 2018
@tensor-tang tensor-tang deleted the mkl_packed branch January 3, 2018 09:18
Closes: Enhance recurrent layer performance (#6512)
3 participants