Add Bilinear Tensor Product operator. #5014
Conversation
auto weight_dims = ctx->GetInputDim("Weight");

PADDLE_ENFORCE_EQ(x_dims.size(), 1, "The input X must be a vector.");
PADDLE_ENFORCE_EQ(y_dims.size(), 1, "The input Y must be a vector.");
- Why are X and Y vectors? A mini-batch is a 2-D tensor: the 1st dimension is the batch size, and the 2nd dimension is the hidden size of X and Y.
- If a variable is a vector, it should still be a 2-D tensor; it is necessary to indicate whether it is a row vector [1 x N] or a column vector [N x 1]. (See the sketch below.)
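A hedged sketch of the shape checks this comment asks for, treating X and Y as mini-batches (PADDLE_ENFORCE_EQ as used in the surrounding diff; the batch-match check is drawn from the later revisions in this thread):

// X and Y are mini-batches: 2-D tensors of shape [batch_size, hidden_size].
PADDLE_ENFORCE_EQ(x_dims.size(), 2, "The input X must be a 2-D tensor.");
PADDLE_ENFORCE_EQ(y_dims.size(), 2, "The input Y must be a 2-D tensor.");
PADDLE_ENFORCE_EQ(x_dims[0], y_dims[0],
                  "The first dimension (batch_size) of X and Y must match.");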
Done
The operator must implement the batch computation.
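For reference, a sketch of the batched computation being asked for, assuming the standard bilinear-tensor-product definition: for a mini-batch $X \in \mathbb{R}^{B \times d_x}$, $Y \in \mathbb{R}^{B \times d_y}$, and weight $W \in \mathbb{R}^{K \times d_x \times d_y}$,

$$Out_{b,i} = X_b^{\top} W_i Y_b + Bias_i, \qquad b = 1,\dots,B, \quad i = 1,\dots,K.$$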
Force-pushed from a6ba383 to f5cb52c
template <typename Place, typename T>
class BilinearTensorProductCUDAKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
Why does the GPU kernel need to copy the inputs and outputs from the CPU? Both Eigen and the matrix multiplication have GPU implementations.
Fixed; the copy from the CPU is no longer needed.
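A hedged sketch of what the fix implies: with Eigen and the matmul both running on the GPU, the same templated kernel can be registered for the GPU place directly (assuming the REGISTER_OP_GPU_KERNEL macro of that Paddle version), with no host round-trip:

REGISTER_OP_GPU_KERNEL(
    bilinear_tensor_product,
    ops::BilinearTensorProductKernel<paddle::platform::GPUPlace, float>);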
auto y_dims = ctx->GetInputDim("Y");
auto weight_dims = ctx->GetInputDim("Weight");

PADDLE_ENFORCE_EQ(x_dims.size(), 2, "The input X must be a 2D Tensor.");
Use 2UL here and in lines 39–45; the same applies below.
Done
"The second dimension of X must be equal with the second " | ||
"dimension of the Weight."); | ||
PADDLE_ENFORCE_EQ(y_dims[1], weight_dims[2], | ||
"The second dimension of Y must be equal with the third " |
be equal to
Done
PADDLE_ENFORCE_EQ(x_dims[0], y_dims[0],
    "The first dimension(batch_size) of X must be "
    "equal with the first dimension of the Y.");
PADDLE_ENFORCE_EQ(x_dims[1], weight_dims[1],
be equal to
Done
"The input Weight must be a 3D tensor."); | ||
PADDLE_ENFORCE_GT(weight_dims[0], 0, | ||
"The first dimension of Weight must be larger than 0."); | ||
PADDLE_ENFORCE_GT(weight_dims[1], 0, |
Use PADDLE_ENFORCE here.
Done
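A sketch of the suggested form, with the condition written out explicitly (the exact message is illustrative):

PADDLE_ENFORCE(weight_dims[1] > 0,
               "The second dimension of Weight must be larger than 0.");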
@@ -0,0 +1,99 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
The indentation of the license header is wrong; follow accuracy_op.h.
Done
"The first dimension(batch_size) of Out@GRAD must be equal with " | ||
"the first dimension of the X."); | ||
PADDLE_ENFORCE_EQ(weight_dims[0], out_dims[1], | ||
"The second dimension of Out@GRAD must be equal with " |
be equal to
Done
"the first dimension of the X."); | ||
PADDLE_ENFORCE_EQ(weight_dims[0], out_dims[1], | ||
"The second dimension of Out@GRAD must be equal with " | ||
"the third dimension of the Weight."); |
Weight --> Input(Weight)
Done
if (ctx->HasInput("Bias")) {
  auto bias_dims = ctx->GetInputDim("Bias");
  PADDLE_ENFORCE_EQ(bias_dims[1], out_dims[1],
      "The second dimension of Bias must be equal with "
be equal to
Done
namespace paddle {
namespace operators {

using Tensor = framework::Tensor;
using framework::LoDTensor;
Since this op does not use LoDTensor, I changed it to using framework::Tensor;.
                        ctx.GetPlace());
auto left_mul_mat = EigenMatrix<T>::From(left_mul);
Tensor output_col;
output_col.mutable_data<T>(framework::make_ddim({weight_dims[0]}),
Is this temporary variable really necessary? Why not slice a piece out of the output Tensor, construct it as an EigenMatrix, and assign the result of the computation on line 69 directly to that slice of the output Tensor?
Done
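For reference, a hedged sketch of the resulting pattern: slice a chip out of the output and assign into it directly (output_mat, left_mul_mat, y_mat, and place are assumed to be the Eigen views and device built earlier in this kernel):

// Write column i of the result straight into the output, no temporary Tensor.
auto output_col_vec = output_mat.chip(i, 1);
output_col_vec.device(place) =
    (left_mul_mat * y_mat).sum(Eigen::array<int, 1>({1}));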
auto y_dims = ctx->GetInputDim("Y");
auto weight_dims = ctx->GetInputDim("Weight");

PADDLE_ENFORCE_EQ(x_dims.size(), 2UL, "The input X must be a 2D Tensor.");
It is much better if the naming of inputs and outputs in all the comments follows the same style. In lines 28 and 32, the input and output are denoted as Input(X) and Output(Out), which is clear. Could you please keep a consistent style in all of the comments below?
Done
PADDLE_ENFORCE(weight_dims[1],
    "The second dimension of Weight must be larger than 0.");
PADDLE_ENFORCE(weight_dims[2],
    "The third dimension of Weight must be larger than 0.");
Remove lines 41–45. The three dimensions of the learnable parameter Weight are determined by the dimension of X, the dimension of Y, and the user-customized size of this operator. The dimensions of X and Y are both specified by the user, so being larger than 0 can be guaranteed when defining the network topology. This check is not necessary during the execution of this op.
Done
PADDLE_ENFORCE_EQ(bias_dims.size(), 2UL,
    "The input Bias must have 2 dimensions.");
PADDLE_ENFORCE_EQ(bias_dims[0], 1UL,
    "The first dimention of input Bias must be 1.");
Merge lines 59–61:
PADDLE_ENFORCE(bias_dims.size() == 2UL && bias_dims[1] == 1UL,
               "The Input(bias) should be a 2-D tensor with the 2nd "
               "dimension fixed to 1 (a row vector).")
Done
BilinearTensorProductOpMaker(framework::OpProto* proto,
                             framework::OpAttrChecker* op_checker)
    : OpProtoAndCheckerMaker(proto, op_checker) {
  AddInput("X", "The first input of BilinearTensorProduct op.");
I prefer to use "bilinear_tensor_product operator", because "BilinearTensorProductOp" is a name for the developer (in the C++ code), while "bilinear_tensor_product operator" is the name for the user (exposed by the user interface).
Done
    : OpProtoAndCheckerMaker(proto, op_checker) {
  AddInput("X", "The first input of BilinearTensorProduct op.");
  AddInput("Y", "The second input of BilinearTensorProduct op.");
  AddInput("Weight", "The input weight of BilinearTensorProduct op.");
The learnable parameters of ...
Done
AddInput("X", "The first input of BilinearTensorProduct op."); | ||
AddInput("Y", "The second input of BilinearTensorProduct op."); | ||
AddInput("Weight", "The input weight of BilinearTensorProduct op."); | ||
AddInput("Bias", "The input bias of BilinearTensorProduct op.") |
The learnable bias of the bilinear_tensor_product operator. Do not use an abbreviation in the comments unless it is necessary (widely accepted, or the name is too long).
Done
                  ops::BilinearTensorProductOpGrad);
REGISTER_OP_CPU_KERNEL(
    bilinear_tensor_product,
    ops::BilinearTensorProductKernel<paddle::platform::CPUPlace, float>);
Register a kernel that supports the double type.
Done
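A minimal sketch of that registration, assuming the variadic form of the REGISTER_OP_CPU_KERNEL macro quoted above:

REGISTER_OP_CPU_KERNEL(
    bilinear_tensor_product,
    ops::BilinearTensorProductKernel<paddle::platform::CPUPlace, float>,
    ops::BilinearTensorProductKernel<paddle::platform::CPUPlace, double>);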
if (d_x) {
  d_x->mutable_data<T>(ctx.GetPlace());
  set_zero(ctx.device_context(), d_x, static_cast<T>(0));
}
if (d_x) d_x->mutable_data<T>(ctx.GetPlace());
Setting zero is not necessary here.
There is an additive operation for d_x:
d_x = d_x + y_scale * weight_i
For this reason, the elements of d_x must be initialized to 0; otherwise this op will produce an erroneous result.
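For reference, the accumulation being described, under the assumed forward form $Out_{b,i} = X_b^{\top} W_i Y_b + Bias_i$:

$$\frac{\partial L}{\partial X_b} = \sum_{i=1}^{K} \frac{\partial L}{\partial Out_{b,i}}\, W_i Y_b,$$

so d_x receives one additive contribution per weight slice i and must start from zero.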
if (d_y) {
  d_y->mutable_data<T>(ctx.GetPlace());
  set_zero(ctx.device_context(), d_y, static_cast<T>(0));
}
if (d_y) d_y->mutable_data<T>(ctx.GetPlace());
The same as for d_x.
the same to ...
I see.
def test_check_grad_normal(self):
    self.check_grad(['X', 'Y', 'Weight', 'Bias'], 'Out')
Why do we need TestBilinearTensorProductOp2 and TestBilinearTensorProductOp3? I do not think these tests are necessary; they are not meaningful boundary cases for BilinearTensorProductOp.
Removed
template <typename T, int MajorType = Eigen::RowMajor,
          typename IndexType = Eigen::DenseIndex>
using EigenVector = framework::EigenVector<T, MajorType, IndexType>;
Delete lines 30–32; EigenVector is not used.
Done
auto weight_dims = weight->dims();
auto place = ctx.GetEigenDevice<Place>();

// Create the intermediate variables.
- Please complete the comment; otherwise I will wonder what the intermediate variables are created for.
- You can just add the formula to the comment.
- It is "variable", not "variables".
Done
math::SetConstant<Place, T> set_zero;

// Set X@Grad be zero at first.
remove "at first".
Done
auto d_out_mat = EigenMatrix<T>::From(*d_out);
auto place = ctx.GetEigenDevice<Place>();

// Create the intermediate variables for gradient.
Please complete the comment. There are three gradients that need to be computed in the backward pass; create the intermediate variables for which gradients?
Done
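For reference, the three gradients in question, under the same assumed forward form as above:

$$\frac{\partial L}{\partial W_i} = \sum_{b=1}^{B} \frac{\partial L}{\partial Out_{b,i}}\, X_b Y_b^{\top}, \qquad \frac{\partial L}{\partial Bias_i} = \sum_{b=1}^{B} \frac{\partial L}{\partial Out_{b,i}},$$

with $\partial L/\partial X_b$ and $\partial L/\partial Y_b$ as in the earlier sketch.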
  }
}

// Calculate the gradient of Weight.
Weight --> Input(Weight) to keep a consistent naming style in comments.
Done
  }
}

// Calculate the gradient of Bias.
Bias --> Input(Bias)
Done
  set_zero(ctx.device_context(), d_y, static_cast<T>(0));
}

// Calculate the X@Grad and Y@Grad.
Output(X@Grad) and Output(Y@Grad)
Done
if (d_y) {
  d_y->mutable_data<T>(ctx.GetPlace());
  set_zero(ctx.device_context(), d_y, static_cast<T>(0));
}
the same to ...
I see.
Tensor weight_i = weight->Slice(i, i + 1).Resize(
    framework::make_ddim({weight_dims[1], weight_dims[2]}));
auto output_vec = d_out_mat.chip(i, 1);
if (d_x) {
Taking dx as an example (dy is the same): why can't this "scaling" operation be performed after lines 135–138? That way the two intermediate variables x_scale and y_scale could be removed entirely (which also avoids the memory-allocation issue). I am not sure whether this is feasible, since logically the "scaling" operation can be done in place.
Here the broadcast expands along the batch dimension, and TMP = scaled(X) * W, where each row of scaled(X) is multiplied by a different scaling coefficient, so the scaling cannot be done after the matrix multiplication; that is, scaled(X) * W != scaled(X * W).
There is one warning in the TeamCity log; please fix it.
LGTM
@@ -43,24 +43,26 @@ class BilinearTensorProductKernel : public framework::OpKernel<T> {

auto batch_size = x->dims()[0];
auto weight_dims = weight->dims();
int Out_dim = weight_dims[0];
Out_dim --> out_dim
X_dim --> x_dim
Y_dim --> y_dim
Do not capitalize the first letter.
Done
Force-pushed from db7055b to 56212d0
Force-pushed from 56212d0 to c5d7107
LGTM
resolve #4789