Add Bilinear Tensor Product operator. #5014
Conversation
auto weight_dims = ctx->GetInputDim("Weight");

PADDLE_ENFORCE_EQ(x_dims.size(), 1, "The input X must be a vector.");
PADDLE_ENFORCE_EQ(y_dims.size(), 1, "The input Y must be a vector.");
- Why are X and Y vectors? A mini-batch is a 2-D tensor: the 1st dimension is the batch size, and the 2nd dimension is the hidden size of X and Y.
- If a variable is a vector, it should still be a 2-D tensor; it is necessary to indicate whether it is a row vector [1 x N] or a column vector [N x 1]. (See the sketch below.)
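A hedged sketch of the shape checks this comment asks for, treating X and Y as mini-batches (PADDLE_ENFORCE_EQ as used in the surrounding diff; the batch-match check is drawn from the later revisions in this thread):

// X and Y are mini-batches: 2-D tensors of shape [batch_size, hidden_size].
PADDLE_ENFORCE_EQ(x_dims.size(), 2, "The input X must be a 2-D tensor.");
PADDLE_ENFORCE_EQ(y_dims.size(), 2, "The input Y must be a 2-D tensor.");
PADDLE_ENFORCE_EQ(x_dims[0], y_dims[0],
                  "The first dimension (batch_size) of X and Y must match.");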
Done
The operator must implement the batch computation.
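For reference, a sketch of the batched computation being asked for, assuming the standard bilinear-tensor-product definition: for a mini-batch $X \in \mathbb{R}^{B \times d_x}$, $Y \in \mathbb{R}^{B \times d_y}$, and weight $W \in \mathbb{R}^{K \times d_x \times d_y}$,

$$Out_{b,i} = X_b^{\top} W_i Y_b + Bias_i, \qquad b = 1,\dots,B, \quad i = 1,\dots,K.$$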
Force-pushed from a6ba383 to f5cb52c
template <typename Place, typename T>
class BilinearTensorProductCUDAKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
Why does the GPU kernel need to copy the inputs and outputs from the CPU? Both Eigen and the matrix multiplication have GPU implementations.
Fixed; the copy from the CPU is no longer needed.
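A hedged sketch of what the fix implies: with Eigen and the matmul both running on the GPU, the same templated kernel can be registered for the GPU place directly (assuming the REGISTER_OP_GPU_KERNEL macro of that Paddle version), with no host round-trip:

REGISTER_OP_GPU_KERNEL(
    bilinear_tensor_product,
    ops::BilinearTensorProductKernel<paddle::platform::GPUPlace, float>);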
auto y_dims = ctx->GetInputDim("Y");
auto weight_dims = ctx->GetInputDim("Weight");

PADDLE_ENFORCE_EQ(x_dims.size(), 2, "The input X must be a 2D Tensor.");
Use 2UL here and in lines 39–45; the same applies below.
Done
"The second dimension of X must be equal with the second " | ||
"dimension of the Weight."); | ||
PADDLE_ENFORCE_EQ(y_dims[1], weight_dims[2], | ||
"The second dimension of Y must be equal with the third " |
be equal to
Done
PADDLE_ENFORCE_EQ(x_dims[0], y_dims[0],
    "The first dimension(batch_size) of X must be "
    "equal with the first dimension of the Y.");
PADDLE_ENFORCE_EQ(x_dims[1], weight_dims[1],
be equal to
Done
"The input Weight must be a 3D tensor."); | ||
PADDLE_ENFORCE_GT(weight_dims[0], 0, | ||
"The first dimension of Weight must be larger than 0."); | ||
PADDLE_ENFORCE_GT(weight_dims[1], 0, |
Use PADDLE_ENFORCE here.
Done
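A sketch of the suggested form, with the condition written out explicitly (the exact message is illustrative):

PADDLE_ENFORCE(weight_dims[1] > 0,
               "The second dimension of Weight must be larger than 0.");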
@@ -0,0 +1,99 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
The indentation of the license header is wrong; follow accuracy_op.h.
Done
"The first dimension(batch_size) of Out@GRAD must be equal with " | ||
"the first dimension of the X."); | ||
PADDLE_ENFORCE_EQ(weight_dims[0], out_dims[1], | ||
"The second dimension of Out@GRAD must be equal with " |
be equal to
Done
"the first dimension of the X."); | ||
PADDLE_ENFORCE_EQ(weight_dims[0], out_dims[1], | ||
"The second dimension of Out@GRAD must be equal with " | ||
"the third dimension of the Weight."); |
Weight --> Input(Weight)
Done
if (ctx->HasInput("Bias")) {
  auto bias_dims = ctx->GetInputDim("Bias");
  PADDLE_ENFORCE_EQ(bias_dims[1], out_dims[1],
      "The second dimension of Bias must be equal with "
be equal to
Done
namespace paddle {
namespace operators {

using Tensor = framework::Tensor;
using framework::LoDTensor;
Since this op does not use LoDTensor, I changed it to using framework::Tensor;.
                        ctx.GetPlace());
auto left_mul_mat = EigenMatrix<T>::From(left_mul);
Tensor output_col;
output_col.mutable_data<T>(framework::make_ddim({weight_dims[0]}),
Is this temporary variable really necessary? Why not slice a piece out of the output Tensor, construct it as an EigenMatrix, and assign the result of the computation on line 69 directly to that slice of the output Tensor?
Done
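For reference, a hedged sketch of the resulting pattern: slice a chip out of the output and assign into it directly (output_mat, left_mul_mat, y_mat, and place are assumed to be the Eigen views and device built earlier in this kernel):

// Write column i of the result straight into the output, no temporary Tensor.
auto output_col_vec = output_mat.chip(i, 1);
output_col_vec.device(place) =
    (left_mul_mat * y_mat).sum(Eigen::array<int, 1>({1}));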
auto y_dims = ctx->GetInputDim("Y");
auto weight_dims = ctx->GetInputDim("Weight");

PADDLE_ENFORCE_EQ(x_dims.size(), 2UL, "The input X must be a 2D Tensor.");
It is much better if the naming of inputs and outputs in all the comments follows the same style. In lines 28 and 32, the input and output are denoted as Input(X) and Output(Out), which is clear. Could you please keep a consistent style in all of the comments below?
Done
PADDLE_ENFORCE(weight_dims[1],
    "The second dimension of Weight must be larger than 0.");
PADDLE_ENFORCE(weight_dims[2],
    "The third dimension of Weight must be larger than 0.");
Remove lines 41–45. The three dimensions of the learnable parameter Weight are determined by the dimension of X, the dimension of Y, and the user-customized size of this operator. The dimensions of X and Y are both specified by the user, so being larger than 0 can be guaranteed when defining the network topology. This check is not necessary during the execution of this op.
Done
PADDLE_ENFORCE_EQ(bias_dims.size(), 2UL,
    "The input Bias must have 2 dimensions.");
PADDLE_ENFORCE_EQ(bias_dims[0], 1UL,
    "The first dimention of input Bias must be 1.");
Merge lines 59–61:
PADDLE_ENFORCE(bias_dims.size() == 2UL && bias_dims[1] == 1UL,
               "The Input(bias) should be a 2-D tensor with the 2nd "
               "dimension fixed to 1 (a row vector).")
Done
BilinearTensorProductOpMaker(framework::OpProto* proto,
                             framework::OpAttrChecker* op_checker)
    : OpProtoAndCheckerMaker(proto, op_checker) {
  AddInput("X", "The first input of BilinearTensorProduct op.");
I prefer to use "bilinear_tensor_product operator", because "BilinearTensorProductOp" is a name for the developer (in the C++ code), while "bilinear_tensor_product operator" is the name for the user (exposed by the user interface).
Done
    : OpProtoAndCheckerMaker(proto, op_checker) {
  AddInput("X", "The first input of BilinearTensorProduct op.");
  AddInput("Y", "The second input of BilinearTensorProduct op.");
  AddInput("Weight", "The input weight of BilinearTensorProduct op.");
The learnable parameters of ...
Done
AddInput("X", "The first input of BilinearTensorProduct op."); | ||
AddInput("Y", "The second input of BilinearTensorProduct op."); | ||
AddInput("Weight", "The input weight of BilinearTensorProduct op."); | ||
AddInput("Bias", "The input bias of BilinearTensorProduct op.") |
The learnable bias of the bilinear_tensor_product operator. Do not use an abbreviation in the comments unless it is necessary (widely accepted, or the name is too long).
Done
                  ops::BilinearTensorProductOpGrad);
REGISTER_OP_CPU_KERNEL(
    bilinear_tensor_product,
    ops::BilinearTensorProductKernel<paddle::platform::CPUPlace, float>);
Register a kernel that supports the double type.
Done
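A minimal sketch of that registration, assuming the variadic form of the REGISTER_OP_CPU_KERNEL macro quoted above:

REGISTER_OP_CPU_KERNEL(
    bilinear_tensor_product,
    ops::BilinearTensorProductKernel<paddle::platform::CPUPlace, float>,
    ops::BilinearTensorProductKernel<paddle::platform::CPUPlace, double>);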
if (d_x) {
  d_x->mutable_data<T>(ctx.GetPlace());
  set_zero(ctx.device_context(), d_x, static_cast<T>(0));
}
if (d_x) d_x->mutable_data<T>(ctx.GetPlace());
Setting zero is not necessary here.
There is an additive operation for d_x:
d_x = d_x + y_scale * weight_i
For this reason, the elements of d_x must be initialized to 0; otherwise this op will produce an erroneous result.
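For reference, the accumulation being described, under the assumed forward form $Out_{b,i} = X_b^{\top} W_i Y_b + Bias_i$:

$$\frac{\partial L}{\partial X_b} = \sum_{i=1}^{K} \frac{\partial L}{\partial Out_{b,i}}\, W_i Y_b,$$

so d_x receives one additive contribution per weight slice i and must start from zero.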
if (d_y) {
  d_y->mutable_data<T>(ctx.GetPlace());
  set_zero(ctx.device_context(), d_y, static_cast<T>(0));
}
if (d_y) d_y->mutable_data<T>(ctx.GetPlace());
The same as for d_x.
the same to ...
I see.
def test_check_grad_normal(self):
    self.check_grad(['X', 'Y', 'Weight', 'Bias'], 'Out')
Why do we need TestBilinearTensorProductOp2 and TestBilinearTensorProductOp3? I do not think these tests are necessary; they are not meaningful boundary cases for BilinearTensorProductOp.
Removed
template <typename T, int MajorType = Eigen::RowMajor,
          typename IndexType = Eigen::DenseIndex>
using EigenVector = framework::EigenVector<T, MajorType, IndexType>;
Delete lines 30–32; EigenVector is not used.
Done
auto weight_dims = weight->dims();
auto place = ctx.GetEigenDevice<Place>();

// Create the intermediate variables.
- Please complete the comment; otherwise I will wonder what the intermediate variables are created for.
- You can just add the formula to the comment.
- It is "variable", not "variables".
Done
math::SetConstant<Place, T> set_zero;

// Set X@Grad be zero at first.
remove "at first".
Done
auto d_out_mat = EigenMatrix<T>::From(*d_out);
auto place = ctx.GetEigenDevice<Place>();

// Create the intermediate variables for gradient.
Please complete the comment. There are three gradients that need to be computed in the backward pass; create the intermediate variables for which gradients?
Done
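For reference, the three gradients in question, under the same assumed forward form as above:

$$\frac{\partial L}{\partial W_i} = \sum_{b=1}^{B} \frac{\partial L}{\partial Out_{b,i}}\, X_b Y_b^{\top}, \qquad \frac{\partial L}{\partial Bias_i} = \sum_{b=1}^{B} \frac{\partial L}{\partial Out_{b,i}},$$

with $\partial L/\partial X_b$ and $\partial L/\partial Y_b$ as in the earlier sketch.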
  }
}

// Calculate the gradient of Weight.
Weight --> Input(Weight) to keep a consistent naming style in comments.
Done
  }
}

// Calculate the gradient of Bias.
Bias --> Input(Bias)
Done
  set_zero(ctx.device_context(), d_y, static_cast<T>(0));
}

// Calculate the X@Grad and Y@Grad.
Output(X@Grad) and Output(Y@Grad)
Done
if (d_y) {
  d_y->mutable_data<T>(ctx.GetPlace());
  set_zero(ctx.device_context(), d_y, static_cast<T>(0));
}
the same to ...
I see.
Tensor weight_i = weight->Slice(i, i + 1).Resize(
    framework::make_ddim({weight_dims[1], weight_dims[2]}));
auto output_vec = d_out_mat.chip(i, 1);
if (d_x) {
Taking dx as an example (dy is the same): why can't this "scaling" operation be performed after lines 135–138? That way the two intermediate variables x_scale and y_scale could be removed entirely (which also avoids the memory-allocation issue). I am not sure whether this is feasible, since logically the "scaling" operation can be done in place.
Here the broadcast expands along the batch dimension, and TMP = scaled(X) * W, where each row of scaled(X) is multiplied by a different scaling coefficient, so the scaling cannot be done after the matrix multiplication; that is, scaled(X) * W != scaled(X * W).
There is one warning in the TeamCity log; please fix it.
LGTM
@@ -43,24 +43,26 @@ class BilinearTensorProductKernel : public framework::OpKernel<T> {

auto batch_size = x->dims()[0];
auto weight_dims = weight->dims();
int Out_dim = weight_dims[0];
Out_dim --> out_dim
X_dim --> x_dim
Y_dim --> y_dim
Do not capitalize the first letter.
Done
Force-pushed from db7055b to 56212d0
Force-pushed from 56212d0 to c5d7107
LGTM
resolve #4789