
enable MKL Packed Recurrent Layer #6719

Merged: 12 commits from mkl_packed into PaddlePaddle:develop on Jan 3, 2018

Conversation

tensor-tang (Contributor):

fix #6512

luotao1 (Contributor) left a comment:

Looking at the implementation, why can't MKLPackedRecurrentLayer directly inherit from RecurrentLayer?

real* weightPacked_;
real* weightTPacked_;
size_t weightHeight_;
size_t weightWidth_;
Contributor:

Please add comments to these variables. What is the difference between weightTPacked_ and weightPacked_? Is the former the transposed version?

Contributor (Author):

OK, we will add the necessary comments.
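
For illustration only, a commented version of these members might read as follows (the wording is a guess at intent, not the PR's final comments):

    real* weightPacked_;   // weight matrix packed by cblas_sgemm_pack (forward)
    real* weightTPacked_;  // transposed weight, packed for the backward pass
    size_t weightHeight_;  // number of rows of the original weight matrix
    size_t weightWidth_;   // number of columns of the original weight matrix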

weightWidth_,
1,
batch2->getData(),
weightWidth_);
Contributor:

Could this be simplified to:

    cblas_sgemm_compute(CblasRowMajor,
                        CblasNoTrans,
                        CblasPacked,
                        batch2->getHeight(),
                        weightWidth_,
                        weightHeight_,
                        batch1->getData(),
                        weightHeight_,
                        transW ? weightTPacked_ : weightPacked_,
                        weightWidth_,
                        1,
                        batch2->getData(),
                        weightWidth_);

Contributor (Author):

Yes, thx.

1.0,
weight->getData(),
weightWidth_,
weightTPacked_);
Contributor:

Please simplify lines 33-47:

    auto pack = [&](real*& packed, CBLAS_TRANSPOSE trans, size_t n, size_t k) {
      packed = cblas_sgemm_alloc(CblasBMatrix, 1, n, k);
      // The pack arguments elided in the original suggestion are filled in
      // from context: the stored weight is row-major with ld = weightWidth_.
      cblas_sgemm_pack(CblasRowMajor, CblasBMatrix, trans, 1, n, k, 1.0f,
                       weight->getData(), weightWidth_, packed);
    };
    pack(weightPacked_, CblasNoTrans, weightWidth_, weightHeight_);
    pack(weightTPacked_, CblasTrans, weightHeight_, weightWidth_);

When initializing here, couldn't you initialize just one of them according to the trans value? Must both be initialized?

Contributor (Author):

OK, we will optimize this further.
Also, the optimized logic is clearer; only one of them needs to be transposed.


REGISTER_LAYER(mkl_packed_recurrent, MKLPackedRecurrentLayer);

bool MKLPackedRecurrentLayer::init(const LayerMap& layerMap,
Contributor:

Could MKLPackedRecurrentLayer inherit from RecurrentLayer? Much of the code in this .cpp is similar to that in RecurrentLayer.

Contributor (Author):

Yes. We will first extract Paddle's original recurrent layer into its own header file.


sgemm_packed_.reset(new MKLPackedGemm(weight_->getW()));

return true;
Contributor:

    bool MKLPackedRecurrentLayer::init(const LayerMap& layerMap,
                                       const ParameterMap& parameterMap) {
      if (!RecurrentLayer::init(layerMap, parameterMap)) return false;
      sgemm_packed_.reset(new MKLPackedGemm(weight_->getW()));
      return true;
    }

Contributor (Author):

Got it.

CpuMatrix outputGrad(inputGrad->getHeight(), inputGrad->getWidth());
outputGrad.randomizeUniform();

for (int i = 0; i < 2; i++) {
Contributor:

Lines 468-539 contain too much duplicated code; please simplify.

Contributor (Author):

OK, we will fix it. Thanks.

@@ -420,12 +420,167 @@ TEST(Layer, LstmLayer) {
}
}

#ifdef PADDLE_WITH_MKLML

LayerPtr initMKLPackedLayer(LayerConfig layerConfig,
Contributor:

Why can't initRecurrentLayer be used here?

tensor-tang (Contributor, Author) on Dec 21, 2017:

OK, this can be optimized further as well.

LayerPtr dataLayer =
creatDataLayer("layer_0", batchSize, layerSize, false);
ParameterPtr para =
creatParameter("para_0", 0, layerSize * layerSize, false);
Contributor:

Shouldn't the initialization of dataLayer and para be moved into checkMKLPackedLayer?

Contributor (Author):

OK, no problem.

luotao1 (Contributor) left a comment:

  1. A large amount of the code lacks comments.
  2. Although MKLPackedRecurrentLayer.cpp is modeled on RecurrentLayer.cpp, RecurrentLayer.cpp has many arbitrary names and much redundant code; please do not let those carry over into MKLPackedRecurrentLayer.cpp.
  3. The design doc uses the cblas_?gemm API, but the code only uses cblas_sgemm. Please explain why, or update the design doc.

MatrixPtr batch1 =
batchValue_->getBatchValue(n - 1, batch2->getHeight());

// batch2->mul(*batch1, *weight_->getW(), 1, 1);
Contributor:

The unused code on line 62 can be deleted.

Contributor:

OK, thanks.

REGISTER_TIMER_INFO("RecurrentFwBatch", getName().c_str());
/* forward one batch */
for (size_t n = 0; n < batchValue_->getNumBatch(); n++) {
MatrixPtr batch2 = batchValue_->getBatchValue(n);
Contributor:

The names batch2 and batch1 are poor:
batch2 -> batch
batch1 -> pre_batch
The same applies in backward.

Contributor:

OK, thanks.

for (size_t j = 0; j < batch2->getWidth(); j++) {
*(batch2->getData() + i * batch2->getWidth() + j) =
*(batch2->getData() + i * batch2->getWidth() + j) > 0
? *(batch2->getData() + i * batch2->getWidth() + j)
Contributor:

  • getHeight(), getWidth(), and getData() can each be stored in a temporary variable defined outside the loop, so the loop body does not keep dereferencing the pointer.
  • getWidth() should be the same for every batch, so it can be defined outside the outer loop. (A sketch follows below.)
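
For illustration, a minimal sketch of the hoisted loop, assuming batch2 is a CpuMatrix-like object with these accessors:

    const size_t height = batch2->getHeight();
    const size_t width = batch2->getWidth();  // identical for every batch
    real* data = batch2->getData();
    for (size_t i = 0; i < height; i++) {
      for (size_t j = 0; j < width; j++) {
        real& v = data[i * width + j];
        v = v > 0 ? v : 0;  // same clamp-at-zero as the original loop body
      }
    }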

Contributor:

OK, thanks.

for (size_t n = 0; n < batchValue_->getNumBatch(); n++) {
MatrixPtr batch2 = batchValue_->getBatchValue(n);

if (n != 0) {
Contributor:

Is this condition related to the condition on line 70?

Contributor:

They are not related.

for (size_t i = 0; i < batch2->getHeight(); i++) {
for (size_t j = 0; j < batch2->getWidth(); j++) {
*(batch2->getData() + i * batch2->getWidth() + j) =
*(batch2->getData() + i * batch2->getWidth() + j) > 0
Contributor:

Why is the value set directly to 0 when it is less than 0?
What is the purpose of lines 66-71?

Contributor:

This code has been removed.

dst->getWidth());
}

void compute(size_t M, real *A, size_t lda, real *C, size_t ldc) {
Contributor:

Is the function on line 60 used anywhere? If not, please delete it.

Contributor:

OK, thanks.

void pack() { pack_(weight_); }

void compute(MatrixPtr dst, MatrixPtr src) {
cblas_sgemm_compute(CblasRowMajor,
Contributor:

In the earlier design doc, https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/mkl/mkl_packed.md#background, cblas_?gemm is used. Why is only cblas_sgemm used here, and what about the other three data types? The same applies below.

Contributor:

We considered three data types when writing the design doc, but at present MKLPacked*Layer only supports the float data type (sgemm).

Contributor (Author):

On this point, the doc deliberately leaves room: otherwise, if double support is added later, the doc would have to be changed again.
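
For reference, a self-contained sketch of the MKL packed-GEMM call sequence (sgemm, float only) that the layer builds on; the sizes and data are illustrative, not taken from the PR:

    #include <mkl.h>

    void packedGemmDemo() {
      const int M = 4, N = 3, K = 5;
      float A[M * K], B[K * N], C[M * N] = {0.0f};
      for (int i = 0; i < M * K; ++i) A[i] = 1.0f;
      for (int i = 0; i < K * N; ++i) B[i] = 2.0f;

      // Pack B once; the packed buffer can then be reused across many
      // compute calls. Note that alpha is given at pack time and beta at
      // compute time; cblas_sgemm_compute has no alpha argument.
      float* packedB = cblas_sgemm_alloc(CblasBMatrix, M, N, K);
      cblas_sgemm_pack(CblasRowMajor, CblasBMatrix, CblasNoTrans,
                       M, N, K, 1.0f, B, N, packedB);
      cblas_sgemm_compute(CblasRowMajor, CblasNoTrans, CblasPacked,
                          M, N, K, A, K, packedB, N, 0.0f, C, N);
      cblas_sgemm_free(packedB);
    }

Packing pays off when the same weight matrix is multiplied against many small inputs, which is exactly the per-timestep pattern of a recurrent layer.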

class MKLPackedWeight {
protected:
real *weight_;
real *packedWeight_;
Contributor:

Please add a comment on line 26.

Contributor:

OK, thanks.

@@ -12,6 +12,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include "RecurrentLayer.h"
#include <gflags/gflags.h>
#include "Layer.h"
#include "SequenceToBatch.h"
Contributor:

Please simplify the includes: the headers on lines 16-18 are already pulled in by RecurrentLayer.h and do not need to be repeated here.

Contributor:

OK, thanks.

#include "MKLPackedWeight.h"
#include "RecurrentLayer.h"
#include "SequenceToBatch.h"
#include "paddle/utils/Stat.h"
Contributor:

Please simplify the includes: lines 17, 18, and 21 are already included by RecurrentLayer.h and do not need to be repeated.

Contributor:

OK, thanks.

zhaify (Contributor) left a comment:

@luotao1 Please review the new comments and code.

MatrixPtr batch1 =
batchValue_->getBatchValue(n - 1, batch2->getHeight());

// batch2->mul(*batch1, *weight_->getW(), 1, 1);
Contributor:

OK, thanks.

for (size_t j = 0; j < batch2->getWidth(); j++) {
*(batch2->getData() + i * batch2->getWidth() + j) =
*(batch2->getData() + i * batch2->getWidth() + j) > 0
? *(batch2->getData() + i * batch2->getWidth() + j)
Contributor:

OK, thanks.

arg.grad = batch2;
activation_->backward(arg).check();

if (n != 0) {
Contributor:

This check cannot be hoisted out: it tests whether the loop is at state = 0. At state = 0 only the activation is applied; for every other state both the matrix multiplication and the activation are performed. (A sketch follows below.)
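
For illustration, a minimal sketch of the loop shape being described, reusing names from the surrounding snippets (schematic, not the PR's exact code):

    for (size_t n = 0; n < batchValue_->getNumBatch(); n++) {
      MatrixPtr batch = batchValue_->getBatchValue(n);
      if (n != 0) {
        // Every state after the first multiplies by the packed weight first.
        MatrixPtr preBatch =
            batchValue_->getBatchValue(n - 1, batch->getHeight());
        packed_weight_->compute(batch, preBatch);
      }
      // The activation runs for every state, including state 0.
      Argument arg;
      arg.value = batch;
      activation_->forward(arg).check();
    }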


if (n != 0) {
batch1 = batchGrad_->getBatchValue(n - 1, batch2->getHeight());
// batch1->mul(*batch2, *weightT, 1, 1);
Contributor:

OK, thanks.

void pack() { pack_(weight_); }

void compute(MatrixPtr dst, MatrixPtr src) {
cblas_sgemm_compute(CblasRowMajor,
Contributor:

We considered three data types when writing the design doc, but at present MKLPacked*Layer only supports the float data type (sgemm).

namespace paddle {

/**
* @brief MKLPackedRecurrentLayer takes 1 input layer. The output size is the
Contributor:

OK, thanks.

* them by rnn_use_batch flag.
*/

class MKLPackedRecurrentLayer : public RecurrentLayer {
Contributor:

Do the overridden functions also need comments?


protected:
std::unique_ptr<MKLPackedWeight> packed_weight_;
std::unique_ptr<MKLPackedWeight> packed_weightT_;
Contributor:

OK, thanks.


void pack() { pack_(weight_); }

void compute(MatrixPtr dst, MatrixPtr src) {
Contributor:

OK, thanks.

*output_.grad->subMatrix(starts[seq], len - 1),
1,
1);
}
Contributor:

The simplified code had a functional problem, which has now been corrected.
Could the simplified code make later maintenance and debugging more difficult?

luotao1 (Contributor) left a comment:

Thanks for updating, some minor comments.

* sequence by one sequence. The other way is to reorganize the input
* into batches, then compute rnn one batch by one batch. Users can select
* them by rnn_use_batch flag.
* @brief MKLPackedRecurrentLayer is same with RecurrentLayer but is optimized
Contributor:

is the same with

Contributor:

OK, thanks.

@@ -66,7 +48,10 @@ class MKLPackedRecurrentLayer : public RecurrentLayer {
const int* starts) override;

protected:
/// packed_weight_ is contains same data with
Contributor:

packed_weight_ contains the same data with

Contributor:

OK, thanks.

@@ -22,7 +22,9 @@ namespace paddle {

class MKLPackedWeight {
protected:
/// The pointor of weight
Contributor:

pointer; the same on line 27.

Contributor:

OK, thanks.

@@ -41,7 +43,7 @@ class MKLPackedWeight {

void pack() { pack_(weight_); }

void compute(MatrixPtr dst, MatrixPtr src) {
void compute(MatrixPtr dst, const MatrixPtr src) {
luotao1 (Contributor) on Jan 3, 2018:

  • Consider renaming the compute function to gemm_compute or something similar; the current name is too generic.
  • The usual parameter order is: const MatrixPtr src, MatrixPtr dst.

Contributor:

OK, thanks.

zhaify (Contributor) left a comment:

@luotao1 code updated

* sequence by one sequence. The other way is to reorganize the input
* into batches, then compute rnn one batch by one batch. Users can select
* them by rnn_use_batch flag.
* @brief MKLPackedRecurrentLayer is same with RecurrentLayer but is optimized
Contributor:

OK, thanks.

@@ -66,7 +48,10 @@ class MKLPackedRecurrentLayer : public RecurrentLayer {
const int* starts) override;

protected:
/// packed_weight_ is contains same data with
Contributor:

OK, thanks.

@@ -22,7 +22,9 @@ namespace paddle {

class MKLPackedWeight {
protected:
/// The pointor of weight
Contributor:

OK, thanks.

@@ -41,7 +43,7 @@ class MKLPackedWeight {

void pack() { pack_(weight_); }

void compute(MatrixPtr dst, MatrixPtr src) {
void compute(MatrixPtr dst, const MatrixPtr src) {
Contributor:

OK, thanks.

luotao1 (Contributor) left a comment:

LGTM

@luotao1 luotao1 merged commit c100230 into PaddlePaddle:develop Jan 3, 2018
@tensor-tang tensor-tang deleted the mkl_packed branch January 3, 2018 09:18
Closes: Enhance recurrent layer performance (#6512)
3 participants