
Add LRN efficient GPU implement. #5894

Merged: 8 commits merged into PaddlePaddle:develop on Dec 6, 2017

Conversation

gongweibao
Contributor

Fix #5066

@gongweibao gongweibao changed the title Add effient GPU implement Add LRN efficient GPU implement. Nov 24, 2017
template <typename T>
struct LRNFunctor<platform::CPUPlace, T> {
void operator()(const framework::ExecutionContext& ctx,
const framework::Tensor* input, framework::Tensor* out,
Contributor

For input arguments: const framework::Tensor&

https://google.github.io/styleguide/cppguide.html#Reference_Arguments

Contributor Author

Done!
Thanks!

const int end = start + n;

auto e_mid = framework::EigenTensor<T, 4>::From(*mid);
e_mid.device(ctx.GetEigenDevice<platform::CPUPlace>()) = e_mid.constant(k);
Contributor

For the CPU implementation of Eigen, there is no need to use .device().

e_mid.setConstant(k);

Contributor Author

Done!
Thanks!

Eigen::array<int, 4>({{1, 1, H, W}}));

s.device(ctx.GetEigenDevice<platform::CPUPlace>()) +=
alpha * r.square();
Contributor

The same as above:

s += alpha * r.square();

Contributor Author

Done!
Thanks!


auto out_e = framework::EigenVector<T>::Flatten(*out);
out_e.device(ctx.GetEigenDevice<platform::CPUPlace>()) =
x_v * e_mid.reshape(Eigen::DSizes<int, 1>(e_mid.size())).pow(-beta);
Contributor

The same as above.

Contributor Author

Done!
Thanks!

void operator()(const framework::ExecutionContext& ctx,
const framework::Tensor* x, const framework::Tensor* out,
const framework::Tensor* mid, framework::Tensor* x_g,
const framework::Tensor* out_g, int N, int C, int H, int W,
Contributor

For the input arguments, the same as above comments.

Contributor Author

Done!
Thanks!

T alpha, T beta) {
int img_size = N * H * W;
int block_size = 1024;
int grid_size = (img_size + 1024 - 1) / 1024;
Contributor

Use block_size instead of the 1024 on line 69:

int grid_size = (img_size + block_size - 1) / block_size;

Contributor Author

Done!
Thanks!


int input_size = N * H * W * C;
block_size = 1024;
grid_size = (input_size + 1024 - 1) / 1024;
Contributor

Same as above: use block_size instead of the 1024 on line 79.

Contributor Author

Done!
Thanks!

}
if (index >= size) {
accum -= in[(index - size) * step] * in[(index - size) * step];
}
Contributor

On lines 41 and 44, you can first stash the global-memory value in a register, which avoids reading global memory twice for the same element:

       if (index < C) {
         T val = in[index * step];
         accum += val * val;
       }
       if (index >= size) {
         T val = in[(index - size) * step];
         accum -= val * val;
       }


const auto& stream =
reinterpret_cast<const platform::CUDADeviceContext&>(ctx.device_context())
.stream();
Contributor

Contributor Author

Done!
Thanks!

int img_size = N * H * W;

int block_size = 1024;
int grid_size = (img_size + 1024 - 1) / 1024;
Contributor

Same as above.

Contributor Author

Done!
Thanks!

Contributor

@qingqing01 qingqing01 left a comment

LGTM.

@gongweibao gongweibao merged commit c7e739f into PaddlePaddle:develop Dec 6, 2017
@gongweibao gongweibao deleted the lrngpu branch December 6, 2017 08:38