Add depthwise conv op gpu #7885
Conversation
KernelDepthwiseConvInputGrad<T><<<grid, threads, 0, context.stream()>>>(
    nthreads, output_grad_data, filter_data, batch_size, output_channels,
    output_height, output_width, input_channels, input_height, input_width,
    output_channels / input_channels, ksize_height, ksize_width,
Regarding output_channels / input_channels: this is to say that input_channels should be less than output_channels and that output_channels must be divisible by input_channels, right?
That's right, I think I should add this check on the Python side.
The implementation of DepthwiseConvKernel should not depend on Python code, because the FLUID interface does not only support Python.
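A check on the C++ side could look roughly like the following minimal sketch, assuming output_channels and input_channels are already computed where the kernel is launched (the exact call site is an assumption):
// Sketch only: make sure the channel multiplier passed to the kernel,
// output_channels / input_channels, is well defined.
PADDLE_ENFORCE_EQ(
    output_channels % input_channels, 0,
    "In depthwise convolution, output_channels must be divisible by "
    "input_channels.");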
const int offset =
    ((batch * input_channels + c_in) * input_height + h_in) * input_width +
    w_in;
const int h_in = -padding_height + h_out * stride_height + kh;
const int w_in = -padding_width + w_out * stride_width + kw;
const int offset = ((batch * input_channels + c_in) * input_height + h_in) * input_width + w_in;
This code can be written more efficiently:
const int h_in_s = -padding_height + h_out * stride_height;
const int w_in_s = -padding_width + w_out * stride_width;
const int in_offset = ((batch * input_channels + c_in) * input_height) * input_width;
for (int kh = 0; kh < filter_height; ++kh) {
  for (int kw = 0; kw < filter_width; ++kw) {
    const int h_in = h_in_s + kh;
    const int w_in = w_in_s + kw;
    const int offset = in_offset + h_in * input_width + w_in;
    value += (*weight) * input_data[offset];
    ++weight;
  }
}
Yeah, I will fix it ASAP.
paddle/operators/conv_op.h
Outdated
std::vector<int> strides = context.Attr<std::vector<int>>("strides");
std::vector<int> paddings = context.Attr<std::vector<int>>("paddings");
std::vector<int> dilations = context.Attr<std::vector<int>>("dilations");
Doesn't DepthwiseConv support groups?
The groups value equals the number of input channels.
I added the choice on the Python side; it will execute the depthwise conv op when the group count equals the number of input channels. But I didn't add any check on the C++ side, because I directly reuse the conv op instead of recreating one.
You should add PADDLE_ENFORCE_EQ(...) to check this and add comments about it. The implementation of DepthwiseConvKernel should not depend on Python code, because the FLUID interface does not only support Python.
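A minimal sketch of such an enforce inside the kernel's Compute, assuming an NCHW input tensor named "Input" and the existing int attribute "groups" (the exact placement and variable names are assumptions):
const framework::Tensor* input = context.Input<framework::Tensor>("Input");
const int groups = context.Attr<int>("groups");
// Sketch only: the depthwise kernel is written for one group per input
// channel, so reject any other configuration with a clear message.
PADDLE_ENFORCE_EQ(
    groups, input->dims()[1],
    "The depthwise convolution kernel requires groups to equal the number "
    "of input channels.");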
paddle/operators/CMakeLists.txt
Outdated
op_library(conv_op SRCS conv_op.cc conv_op.cu.cc conv_cudnn_op.cu.cc DEPS
  vol2col depthwise_conv)

# op_library(conv_op SRCS conv_op.cc conv_op.cu.cc conv_cudnn_op.cu.cc DEPS vol2col)
This commented-out line can be removed.
OK.
paddle/operators/conv_op.cu.cc
Outdated
REGISTER_OP_CUDA_KERNEL(
    depthwise_conv_grad,
    ops::DepthwiseConvGradKernel<paddle::platform::CUDADeviceContext, float>,
    ops::DepthwiseConvGradKernel<paddle::platform::CUDADeviceContext, double>);
Does depthwise_conv need a cudnn kernel?
There is no depthwise cudnn kernel. If we specify the cudnn mode, we use the conv cudnn kernel.
class DepthwiseConvFunctor {
 public:
  void operator()(const DeviceContext& context, const framework::Tensor& input,
                  const framework::Tensor& filter, std::vector<int>& strides,
std::vector<int>& strides ==> const std::vector<int>& strides
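For illustration, here is a tiny standalone example (not PaddlePaddle code) of why const std::vector<int>& is preferable for a read-only parameter:
#include <vector>

// A read-only parameter taken by const reference avoids a copy, documents
// that the callee will not modify it, and also binds to const objects and
// temporaries, which a non-const reference cannot do.
int SumStrides(const std::vector<int>& strides) {
  int sum = 0;
  for (int s : strides) sum += s;
  return sum;
}

int main() {
  const std::vector<int> strides = {1, 1};
  return SumStrides(strides) + SumStrides({2, 2});  // both calls compile
}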
Done.
      }
      ++weight;
    }
  }
It seems that lines 53~77 can be written more concisely and more efficiently, by reducing the number of boundary checks.
e.g.
h_in_end = ...
w_in_end = ...
for (int kh = h_in_start; kh < h_in_end; ++kh) {
for (int kw = w_in_start; kw < w_in_end; ++kw) {
...
...
}
}
If so, we have to add extra, more complex index handling for the filter data. This may outweigh the benefits.
I think the following code can work, and it is not complex.
const int h_end = h_in_start + filter_height > input_height
                      ? input_height
                      : h_in_start + filter_height;
const int w_end = w_in_start + filter_width > input_width
                      ? input_width
                      : w_in_start + filter_width;
const int h_start = h_in_start > 0 ? h_in_start : 0;
const int w_start = w_in_start > 0 ? w_in_start : 0;
input_data += in_offset;
for (int h_in = h_start; h_in < h_end; ++h_in) {
  for (int w_in = w_start; w_in < w_end; ++w_in) {
    const int offset = h_in * input_width + w_in;
    value += weight[(h_in - h_in_start) * filter_width + (w_in - w_in_start)] *
             input_data[offset];
  }
}
}
… add_depthwiseConv_op_gpu
@@ -0,0 +1,340 @@
/* Copyright (c) 2016 paddlepaddle Authors. All Rights Reserve. |
2016 ==> 2018
Reserve ==> Reserved
The other files seem to have the same header as mine.
If the file is created in 2018, the correct copyright year should be 2018.
LGTM
fix #7772
Benchmark of one forward/backward pass