Add depthwise conv op gpu #7885
Conversation
KernelDepthwiseConvInputGrad<T><<<grid, threads, 0, context.stream()>>>(
    nthreads, output_grad_data, filter_data, batch_size, output_channels,
    output_height, output_width, input_channels, input_height, input_width,
    output_channels / input_channels, ksize_height, ksize_width,
Regarding output_channels / input_channels: this is to say that input_channels should be less than output_channels and that output_channels must be divisible by input_channels, right?
That's right, I think I should add this check on the Python side.
The implementation of DepthwiseConvKernel should not depend on Python code, because the FLUID interface does not only support Python.
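A check on the C++ side could look roughly like the following minimal sketch, assuming output_channels and input_channels are already computed where the kernel is launched (the exact call site is an assumption):
// Sketch only: make sure the channel multiplier passed to the kernel,
// output_channels / input_channels, is well defined.
PADDLE_ENFORCE_EQ(
    output_channels % input_channels, 0,
    "In depthwise convolution, output_channels must be divisible by "
    "input_channels.");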
const int offset =
    ((batch * input_channels + c_in) * input_height + h_in) * input_width +
    w_in;
const int h_in = -padding_height + h_out * stride_height + kh;
const int w_in = -padding_width + w_out * stride_width + kw;
const int offset = ((batch * input_channels + c_in) * input_height + h_in) * input_width + w_in;
This code can be written more efficiently:
const int h_in_s = -padding_height + h_out * stride_height;
const int w_in_s = -padding_width + w_out * stride_width;
const int in_offset = ((batch * input_channels + c_in) * input_height) * input_width;
for (int kh = 0; kh < filter_height; ++kh) {
  for (int kw = 0; kw < filter_width; ++kw) {
    const int h_in = h_in_s + kh;
    const int w_in = w_in_s + kw;
    const int offset = in_offset + h_in * input_width + w_in;
    value += (*weight) * input_data[offset];
    ++weight;
  }
}
Yeah, I will fix it ASAP.
paddle/operators/conv_op.h
Outdated
std::vector<int> strides = context.Attr<std::vector<int>>("strides");
std::vector<int> paddings = context.Attr<std::vector<int>>("paddings");
std::vector<int> dilations = context.Attr<std::vector<int>>("dilations");
Doesn't DepthwiseConv support groups?
The groups value equals the number of input channels.
I added the choice on the Python side; it will execute the depthwise conv op when the group count equals the number of input channels. But I didn't add any check on the C++ side, because I directly reuse the conv op instead of recreating one.
You should add PADDLE_ENFORCE_EQ(...) to check this and add comments about it. The implementation of DepthwiseConvKernel should not depend on Python code, because the FLUID interface does not only support Python.
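A minimal sketch of such an enforce inside the kernel's Compute, assuming an NCHW input tensor named "Input" and the existing int attribute "groups" (the exact placement and variable names are assumptions):
const framework::Tensor* input = context.Input<framework::Tensor>("Input");
const int groups = context.Attr<int>("groups");
// Sketch only: the depthwise kernel is written for one group per input
// channel, so reject any other configuration with a clear message.
PADDLE_ENFORCE_EQ(
    groups, input->dims()[1],
    "The depthwise convolution kernel requires groups to equal the number "
    "of input channels.");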
paddle/operators/CMakeLists.txt
Outdated
op_library(conv_op SRCS conv_op.cc conv_op.cu.cc conv_cudnn_op.cu.cc DEPS
  vol2col depthwise_conv)

# op_library(conv_op SRCS conv_op.cc conv_op.cu.cc conv_cudnn_op.cu.cc DEPS vol2col)
This commented-out line can be removed.
OK.
paddle/operators/conv_op.cu.cc
Outdated
REGISTER_OP_CUDA_KERNEL(
    depthwise_conv_grad,
    ops::DepthwiseConvGradKernel<paddle::platform::CUDADeviceContext, float>,
    ops::DepthwiseConvGradKernel<paddle::platform::CUDADeviceContext, double>);
Does depthwise_conv need a cudnn kernel?
There is no depthwise cudnn kernel. If we specify the cudnn mode, we use the conv cudnn kernel.
class DepthwiseConvFunctor {
 public:
  void operator()(const DeviceContext& context, const framework::Tensor& input,
                  const framework::Tensor& filter, std::vector<int>& strides,
std::vector<int>& strides ==> const std::vector<int>& strides
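For illustration, here is a tiny standalone example (not PaddlePaddle code) of why const std::vector<int>& is preferable for a read-only parameter:
#include <vector>

// A read-only parameter taken by const reference avoids a copy, documents
// that the callee will not modify it, and also binds to const objects and
// temporaries, which a non-const reference cannot do.
int SumStrides(const std::vector<int>& strides) {
  int sum = 0;
  for (int s : strides) sum += s;
  return sum;
}

int main() {
  const std::vector<int> strides = {1, 1};
  return SumStrides(strides) + SumStrides({2, 2});  // both calls compile
}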
Done.
      }
      ++weight;
    }
  }
It seems that lines 53~77 can be written more concisely and more efficiently, by reducing the number of boundary checks.
e.g.
h_in_end = ...
w_in_end = ...
for (int kh = h_in_start; kh < h_in_end; ++kh) {
for (int kw = w_in_start; kw < w_in_end; ++kw) {
...
...
}
}
If so, we have to add extra, more complex index handling for the filter data. This may outweigh the benefits.
I think the following code can work, and it is not complex.
const int h_end = h_in_start + filter_height > input_height
                      ? input_height
                      : h_in_start + filter_height;
const int w_end = w_in_start + filter_width > input_width
                      ? input_width
                      : w_in_start + filter_width;
const int h_start = h_in_start > 0 ? h_in_start : 0;
const int w_start = w_in_start > 0 ? w_in_start : 0;
input_data += in_offset;
for (int h_in = h_start; h_in < h_end; ++h_in) {
  for (int w_in = w_start; w_in < w_end; ++w_in) {
    const int offset = h_in * input_width + w_in;
    value += weight[(h_in - h_in_start) * filter_width + (w_in - w_in_start)] *
             input_data[offset];
  }
}
}
… add_depthwiseConv_op_gpu
@@ -0,0 +1,340 @@
/* Copyright (c) 2016 paddlepaddle Authors. All Rights Reserve. |
2016 ==> 2018
Reserve ==> Reserved
The other files seem to have the same header as mine.
If the file is created in 2018, the correct copyright year should be 2018.
LGTM
fix #7772
Benchmark of one forward/backward pass