
[x86] Depthwise conv2d #6745

Merged (14 commits) on Sep 10, 2021.

@lxwlaq (Collaborator) commented on Aug 23, 2021:

Add optimized x86 depthwise convolution kernels for the 3×3 stride-1 pad-1 (3×3s1p1) and 3×3 stride-2 pad-1 (3×3s2p1) cases.

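| Conv input (N×C×H×W) | Stride | Old | New | Reduction |
|---|---|---|---|---|
| 1×32×112×112 | 1 | 0.205 | 0.157 | 23.4% |
| 1×64×112×112 | 2 | 0.232 | 0.182 | 21.5% |
| 1×128×56×56 | 1 | 0.210 | 0.162 | 22.8% |
| 1×128×56×56 | 2 | 0.114 | 0.100 | 12.2% |

| Model | Old | New | Reduction |
|---|---|---|---|
| MobileNetV1 | 14.69 | 12.34 | 15.9% |
| MobileNetV2 | 11.8 | 9.12 | 22.7% |
| MobileNetV3_large | 18.82 | 17.83 | 5.3% |
| MobileNetV3_small | 9.58 | 9.36 | 2.3% |

Reduction = (old - new) / old.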
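For checking the optimized kernels, a plain scalar version of the two cases (3×3s1p1 and 3×3s2p1) may help. This is a minimal sketch, not the PR's code: the function name `depthwise3x3_p1_ref`, the NCHW layout with batch 1, and the omission of the activation step are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference: 3x3 depthwise convolution, pad = 1,
// stride 1 or 2, NCHW layout with batch 1, no activation.
// Channel c is convolved only with its own filter w[c*9 .. c*9+8].
std::vector<float> depthwise3x3_p1_ref(const std::vector<float>& in, int ch,
                                       int h_in, int w_in,
                                       const std::vector<float>& w,
                                       const std::vector<float>& bias,
                                       int stride) {
  assert(stride == 1 || stride == 2);
  const int pad = 1;
  // Standard output-size formula for a 3x3 kernel.
  const int h_out = (h_in + 2 * pad - 3) / stride + 1;
  const int w_out = (w_in + 2 * pad - 3) / stride + 1;
  std::vector<float> out(static_cast<std::size_t>(ch) * h_out * w_out);
  for (int c = 0; c < ch; ++c) {
    const float* x = in.data() + static_cast<std::size_t>(c) * h_in * w_in;
    const float* k = w.data() + static_cast<std::size_t>(c) * 9;
    for (int oh = 0; oh < h_out; ++oh) {
      for (int ow = 0; ow < w_out; ++ow) {
        float acc = bias.empty() ? 0.0f : bias[c];
        for (int kh = 0; kh < 3; ++kh) {
          for (int kw = 0; kw < 3; ++kw) {
            const int ih = oh * stride - pad + kh;
            const int iw = ow * stride - pad + kw;
            // Taps that fall into the zero padding contribute nothing.
            if (ih < 0 || ih >= h_in || iw < 0 || iw >= w_in) continue;
            acc += x[ih * w_in + iw] * k[kh * 3 + kw];
          }
        }
        out[(static_cast<std::size_t>(c) * h_out + oh) * w_out + ow] = acc;
      }
    }
  }
  return out;
}
```

With pad = 1 the output size is `(h_in + 2 - 3) / stride + 1`, so a 112×112 input stays 112×112 at stride 1 and becomes 56×56 at stride 2. Comparing an optimized kernel against a reference like this on small random inputs is a cheap correctness check.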

@paddle-bot-old: Thanks for your contribution!

@lxwlaq closed and reopened this PR on Aug 23, 2021.
```diff
@@ -87,12 +87,16 @@ if (WITH_AVX AND AVX_FOUND)
math_library (interpolate AVX2 TRUE DEPS math_function)
math_library (power DEPS AVX2 TRUE DEPS avx_mathfuns)
math_library (rnn AVX2 TRUE)
math_library (conv_depthwise_direct AVX2 TRUE)
```
Collaborator: Please add the performance data after the optimization, for example:

[image]

```cpp
__m128i mask = _mm_setr_epi32(0x80000000, 0x80000000, 0x80000000, 0);
if (j + 1 == col) {
  __m256 rmaski_ = _mm256_loadu_ps(rmask_i);
  i0 = _mm256_mul_ps(i0, rmaski_);
```
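The `__m128i mask` line sets the sign bit in lanes 0 to 2 and clears it in lane 3, which is the mask format `_mm_maskload_ps` expects. The excerpt does not show where the mask is used; assuming it reads one 3-tap row of a 3×3 filter, a sketch might look like this. `load_row3` is a hypothetical name, and `__attribute__((target("avx")))` is a GCC/Clang extension used only so the example builds without `-mavx`.

```cpp
#include <immintrin.h>
#include <cassert>

// Hypothetical helper: loads w[0..2] into lanes 0..2 of a 128-bit vector and
// stores the 4 lanes to out. Lane 3 has its sign bit clear in the mask, so
// _mm_maskload_ps does not read w[3] and writes 0.0f to that lane.
__attribute__((target("avx")))
void load_row3(const float* w, float* out) {
  assert(w != nullptr && out != nullptr);
  const int kOn = static_cast<int>(0x80000000u);  // sign bit only, as in the diff
  const __m128i mask = _mm_setr_epi32(kOn, kOn, kOn, 0);
  _mm_storeu_ps(out, _mm_maskload_ps(w, mask));
}
```

Because the masked-off lane is never read, `w` can point at the last three floats of a weight buffer without an out-of-bounds access.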
Collaborator: This could be done with `_mm256_maskload_ps`, which loads only the valid data.
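A sketch comparing the two approaches for the right-edge tail of a row, assuming `rmask_i` holds 1.0f for valid lanes and 0.0f elsewhere (the excerpt does not show how it is filled). Both function names are hypothetical, and `target("avx")` is a GCC/Clang attribute so the code builds without `-mavx`.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>

// Masked load: lanes 0..n-1 come from src, lanes n..7 are set to 0.0f.
// _mm256_maskload_ps reads lane i only if the sign bit of mask lane i is set,
// so memory past src[n-1] is not accessed.
__attribute__((target("avx")))
void load_tail_maskload(const float* src, int n, float* out) {
  assert(n >= 0 && n <= 8);
  alignas(32) int32_t bits[8];
  for (int i = 0; i < 8; ++i) bits[i] = (i < n) ? -1 : 0;  // -1 sets the sign bit
  const __m256i mask =
      _mm256_loadu_si256(reinterpret_cast<const __m256i*>(bits));
  _mm256_storeu_ps(out, _mm256_maskload_ps(src, mask));
}

// Multiply-by-mask, as in the diff: loads all 8 floats, then scales by a
// 1.0f/0.0f mask. Requires 8 readable floats at src; NaN or Inf in an invalid
// lane is not cleared, because NaN * 0 and Inf * 0 are both NaN.
__attribute__((target("avx")))
void load_tail_mul(const float* src, const float* rmask, float* out) {
  const __m256 v = _mm256_loadu_ps(src);
  _mm256_storeu_ps(out, _mm256_mul_ps(v, _mm256_loadu_ps(rmask)));
}
```

Besides dropping the extra multiply, the masked load never touches the invalid elements, so the input row does not need readable padding at its end.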

Collaborator: Alternatively, use `_mm256_blend_ps` to select the valid data.

[image]
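`_mm256_blend_ps` takes its lane selector as an 8-bit immediate, so it fits when the number of valid lanes is known at compile time; for a run-time count, `_mm256_blendv_ps` takes the selector as a vector instead. A minimal sketch of both, with hypothetical function names and the same GCC/Clang `target("avx")` attribute:

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>

// Run-time tail length: lane i comes from v where the sign bit of mask lane i
// is set, otherwise from the zero vector.
__attribute__((target("avx")))
void select_tail_blendv(const float* src, int n, float* out) {
  assert(n >= 0 && n <= 8);
  alignas(32) int32_t bits[8];
  for (int i = 0; i < 8; ++i) bits[i] = (i < n) ? -1 : 0;
  const __m256 mask = _mm256_castsi256_ps(
      _mm256_loadu_si256(reinterpret_cast<const __m256i*>(bits)));
  const __m256 v = _mm256_loadu_ps(src);
  _mm256_storeu_ps(out, _mm256_blendv_ps(_mm256_setzero_ps(), v, mask));
}

// Compile-time tail length: bit i of the immediate selects lane i from the
// second operand (v); cleared bits keep the zero from the first operand.
template <int kValid>
__attribute__((target("avx")))
void select_tail_blend_imm(const float* src, float* out) {
  static_assert(kValid >= 0 && kValid <= 8, "kValid must be in 0..8");
  constexpr int kImm = (1 << kValid) - 1;  // e.g. kValid = 5 gives 0x1F
  const __m256 v = _mm256_loadu_ps(src);
  _mm256_storeu_ps(out, _mm256_blend_ps(_mm256_setzero_ps(), v, kImm));
}
```

Unlike the masked load, both blends still read all 8 floats at `src`, so the buffer must be readable up to `src + 7`. Unlike multiplying by 0.0f, a blend is a pure select, so NaN or Inf in the invalid lanes is replaced by zero.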

lite/backends/x86/math/conv_depthwise_3x3.cc (resolved)

```cpp
    0x80000000);
if (j + 1 == col) {
  __m256 rmaski_ = _mm256_loadu_ps(rmask_i);
  i0 = _mm256_mul_ps(i0, rmaski_);
```
Collaborator: Same as above.

```cpp
}  // namespace math
}  // namespace x86
}  // namespace lite
}  // namespace paddle
```
Collaborator: Add a blank line.

```cpp
int ow = o_dims[3];
int oc = o_dims[1];

lite::x86::math::conv_depthwise_direct(
```
Collaborator: Could this call the concrete implementation directly, to reduce nested calls? For example, when stride == 1:

```cpp
if (stride == 1) {
  conv_depthwise_3x3s1_p1_direct(din, dout, num, ch_out, h_out, w_out,
                                 ch_in, h_in, w_in, weights, bias, pad,
                                 flag_bias, act_param);
}
```

@lxwlaq closed and reopened this PR on Aug 29, Sep 2, and Sep 7, 2021 (twice on Sep 7).
@chenjiaoAngel (Collaborator) left a comment: LGTM

@chenjiaoAngel merged commit fa7ad7b into PaddlePaddle:develop on Sep 10, 2021.