FC & Softmax #6560

Merged
merged 12 commits into from
Jul 29, 2021
zhaoyang-star
Collaborator

@zhaoyang-star zhaoyang-star commented Jul 26, 2021

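【Problem】

  • The existing FC kernel is implemented on top of cl::Buffer and performs poorly.
  • The existing softmax performs poorly on 2-D tensors because its parallelism is very low. For example, for a tensor of shape 1x1000 with axis=1, only one thread is assigned to the computation.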

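【Work in this PR】

  • Optimize FC: input/output/bias are stored as cl::Image2d and weight is stored as cl::Buffer, with the weight read as half16; see [OpenCL][Kernel] Use FC replace conv1x1 #6365 for details. The corresponding unit test verifies both fp32 and fp16 precision.
  • Optimize softmax: to address the poor performance on 2-D tensors, the thread allocation is changed so that the data along the axis dimension is partitioned in units of 32. This uses local memory, and the core idea is a parallel reduce. To handle channel counts that are not divisible by 4 efficiently, a mask is used instead of if/else branches.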
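The softmax scheme described above can be sketched on the CPU as follows. This is only an illustrative simulation, not the OpenCL kernel from this PR: the names `softmax_blocked`, `tree_reduce`, `lane_mask` and `masked` are hypothetical, and it assumes that "32" means 32 work items that each scan a strided subset of 4-wide pixels and then combine their partial results through a local-memory tree reduction. The mask replaces out-of-range lanes with a large negative value, so the tail pixel needs no if/else in the inner loop.

```python
import math

LOCAL_SIZE = 32      # simulated work items per work group (assumed meaning of "32")
VEC = 4              # values stored per image pixel (4 channels)
NEG_BIG = -3.0e38    # finite stand-in for minus infinity


def lane_mask(group, length):
    """0/1 mask for one 4-wide pixel; lanes at or past `length` get 0.
    A real kernel would build this with a vector comparison."""
    base = group * VEC
    return [1.0 if base + lane < length else 0.0 for lane in range(VEC)]


def masked(value, m):
    """Arithmetic select: keep `value` where m == 1, else NEG_BIG."""
    return value * m + NEG_BIG * (1.0 - m)


def tree_reduce(local_mem, op):
    """Pairwise reduction over a power-of-two sized scratch buffer."""
    buf = list(local_mem)
    stride = len(buf) // 2
    while stride > 0:
        for lid in range(stride):  # iterations are independent of each other
            buf[lid] = op(buf[lid], buf[lid + stride])
        stride //= 2
    return buf[0]


def softmax_blocked(row):
    n = len(row)
    groups = (n + VEC - 1) // VEC
    padded = list(row) + [0.0] * (groups * VEC - n)

    def lanes(lid):
        # Work item `lid` visits pixels lid, lid + 32, lid + 64, ...
        for g in range(lid, groups, LOCAL_SIZE):
            m = lane_mask(g, n)
            for lane in range(VEC):
                yield masked(padded[g * VEC + lane], m[lane])

    # Pass 1: per-work-item partial max, then reduce to the row max.
    partial_max = [max(lanes(lid), default=NEG_BIG) for lid in range(LOCAL_SIZE)]
    row_max = tree_reduce(partial_max, max)

    # Pass 2: per-work-item partial sum of exp(x - max); masked lanes add 0.
    partial_sum = [sum(math.exp(x - row_max) for x in lanes(lid))
                   for lid in range(LOCAL_SIZE)]
    row_sum = tree_reduce(partial_sum, lambda a, b: a + b)

    return [math.exp(v - row_max) / row_sum for v in row]
```

For a 1x1000 row, each of the 32 simulated work items handles about 8 of the 250 pixels, instead of a single thread walking all 1000 values.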

【Results】
MobileNetV1 contains one FC and one softmax. Kernel time was measured on 6 devices covering Mali and Adreno GPUs, as shown in the table below (times in ms). FC is sped up by 1x to 3x, and softmax by 44% to 302%.
image

FC performance for different values of N, measured separately on the 845:
image

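【TODO】
Both kernels produce 2-D output. When the output tensor's dimensions are expanded to 4-D, the 1s are padded on the low dimensions rather than on the high dimensions as defined in the opencl converter, so the precision profile is affected. This remains to be fixed. The follow-up plan is to change the opencl converter uniformly to pad 1s on the low dimensions.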
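The difference between the two padding conventions can be illustrated with a small shape helper. This is a sketch only: `pad_to_4d` is a hypothetical name, not a function from the opencl converter, and it assumes that "high dimensions" means the leading axes and "low dimensions" means the trailing axes.

```python
def pad_to_4d(shape, pad_low=False):
    """Expand a shape with fewer than 4 dims to 4 dims by inserting 1s.

    pad_low=False: 1s go in front (high/leading dimensions).
    pad_low=True:  1s go at the end (low/trailing dimensions).
    """
    missing = 4 - len(shape)
    if missing < 0:
        raise ValueError("shape has more than 4 dimensions")
    ones = [1] * missing
    return list(shape) + ones if pad_low else ones + list(shape)


# A 2-D output such as the 1x1000 softmax result:
high = pad_to_4d([1, 1000])                 # [1, 1, 1, 1000]
low = pad_to_4d([1, 1000], pad_low=True)    # [1, 1000, 1, 1]
```

Under these assumptions the same 1x1000 tensor ends up with its 1000 elements on a different 4-D axis depending on the convention, which is why a tool that expects one layout would misread the other.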

@zhaoyang-star zhaoyang-star marked this pull request as ready for review July 28, 2021 13:35
Collaborator

@daming5432 daming5432 left a comment

LGTM

@zhaoyang-star zhaoyang-star merged commit 3dbaebd into PaddlePaddle:develop Jul 29, 2021
@zhaoyang-star zhaoyang-star deleted the tune_fc branch July 29, 2021 07:53