[CustomOp] Polish custom api content for performance in dygraph #32209

chenwhql · 2021-04-12T09:37:35Z

PR types

Performance optimization

PR changes

Others

Describe

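Custom ops currently do not require users to wrap a Python API by hand; the API is generated automatically. When the API is generated for a custom op, however, it still goes through the old append_op path internally rather than core.ops. This is partly for compatibility between dygraph and static graph, and partly because the custom op system has no corresponding core.ops API to call. Because the append_op call stack is fairly deep, the generated API performs somewhat worse in dygraph mode. This is a previously known dygraph issue.

This PR simplifies the Python call stack of the auto-generated API in dygraph mode to improve the execution performance of custom op APIs in dygraph. A brief test follows: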

1. Test conditions

Test op: concat (device is CPU, dtype is float32, axis is 1)
Test data is very small: concat of two small matrices

import numpy as np

np_inputs = [
    np.array([[1, 2, 3], [4, 5, 6]]),
    np.array([[11, 12, 13], [14, 15, 16]])
]

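Number of runs: API call time averaged over 100000 calls
Why concat?
- No special reason; the unit tests already contain a concat implementation that can be used directly
- The CPU implementation of the custom op concat matches Paddle's built-in concat implementation, unlike relu, which is implemented internally with Eigen. How efficient the op's computation is depends on the user's implementation; this test looks only at the extra execution cost introduced by the custom op mechanism
Why must axis be 1?
- When axis is 0, Paddle's built-in concat takes the StridedNumelCopyWithAxis branch, which differs from the custom op's C++ implementation. Identical C++ compute implementations are a precondition for this test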
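
The measurement procedure described above (average the wall-clock time of many API calls on two tiny inputs) can be sketched roughly as follows. This is a minimal illustration, not the script used in the PR: `np.concatenate` stands in for the Paddle and custom op APIs, and the helper names `average_call_time` and `concat_axis1` are hypothetical.

```python
import timeit

import numpy as np

# The two small inputs from the test conditions, cast to float32
# to match the stated dtype (the cast is an assumption of this sketch).
np_inputs = [
    np.array([[1, 2, 3], [4, 5, 6]], dtype="float32"),
    np.array([[11, 12, 13], [14, 15, 16]], dtype="float32"),
]


def average_call_time(fn, number=100000):
    """Return the mean wall-clock time, in seconds, of one call to fn."""
    total = timeit.timeit(fn, number=number)
    return total / number


def concat_axis1():
    # Stand-in for the API under test; the PR measured Paddle APIs instead.
    return np.concatenate(np_inputs, axis=1)


if __name__ == "__main__":
    print(f"average per call: {average_call_time(concat_axis1):.2e} s")
```

With inputs this small, the measured time is dominated by per-call overhead rather than by the concat computation itself, which is the quantity this test is after.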

2. Test data

Dygraph (unit: s)

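| | Original custom op | paddle api | Custom op after this PR | Time saved | Remaining gap |
|---|---|---|---|---|---|
| axis as Tensor | 6.31e-05 | 3.45e-05 | 4.31e-05 | 2.0e-05 | 0.86e-05 |
| axis as Attribute | 6.90e-05 | 2.91e-05 | 4.47e-05 | 2.43e-05 | 1.56e-05 |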

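Note: the percentage gap is not very meaningful here. The absolute gap is what matters, because it is the extra overhead introduced by the mechanism and is similar for other APIs; if the API itself does a lot of computation, the percentage gap shrinks.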
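
As a quick check of the two derived columns, the sketch below recomputes "time saved" (original custom op minus optimized custom op) and "remaining gap" (optimized custom op minus paddle api) from the measured values in the table. The dictionary keys and the `derive` helper are illustrative names only.

```python
# Per-call times in seconds, copied from the table above.
measurements = {
    "axis as Tensor": {"before": 6.31e-05, "paddle": 3.45e-05, "after": 4.31e-05},
    "axis as Attribute": {"before": 6.90e-05, "paddle": 2.91e-05, "after": 4.47e-05},
}


def derive(row):
    """Return (time saved by this PR, remaining gap to the paddle api)."""
    saved = row["before"] - row["after"]
    gap = row["after"] - row["paddle"]
    return saved, gap


for name, row in measurements.items():
    saved, gap = derive(row)
    print(f"{name}: saved {saved * 1e6:.1f} us, remaining gap {gap * 1e6:.1f} us")
```

The recomputed values agree with the table: about 20.0 us saved and 8.6 us remaining for axis as Tensor, and about 24.3 us saved and 15.6 us remaining for axis as Attribute.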

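After this PR, the remaining time gap (roughly 10 us) is mostly overhead introduced by the custom op mechanism itself:

- When the custom op does a lot of computation, this overhead is essentially negligible; when the computation is small, it is still fairly heavy
- To fit into the native op system, the custom op mechanism extracts the core logic of a native op for the user to implement, while internally still using the native op execution logic. The two styles are bridged by compile-time template deduction plus runtime argument parsing and wrapping, which introduces some non-compute glue overhead
- The difference between axis as Tensor and axis as Attribute shows that fetching and converting an Attribute costs more than doing so for a Tensor. This includes ctx.Attr and boost::any_cast; the cost of any_cast may be fairly high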
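
To make the "negligible for heavy ops, significant for light ops" point concrete, the sketch below models the residual cost as a fixed per-call overhead of about 10 us and shows its share of total call time for a few kernel times. The kernel times are hypothetical values chosen for illustration, and treating the overhead as a constant is a simplifying assumption.

```python
FIXED_OVERHEAD_S = 10e-6  # roughly 10 us, the residual gap quoted above


def overhead_share(kernel_time_s, overhead_s=FIXED_OVERHEAD_S):
    """Fraction of the total call time spent on fixed non-compute overhead."""
    return overhead_s / (kernel_time_s + overhead_s)


# Hypothetical kernel times, from a tiny op to a heavy one.
for kernel_time_s in (10e-6, 100e-6, 1e-3, 10e-3):
    share = overhead_share(kernel_time_s)
    print(f"kernel {kernel_time_s * 1e6:8.0f} us -> overhead share {share:.1%}")
```

Under this model the overhead is half of the call time for a 10 us kernel and well under 1% for a 10 ms kernel.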

paddle-bot-old · 2021-04-12T09:38:22Z

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

Aurelius84

LGTM

polish custom api content for performence

cacbb61

chenwhql requested review from Aurelius84, zhwesky2010, phlrain, lanxianghit and JiabinYang April 12, 2021 11:16

Aurelius84 approved these changes Apr 12, 2021

chenwhql merged commit 0624ea5 into PaddlePaddle:develop Apr 12, 2021
