Support Div and FloorDiv functor in elementwise system #33053

JamesLim-sy · 2021-05-21T17:40:49Z

PR types

Performance optimization

PR changes

OPs

Describe

Based on the new elementwise + broadcast system, this PR adds support for the binary functors below:
Div
Floor_div
The performance variation is below:

The detailed comparison for floor_div is below:

x.shape	y.shape	data type	Paddle dev (us)	Paddle optimized (us)	PyTorch (us)	Perf diff (relative to PyTorch)
[16, 128, 8]	[16, 128, 8]	int64	2.070	1.814	2.006	slow 3.19% -> fast 9.57%
[300, 128, 100]	[300, 128, 100]	int32	57.514	56.785	56.587	slow 1.64% -> slow 0.35%
[300, 128, 100]	[1]	int64	78.224	76.359	76.550	slow 2.19% -> fast 0.25%

As the table shows, all three test cases get faster after the optimization of the Elementwise and Broadcast ops: floor_div goes from 1.6% to 3.2% slower than PyTorch to faster than PyTorch in two cases and within 0.35% of it in the third.
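The "Perf diff" column appears to be the gap between the Paddle time and the PyTorch time, taken relative to the PyTorch time. A minimal sketch that reproduces the percentages from the table under that assumption (the function name `perf_diff_percent` is hypothetical and not part of Paddle):

```python
def perf_diff_percent(paddle_us, pytorch_us):
    """Relative gap to PyTorch in percent; positive means Paddle is slower."""
    return (paddle_us - pytorch_us) / pytorch_us * 100.0


# (Paddle dev, Paddle optimized, PyTorch) times in microseconds, from the table.
rows = [
    (2.0700, 1.814, 2.006),
    (57.514, 56.785, 56.587),
    (78.224, 76.359, 76.550),
]

for dev_us, opt_us, torch_us in rows:
    before = perf_diff_percent(dev_us, torch_us)
    after = perf_diff_percent(opt_us, torch_us)
    # A positive sign reads as "slow", a negative sign as "fast".
    print(f"{before:+.2f}% -> {after:+.2f}%")
```

Read this way, "slow 3.19% -> fast 9.57%" means the first case moved from 3.19% above the PyTorch time to 9.57% below it.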

paddle-bot-old · 2021-05-21T17:40:52Z

Thanks for your contribution!
Please wait for the CI results first. See Paddle CI Manual for details.

paddle-bot-old · 2021-05-31T02:34:34Z

Sorry to inform you that bc2b805's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

Xreki · 2021-06-10T01:22:20Z

paddle/fluid/operators/elementwise/elementwise_floordiv_op.cu

+    PADDLE_ENFORCE(argv[1] != 0,
+                   "InvalidArgument: divide by zero "
+                   "encountered in floor-divide ops, please check.\n");
+    return argv[0] / argv[1];


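Is there no need to call the trunc function anymore? Then this is no different from the div computation.

Floor_div only handles the int32 and int64 data types. If the return value is also int32 or int64, the result is truncated automatically. I also set up a local test script and found that the two ops produce the same results.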

import numpy as np
import paddle as pd

npx = np.array([9, -11, -8, 7])
npy = np.array([2, 3, 3, 2])
x = pd.to_tensor(npx, dtype='int32')
y = pd.to_tensor(npy, dtype='int32')
z2 = pd.divide(x, y)
z1 = pd.floor_divide(x, y)
result = pd.subtract(z1, z2)
# result = [0, 0, 0, 0]

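The comparison so far shows that Paddle and PyTorch produce the same results for floor_div. Specifically, for a computation such as floor_div(-8, 3):
Paddle and PyTorch return -2, rounding toward zero;
NumPy and TensorFlow return -3, rounding toward negative infinity.

It may be worth discussing further whether the floor_div computation rule in Paddle needs to change.

Please confirm with the op owner, or keep trunc in this PR for now.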
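The two rounding conventions discussed above can be illustrated in plain Python, without any framework. This is only a sketch of the arithmetic: `trunc_div` and `floor_div` are hypothetical helper names, and the snippet does not call Paddle, PyTorch, NumPy, or TensorFlow.

```python
def trunc_div(a, b):
    """Integer division rounding toward zero, as C++ `/` does for integers."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def floor_div(a, b):
    """Integer division rounding toward negative infinity (Python's `//`)."""
    return a // b


# Same operand pairs as in the test script above.
for a, b in [(9, 2), (-11, 3), (-8, 3), (7, 2)]:
    print(a, b, trunc_div(a, b), floor_div(a, b))
```

The two conventions differ only when the operands have opposite signs and the division is inexact, which is exactly the floor_div(-8, 3) case: -2 when rounding toward zero, -3 when rounding toward negative infinity.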

Xreki · 2021-06-10T01:25:53Z

paddle/fluid/operators/elementwise/elementwise_op_impl.cu.h

+  if (numel / (sm_nums << 1) < ELEMENTWISE_BLOCK_SIZE) {
+    threads = platform::RoundToPowerOfTwo(numel / (sm_nums << 1));
+  } else if (numel / (sm_nums << 2) < ELEMENTWISE_BLOCK_SIZE) {
+    threads = platform::RoundToPowerOfTwo(numel / (sm_nums << 1));


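Are L41 and L43 written identically? Also, please add a comment to each of these two lines to explain it.

L43 contains a typo; I will fix it in the next commit.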
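The intent of the thread-count heuristic under review can be sketched as follows. This is a hedged Python model, not the Paddle implementation: it assumes the second branch was meant to divide by `sm_nums << 2` (the thread only confirms that the line is a typo), it uses a hypothetical `round_up_to_power_of_two` in place of `platform::RoundToPowerOfTwo` with an assumed lower bound of 32, and it picks 256 as an illustrative value for `ELEMENTWISE_BLOCK_SIZE`.

```python
ELEMENTWISE_BLOCK_SIZE = 256  # assumed value, for illustration only


def round_up_to_power_of_two(n):
    """Smallest power of two >= n, clamped to at least 32 (assumed: one warp)."""
    power = 1
    while power < n:
        power <<= 1
    return max(32, power)


def threads_per_block(numel, sm_nums):
    """Shrink the block size for small inputs so the work spreads over the SMs."""
    threads = ELEMENTWISE_BLOCK_SIZE  # assumed default for large inputs
    if numel // (sm_nums << 1) < ELEMENTWISE_BLOCK_SIZE:
        # Aim for roughly 2 blocks per SM.
        threads = round_up_to_power_of_two(numel // (sm_nums << 1))
    elif numel // (sm_nums << 2) < ELEMENTWISE_BLOCK_SIZE:
        # Aim for roughly 4 blocks per SM (assumed intent of the second branch).
        threads = round_up_to_power_of_two(numel // (sm_nums << 2))
    return threads
```

Under these assumptions and with 80 SMs, the [16, 128, 8] case (16384 elements) gets 128 threads per block, while the [300, 128, 100] case keeps the full block size.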

Xreki · 2021-06-10T01:26:19Z

paddle/fluid/operators/elementwise/elementwise_op_impl.cu.h

+* than number of SMs. Hence, SMs is took into consideration within this
+* function to determine the right number of threads per block,
+*/
+template <typename Enable = void>


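Why use a template here?

This is a lazy workaround: making the function a template, or adding the static keyword to the function header, prevents function redefinition errors during compilation. The proper fix would be to add a new .cu file and put the function there so it is instantiated once.

That should not happen. Isn't there a #pragma once at the top of the file?

I looked it up on Stack Overflow: this problem cannot be solved by #pragma once. The solutions I found come down to two:
(1) still create a separate .cu file to instantiate the function body;
(2) add the inline or static keyword.
I took the lazy route and solved it by adding the inline keyword.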

Xreki · 2021-06-10T01:28:38Z

paddle/fluid/operators/elementwise/elementwise_op_impl.cu.h

  if (address % vec4 == 0) {
-    return 4;
+    return std::min(vec4, valid_vec_size);


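Here, the vectorization length for float16 may come back as 8, but you have not checked whether the address meets the vec8 alignment requirement.

I did overlook this. I will change it as suggested and submit it in the next commit.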
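The concern raised here is that a vector width may only be used when the address is aligned to that full width. A minimal Python model of such a check, under stated assumptions: loads are capped at 128 bits, an N-element vector of type T needs N * sizeof(T) byte alignment, and `vectorized_size` is a hypothetical name rather than the function in `elementwise_op_impl.cu.h`.

```python
MAX_LOAD_BYTES = 16  # assumed cap: one 128-bit load


def vectorized_size(address, elem_bytes):
    """Largest vector length in {8, 4, 2, 1} that the address is aligned for."""
    for vec in (8, 4, 2):
        width = vec * elem_bytes  # bytes touched by one vectorized load
        if width <= MAX_LOAD_BYTES and address % width == 0:
            return vec
    return 1  # fall back to scalar loads
```

For a 2-byte float16 buffer at byte address 8, this returns 4 rather than 8, because an 8-element vector would need 16-byte alignment.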

Xreki · 2021-06-10T01:34:47Z

paddle/fluid/operators/elementwise/elementwise_op_impl.cu.h

 #pragma unroll
    for (int j = 0; j < ET; ++j) {
      ins[j] = ins_ptr[j][i];
    }
-    out_ptr[i] = func(ins);
+    out_vec.val[i] = func(ins);


Perhaps our AlignedVector could be changed to inherit from the Array class, defining an AlignedArray.

Paddle/paddle/fluid/framework/array.h

Lines 24 to 35 in 42c1297

template <typename T, size_t N>
class Array {
 public:
  static constexpr size_t kSize = N;

  HOSTDEVICE inline Array() {}

  template <typename... Args>
  HOSTDEVICE inline explicit Array(const T &val, Args... args) {
    static_assert(N == sizeof...(Args) + 1, "Invalid argument");
    UnrollVarArgsAssign<T>::Run(data_, val, args...);
  }

Xreki

LGTM

chenwhql

lgtm for paddle enforce

First_Commit.

c60c4e0

JamesLim-sy added 4 commits May 21, 2021 17:52

Fixs

8a002d8

Fix bugs

9e89caf

Change the assert codes

a495b1a

Update the flood_div op

629ceaa

JamesLim-sy changed the title ~~Support Div binary functors in elementwise system~~ Support Div Floor_div binary functors in elementwise system May 23, 2021

JamesLim-sy added 10 commits May 23, 2021 09:09

Update the flood_div op

d42eeb0

Update the flood_div op

62b221a

Update the flood_div op, change back into HOSTDEVICE.

67b034b

Merge branch 'Adding_div_functors' of https://github.com/JamesLim-sy/Paddle into Adding_div_functors

937a297

Fix written bugs.

2d371bc

Fix bugs

bc2b805

Fisrt commit

9d46543

Trigger of rerun

74e4179

To avoid spartial specification bugs which happened in PR-CI-ROCM

656ac99

Avoid kUnary instantiation of LaunchElementwiseCudaKernel at compile time

585566f

refine warpper of broadcast and add cuda op

d9c70ec

JamesLim-sy changed the title ~~Support Div Floor_div binary functors in elementwise system~~ Support Pow Div Floor_div functors in elementwise system Jun 1, 2021

JamesLim-sy changed the title ~~Support Pow Div Floor_div functors in elementwise system~~ Support Div Floor_div functors in elementwise system Jun 1, 2021

Merge branch 'Fix_bugs_in_elementwise_warpprer' into Adding_div_functors

20f9e36

Fix bugs

fb49a21

revert the changes in host exe codes in .h files

e2692ee

JamesLim-sy force-pushed the Adding_div_functors branch from 5e62170 to e2692ee Compare June 1, 2021 12:22

JamesLim-sy force-pushed the Adding_div_functors branch from d610983 to 3b5c6b2 Compare June 5, 2021 08:24

Merge branch 'develop' into Adding_div_functors

2f72406

JamesLim-sy force-pushed the Adding_div_functors branch from 3b5c6b2 to 2f72406 Compare June 5, 2021 08:28

Only support div currently.

b7a41e9

JamesLim-sy changed the title ~~Support Div Floor_div functors in elementwise system~~ Support Div functor in elementwise system Jun 5, 2021

JamesLim-sy added 3 commits June 8, 2021 07:16

refine the vectorized_load length function

8f26886

refine the vectorized_load length function

7b9c9a3

refine vectorized load determination function

4f5b5c1

JamesLim-sy changed the title ~~Support Div functor in elementwise system~~ Support Div and FloorDiv functor in elementwise system Jun 8, 2021

JamesLim-sy added 2 commits June 9, 2021 06:18

refine codes

5aaf95d

refine codes

f7d35b7

refine codes

da1cd7f

Xreki reviewed Jun 10, 2021

refine threads config.

85d954c

JamesLim-sy force-pushed the Adding_div_functors branch from 3a107ae to 85d954c Compare June 10, 2021 13:46

fix multiple errors with inline syntax

13698a6

Xreki approved these changes Jun 11, 2021

chenwhql approved these changes Jun 11, 2021

Xreki merged commit fcd93b3 into PaddlePaddle:develop Jun 12, 2021
