
[Hackathon No.52] Add float16 data type support for the Paddle dist operator #50915

Merged · 25 commits · Apr 28, 2023

Conversation

@jinyouzhi (Contributor) commented Feb 25, 2023

PR types

New features

PR changes

OPs

Description

Task: #50658 (comment)

Chinese documentation: PaddlePaddle/docs#5740

OP Performance:

| OP | shape | p | fp32 | fp16 |
| --- | --- | --- | --- | --- |
| dist_forward | [1000,1000] | 2 | 0.05938422923185387 | 0.050949320501210746 |
| dist_backward | [1000,1000] | 2 | 0.11299653929107042 | 0.08545359786675902 |
| dist_forward | [1000,1000] | inf | 0.0472954341343471 | 0.044048319057542445 |
| dist_backward | [1000,1000] | inf | 0.09125203502421475 | 0.08334651285288286 |
| dist_forward | [1000,1000] | 0 | 0.04742607778432417 | 0.045565683014538824 |
| dist_backward | [1000,1000] | 0 | 0.08797159000318877 | 0.08200139415507414 |

[measured on AI Studio]

@paddle-bot commented Feb 25, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

@paddle-bot added the labels contributor (External developers) and status: proposed on Feb 25, 2023
@CLAassistant commented Feb 25, 2023

CLA assistant check: all committers have signed the CLA.

Resolved review threads (outdated) on:
- paddle/phi/kernels/cpu/p_norm_kernel.cc
- paddle/phi/kernels/dist_grad_kernel.cc
- paddle/phi/kernels/dist_kernel.cc
- python/paddle/tensor/linalg.py
@jinyouzhi force-pushed the fp16/dist branch 2 times, most recently from b1ee7df to 9edcfa2 (March 2, 2023 20:33)
@jinyouzhi marked this pull request as draft (March 7, 2023 17:20)
@jinyouzhi force-pushed the fp16/dist branch 2 times, most recently from e60a97c to 8ff0da6 (March 13, 2023 07:23)
@jinyouzhi force-pushed the fp16/dist branch 3 times, most recently from 1ec1d67 to f7d2bef (March 19, 2023 12:36)
@jinyouzhi marked this pull request as ready for review (March 19, 2023 16:02)
@jinyouzhi (Author) commented

@zhangting2020 Most of the CI checks have passed. Could you take another look?

@jinyouzhi (Author) commented

@zhangting2020 Added the performance data.

Review thread on paddle/phi/kernels/funcs/math_cuda_utils.h (outdated), on this snippet:
val_ret += __shfl_xor(val_ret, mask, warpSize);
#endif
return static_cast<phi::dtype::float16>(val_ret);
}
Contributor (reviewer) commented:

In the earlier revision of this change, didn't calling the original implementation compile and run fine? The main difference is that the else branch converts to fp32? This kind of case does not need to be handled at the operator level.

Contributor Author (jinyouzhi) replied:

The original implementation failed to compile because what gets passed in is phi::dtype::float16, while the CUDA intrinsic's half-precision parameter type is __half, so I added a template specialization to handle fp16. I'm not sure how fp16 and CUDA's __half are bridged inside the framework.
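For context, a minimal sketch of the kind of specialization being described (illustrative only, not the PR's actual code): the generic template passes the value straight to the CUDA shuffle intrinsic, which has no overload for phi::dtype::float16, so an fp16 specialization promotes to float and casts back. WarpReduceSum and lane_mask are placeholder names.

```cpp
#include "paddle/phi/common/float16.h"  // phi::dtype::float16 (path assumed)

// Generic warp-level sum reduction; works for float/double, which the CUDA
// shuffle intrinsics accept directly.
template <typename T>
__device__ __forceinline__ T WarpReduceSum(T val, unsigned lane_mask) {
  for (int offset = warpSize / 2; offset > 0; offset >>= 1) {
    val += __shfl_xor_sync(lane_mask, val, offset, warpSize);
  }
  return val;  // with the XOR pattern, every lane ends up with the warp sum
}

// fp16 specialization: __shfl_xor_sync cannot take phi::dtype::float16, so
// promote to float, reduce, and cast back to float16 at the end.
template <>
__device__ __forceinline__ phi::dtype::float16 WarpReduceSum(
    phi::dtype::float16 val, unsigned lane_mask) {
  float val_f = static_cast<float>(val);
  for (int offset = warpSize / 2; offset > 0; offset >>= 1) {
    val_f += __shfl_xor_sync(lane_mask, val_f, offset, warpSize);
  }
  return static_cast<phi::dtype::float16>(val_f);
}
```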

Contributor (reviewer) replied:

This change works for getting fp16 to run, but there are further considerations from the perspective of the operator's numerical precision:

  • For operations like reduce sum, fp16 loses precision easily, so the computation should be kept in fp32 while the inputs and outputs stay fp16. You should trace from ReduceSumWithSubtract, which calls this function: by the time execution reaches this function, the input type should no longer be float16, so adding float16 support here may not be necessary.

If you want to make this function more general so that it also supports float16, you can slightly modify the original interface:

  • You can find the CudaShuffleDownSync interface in paddle/phi/backends/gpu/cuda/cuda_device_function.h; that is the more recommended way to write it (see the sketch after this list).
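A rough sketch of what this suggestion could look like; the namespace and exact signature of CudaShuffleDownSync are assumed from its header location named above, and WarpReduceSum is again a placeholder, not the final PR code.

```cpp
#include "paddle/phi/backends/gpu/cuda/cuda_device_function.h"

// Route the shuffle through the framework wrapper instead of hand-writing a
// per-dtype specialization; the wrapper is expected to handle fp16 internally.
template <typename T>
__device__ __forceinline__ T WarpReduceSum(T val, unsigned lane_mask) {
  for (int offset = warpSize / 2; offset > 0; offset >>= 1) {
    // Shuffle-down reduction: after the loop, lane 0 holds the warp's sum.
    val += phi::backends::gpu::CudaShuffleDownSync(lane_mask, val, offset);
  }
  return val;
}

// Per the precision note above, a reduce-sum over fp16 data would still be
// driven with a float accumulator and cast back only at the boundary, e.g.:
//   float partial = WarpReduceSum(static_cast<float>(x), 0xFFFFFFFFu);
```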

Contributor Author (jinyouzhi) replied:

  • Regarding precision: the inf/-inf path is a max/min reduction, so there is no precision concern and it keeps the fp16 implementation; for -inf < p < inf, the computation is done in float32 (the code may be a bit naive, comments welcome 😝). A toy sketch of this split follows below.
  • math_cuda_utils: changed to call CudaShuffleDownSync for fp16 compatibility.
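To make the precision split above concrete, here is a self-contained toy CUDA kernel (single-thread, launched as <<<1,1>>>, with made-up names; not the PR's dist kernel): the p = inf path is a plain max reduction over fp16 inputs, while finite p > 0 accumulates |x - y|^p in float and casts back to half only once at the end.

```cpp
#include <cuda_fp16.h>
#include <math.h>

// Toy illustration of the strategy: max/min is order-preserving, so fp16
// needs no widening for p = inf; finite p uses an fp32 accumulator to limit
// rounding and overflow error before the final cast back to half.
__global__ void dist_sketch(const __half* x, const __half* y, int n, float p,
                            __half* out) {
  if (isinf(p)) {
    float best = 0.f;  // could equally be tracked as __half
    for (int i = 0; i < n; ++i) {
      best = fmaxf(best, fabsf(__half2float(x[i]) - __half2float(y[i])));
    }
    *out = __float2half(best);
  } else {
    float acc = 0.f;  // fp32 accumulation of the p-th powers
    for (int i = 0; i < n; ++i) {
      acc += powf(fabsf(__half2float(x[i]) - __half2float(y[i])), p);
    }
    *out = __float2half(powf(acc, 1.f / p));
  }
}
```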

Contributor Author (jinyouzhi) replied:

@zhangting2020 The changes are done. Is the current approach acceptable?

Review thread on paddle/phi/kernels/funcs/math_cuda_utils.h (outdated, resolved).
@jinyouzhi force-pushed the fp16/dist branch 4 times, most recently from f888d7c to 4922ad6 (March 27, 2023 17:51)
@jinyouzhi (Author) commented

[screenshot of the CI op-benchmark history omitted]

Judging from the history, there is still a performance regression.

The regression is not in dist; it is mainly interp_trilinear and histogram that regressed. Neither of those depends on dist, so it may be a side effect of the changes to paddle/phi/kernels/funcs/math_cuda_utils.h.
I will rebase once and re-test, and also prepare a version that reverts the math_cuda_utils changes.

Labels: contributor (External developers)
Projects: none yet
6 participants