Optimize the perf of top_k when k is too large #40941

ZzSean · 2022-03-25T06:41:34Z

PR types

Performance optimization

PR changes

OPs

Describe

Optimize the perf of top_k when k is too large

Development environment:

Device: V100-16G
Environment: CUDA 10.1, cuDNN 7

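Optimization approach:

The method is based on radix sort: key bits are compared from the most significant to the least significant, two bits at a time. Each slice is processed by one thread block, which counts elements using warp-level intrinsics and shared memory.
The core idea is to find the k-th largest (smallest) value and then select every element greater (smaller) than it.
If that value is not unique, the elements equal to it are selected in ascending index order until k elements have been chosen.
Once the top k values have been selected, cub is used to fully sort these k values.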
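The steps above can be illustrated with a single-threaded host sketch in plain C++. This is not the PR's CUDA kernel. It assumes fp32 input and largest-first selection. It uses the common float-to-ordered-unsigned-key trick so that the key's radix digits follow float order; the PR text does not say how floats are keyed, so this is an assumption. It replaces the per-block warp-intrinsic and shared-memory counting with a plain loop, and it replaces the final cub sort with `std::stable_sort`. All function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

constexpr int kRadixBits = 2;                    // two bits per digit, as described
constexpr int kRadixSize = 1 << kRadixBits;      // 4 buckets per pass
constexpr uint32_t kRadixMask = kRadixSize - 1;  // 0b11

// Hypothetical helper: map a float to an unsigned key whose unsigned order
// matches float order (NaNs are not handled; -0.0f sorts just below +0.0f).
uint32_t FloatToOrderedKey(float v) {
  uint32_t bits;
  std::memcpy(&bits, &v, sizeof(bits));
  return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

// Hypothetical helper: returns the key of the k-th largest element (1-based k).
// *ties_to_take receives how many elements equal to that key must be taken so
// that exactly k elements are selected in total.
uint32_t RadixSelectKthLargest(const std::vector<uint32_t>& keys, int k,
                               int* ties_to_take) {
  assert(k >= 1 && static_cast<std::size_t>(k) <= keys.size());
  uint32_t desired = 0;  // digits of the answer fixed so far
  uint32_t mask = 0;     // which bits of `desired` are fixed
  int k_left = k;
  // Walk the digits from the most significant to the least significant.
  for (int digit_pos = static_cast<int>(sizeof(uint32_t) * 8) - kRadixBits;
       digit_pos >= 0; digit_pos -= kRadixBits) {
    int counts[kRadixSize] = {0};
    for (uint32_t key : keys) {
      if ((key & mask) == desired) ++counts[(key >> digit_pos) & kRadixMask];
    }
    // Largest-first: scan buckets from the highest digit value down.
    for (int d = kRadixSize - 1; d >= 0; --d) {
      if (counts[d] >= k_left) {
        desired |= static_cast<uint32_t>(d) << digit_pos;
        mask |= kRadixMask << digit_pos;
        break;
      }
      k_left -= counts[d];  // all of these are strictly larger than the answer
    }
  }
  *ties_to_take = k_left;
  return desired;
}

// Hypothetical driver: (value, index) pairs of the k largest elements, sorted in
// descending order; ties on the k-th value are taken in ascending index order.
std::vector<std::pair<float, int>> TopKLargest(const std::vector<float>& x, int k) {
  std::vector<uint32_t> keys(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) keys[i] = FloatToOrderedKey(x[i]);

  int ties_to_take = 0;
  const uint32_t kth = RadixSelectKthLargest(keys, k, &ties_to_take);

  std::vector<std::pair<float, int>> out;
  out.reserve(k);
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (keys[i] > kth) {
      out.emplace_back(x[i], static_cast<int>(i));
    } else if (keys[i] == kth && ties_to_take > 0) {
      out.emplace_back(x[i], static_cast<int>(i));
      --ties_to_take;
    }
  }
  // Full sort of the k survivors (cub in the PR); stable, so tied values keep
  // their ascending index order.
  std::stable_sort(out.begin(), out.end(),
                   [](const std::pair<float, int>& a, const std::pair<float, int>& b) {
                     return a.first > b.first;
                   });
  return out;
}
```

For example, with `x = {1, 5, 3, 5, -2, 3, 3}` and `k = 4`, the k-th largest value is 3. Two elements exceed it, so two of the three 3s are taken (indices 2 and 5). On the GPU, each counting pass over a slice would be done cooperatively by one block. The serial loop here only shows the digit-by-digit narrowing and the index-ordered tie break.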

config	PyTorch (ms)	Paddle before (ms)	vs. PyTorch	Paddle after (ms)	vs. PyTorch	Speedup
shape[136480],k=5000,fp32	1.12439	20.76580	slower (17.47x)	0.86577	faster (19.70%)	23.98x
shape[104903],k=5000,fp32	0.91584	17.57050	slower (18.19x)	0.79751	on par (2.29%)	22.03x
shape[133725],k=5000,fp32	1.10155	20.12270	slower (17.27x)	0.86220	faster (14.36%)	23.34x
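Two of the derived columns can be recomputed from the timings. This small sketch assumes two formulas: the "slower (Nx)" entries are (Paddle before − PyTorch) / PyTorch, and the speedup is before / after. With these formulas every row matches the table to within 0.01; for example, the first speedup evaluates to about 23.985 and is listed as 23.98x. The percentage entries in the after-vs-PyTorch column are not recomputed because their formula is not stated. `Speedup` and `SlowdownFactor` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical helper for the "Speedup" column: time before the optimization
// divided by time after it.
double Speedup(double before_ms, double after_ms) {
  assert(after_ms > 0.0);
  return before_ms / after_ms;
}

// Hypothetical helper for the "slower (Nx)" entries: extra time relative to the
// reference, i.e. (t - ref) / ref.
double SlowdownFactor(double t_ms, double ref_ms) {
  assert(ref_ms > 0.0);
  return (t_ms - ref_ms) / ref_ms;
}
```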

paddle-bot-old · 2022-03-25T06:42:43Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

Zjq9409 · 2022-03-28T02:52:55Z

paddle/phi/kernels/gpu/top_k_kernel.cu

+          Copy(dev_ctx, sorted_output, out->place(), false, out);
+          return;
+        } else {
+          LOG(INFO) << "TopKOP: Some errors happened when use cub sorting, use "


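LOG(INFO) is used here, but the message that follows mentions "errors". Would LOG(ERROR) be more appropriate? Alternatively, Paddle's error helpers, phi::errors::XXX, could be used.

Reaching this branch does not mean execution has to stop. The computation can continue in a later branch, so reporting this as an error is not appropriate.

Use VLOG.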
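The reply above argues that a failed cub sort is recoverable. The code returns early when the sort succeeds and otherwise continues with another branch, so the message belongs at a verbose log level and should not be raised as an error. The following is a minimal control-flow sketch of that pattern. Every name in it is hypothetical: `TryFastSort`, `FallbackSort`, `SortWithFallback`, and the `SKETCH_VLOG` macro, which stands in for glog's VLOG. `std::sort` and `std::stable_sort` are placeholders for the real GPU paths.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for glog's VLOG(level): prints only when the requested
// level is at or below the current verbosity.
int g_verbosity = 0;
#define SKETCH_VLOG(level, msg)            \
  do {                                     \
    if ((level) <= g_verbosity) {          \
      std::fprintf(stderr, "%s\n", (msg)); \
    }                                      \
  } while (0)

// Hypothetical fast path (standing in for the cub sort). It reports failure
// through its return value instead of raising an error.
bool TryFastSort(std::vector<float>* v, bool fast_path_available) {
  if (!fast_path_available) return false;
  std::sort(v->begin(), v->end());
  return true;
}

// Hypothetical fallback (standing in for the subsequent branch).
void FallbackSort(std::vector<float>* v) { std::stable_sort(v->begin(), v->end()); }

void SortWithFallback(std::vector<float>* v, bool fast_path_available) {
  assert(v != nullptr);
  if (TryFastSort(v, fast_path_available)) return;  // success: done early
  // Not an error: record it at a verbose level and continue with the fallback.
  SKETCH_VLOG(3, "fast path unavailable, falling back to the next branch");
  FallbackSort(v);
}
```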

Zjq9409 · 2022-03-28T02:53:08Z

paddle/fluid/operators/top_k_function_cuda.h

+  int k_left = k;
+
+#pragma unroll
+  for (int digit_pos = sizeof(T) * 8 - RADIX_BITS; digit_pos >= 0;


Why is this multiplied by 8?
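In that loop header, `sizeof(T)` is the size of the key type in bytes. `sizeof(T) * 8` is therefore its width in bits, where 8 is the number of bits per byte (`CHAR_BIT` on mainstream platforms). Starting at `digit_pos = width - RADIX_BITS` and stepping down visits every `RADIX_BITS`-wide digit from most to least significant. The decrement is cut off in the quoted diff, so stepping by `RADIX_BITS` is an assumption here. The host-side sketch below just enumerates those positions. `DigitPositions` is a hypothetical helper, and it requires the bit width to be a multiple of `RADIX_BITS`, which holds for 2-bit digits on 16-, 32- and 64-bit types.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <vector>

// Hypothetical helper: the starting bit of every RADIX_BITS-wide digit of T,
// from most to least significant, mirroring the loop header under review.
template <typename T, int RADIX_BITS>
std::vector<int> DigitPositions() {
  // sizeof(T) counts bytes; multiplying by bits per byte gives T's bit width.
  constexpr int kBitWidth = static_cast<int>(sizeof(T) * CHAR_BIT);
  static_assert(kBitWidth % RADIX_BITS == 0,
                "bit width must be a multiple of RADIX_BITS");
  std::vector<int> positions;
  for (int digit_pos = kBitWidth - RADIX_BITS; digit_pos >= 0;
       digit_pos -= RADIX_BITS) {
    positions.push_back(digit_pos);
  }
  return positions;
}
```

For a 32-bit key with 2-bit digits this yields 16 passes starting at bit 30 (30, 28, ..., 0), and a 64-bit `double` needs 32 passes starting at bit 62.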

Xreki

LGTM

Optimize the perf of top_k when k is too large

2ed6190

ZzSean added 3 commits March 25, 2022 09:25

fix rcom compile

9800da4

fix

ac68c8e

only compile in cuda

409e40f

Zjq9409 reviewed Mar 28, 2022

limin2021 previously approved these changes Mar 29, 2022

fix log info

4a3b28f

ZzSean dismissed limin2021’s stale review via 4a3b28f March 29, 2022 15:24

Merge branch 'develop' into opt_topk_3d

83998ca

Xreki approved these changes Mar 30, 2022

ZzSean merged commit 45078d9 into PaddlePaddle:develop Mar 30, 2022

luotao1 mentioned this pull request Apr 26, 2022

[PFCC-Roadmap] Operator performance optimization #42286 (Open)

ZzSean commented Mar 25, 2022 • edited by JamesLim-sy
