
feat: add fused vision transformer #3034

Merged 11 commits on Jan 15, 2024
Conversation

@DanGuge DanGuge commented Nov 4, 2023

PR types

  • Vision transformer inference performance optimization

PR Changes

  • Add FusedVisionTransformer & fused operator

Description

  • Experiment setup
    • GPU: Tesla V100-SXM2 16GB, single card
    • warmup_times=10, test_times=100
  • The tables below show the speedup for ViT_large_patch16_224 and ViT_large_patch32_384 with batch=6, for both fp32 and fp16
    • The appendix file records test data from batch=1 up to the point of OOM
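The warmup/measurement protocol above (warmup_times=10, test_times=100) can be sketched as a small timing helper. This is an illustrative sketch, not code from this PR; `run_model`, its input, and the toy lambda in the usage line are placeholders:

```python
import time

def benchmark(run_model, inputs, warmup_times=10, test_times=100):
    """Average latency in ms over test_times runs, after warmup_times warmup runs."""
    for _ in range(warmup_times):        # warm up caches / kernel autotuning
        run_model(inputs)
    start = time.perf_counter()
    for _ in range(test_times):
        run_model(inputs)
    elapsed = time.perf_counter() - start
    return elapsed / test_times * 1000.0  # ms per run

# Toy usage with a dummy "model" standing in for the real ViT forward pass:
avg_ms = benchmark(lambda x: sum(x), list(range(1000)), warmup_times=2, test_times=10)
```

On a real model, `run_model` would also need to synchronize the GPU before each timestamp so that the measured interval covers the actual kernel execution.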

ViT_large_patch16_224

  • Config: fp32, (N, C, H, W) = (6, 3, 224, 224), accuracy verified at rtol=5e-03, atol=1e-03

| Run | naive ViT (ms) | fused ViT (ms) | Speedup | naive/fused |
| --- | --- | --- | --- | --- |
| 1 | 69.94958639 | 67.44493008 | 3.58% | 103.71% |
| 2 | 69.89186049 | 67.66503334 | 3.19% | 103.29% |
| 3 | 69.85519648 | 67.43098497 | 3.47% | 103.60% |
| 4 | 69.86840248 | 67.25456476 | 3.74% | 103.89% |
| 5 | 69.75688934 | 67.60543585 | 3.08% | 103.18% |
| Avg | 69.86438704 | 67.4801898 | 3.41% | 103.53% |
  • Config: fp16, (N, C, H, W) = (6, 3, 224, 224)

| Run | naive ViT (ms) | fused ViT (ms) | Speedup | naive/fused | Mismatched elements (rtol=5e-03, atol=1e-03) | Max abs diff | Max rel diff |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 21.95616722 | 18.78335238 | 14.45% | 116.89% | 215 / 6000 (3.58%) | 0.007812 | 3.852 |
| 2 | 20.90988636 | 18.79692793 | 10.11% | 111.24% | 215 / 6000 (3.58%) | 0.007812 | 2.93 |
| 3 | 21.87682867 | 18.42218399 | 15.79% | 118.75% | 254 / 6000 (4.23%) | 0.009766 | 3.555 |
| 4 | 21.66805029 | 18.23710203 | 15.83% | 118.81% | 230 / 6000 (3.83%) | 0.007812 | 2.46 |
| 5 | 21.72460556 | 18.21647167 | 16.15% | 119.26% | 222 / 6000 (3.7%) | 0.007812 | 182.1 |
| Avg | 21.62710762 | 18.4912076 | 14.50% | 116.96% | | | |
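The per-run accuracy counts (e.g. 215 / 6000 mismatched elements) follow numpy's allclose convention. A minimal sketch of how such counts can be produced, with synthetic arrays standing in for the real naive/fused model outputs:

```python
import numpy as np

def mismatch_report(naive_out, fused_out, rtol=5e-3, atol=1e-3):
    """Count elements outside tolerance and report max abs/rel differences."""
    close = np.isclose(fused_out, naive_out, rtol=rtol, atol=atol)
    n_bad = int((~close).sum())
    diff = np.abs(fused_out - naive_out)
    rel = diff / np.abs(naive_out)  # relative diff measured against the reference
    return n_bad, naive_out.size, float(diff.max()), float(rel.max())

# Synthetic stand-ins: a reference output and a slightly perturbed copy
rng = np.random.default_rng(0)
ref = rng.standard_normal(6000).astype(np.float32)
test = ref + rng.normal(scale=2e-3, size=6000).astype(np.float32)
n_bad, total, max_abs, max_rel = mismatch_report(ref, test)
```

Note that the max relative difference can be large even when absolute errors are tiny, whenever the reference value is near zero; that likely explains outliers like the 182.1 in the table.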

ViT_large_patch32_384

  • Config: fp32, (N, C, H, W) = (6, 3, 384, 384), accuracy verified at rtol=5e-03, atol=1e-03

| Run | naive ViT (ms) | fused ViT (ms) | Speedup | naive/fused |
| --- | --- | --- | --- | --- |
| 1 | 54.41297054 | 52.34796524 | 3.80% | 103.94% |
| 2 | 54.29521799 | 52.56785154 | 3.18% | 103.29% |
| 3 | 54.25016403 | 52.28267431 | 3.63% | 103.76% |
| 4 | 54.19023037 | 52.3793745 | 3.34% | 103.46% |
| 5 | 54.54382658 | 52.29720354 | 4.12% | 104.30% |
| Avg | 54.3384819 | 52.37501383 | 3.61% | 103.75% |
  • Config: fp16, (N, C, H, W) = (6, 3, 384, 384)

| Run | naive ViT (ms) | fused ViT (ms) | Speedup | naive/fused | Mismatched elements (rtol=5e-03, atol=1e-03) | Max abs diff | Max rel diff |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 19.5188117 | 16.8251586 | 13.80% | 116.01% | 132 / 6000 (2.2%) | 0.007812 | 12.73 |
| 2 | 20.19104242 | 16.54681206 | 18.05% | 122.02% | 161 / 6000 (2.68%) | 0.007812 | 1.468 |
| 3 | 20.749681 | 17.88716555 | 13.80% | 116.00% | 127 / 6000 (2.12%) | 0.007812 | 2.258 |
| 4 | 20.00149012 | 17.15390921 | 14.24% | 116.60% | 127 / 6000 (2.12%) | 0.007812 | 1.253 |
| 5 | 21.1058569 | 16.32851124 | 22.64% | 129.26% | 140 / 6000 (2.33%) | 0.007812 | 3.428 |
| Avg | 20.31337643 | 16.94831133 | 16.57% | 119.85% | | | |
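The two speedup columns in the tables above are two views of the same pair of timings: the time saved relative to the naive model, and the throughput ratio naive/fused. A quick sketch, checked against the first fp32 row of ViT_large_patch16_224:

```python
def speedup_columns(naive_ms, fused_ms):
    """Return (time saved %, throughput ratio %) for a pair of latencies."""
    saved = (naive_ms - fused_ms) / naive_ms * 100.0  # relative time saved
    ratio = naive_ms / fused_ms * 100.0               # throughput ratio
    return round(saved, 2), round(ratio, 2)

# First row of the fp32 ViT_large_patch16_224 table:
saved, ratio = speedup_columns(69.94958639, 67.44493008)
# saved == 3.58, ratio == 103.71 — matching the reported columns
```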

Summary

For the full data, see the test records.

  • With fp32:
    • FusedVisionTransformer matches VisionTransformer's accuracy exactly, at roughly the same latency
    • FusedVisionTransformer can run inference up to batch=512, while VisionTransformer hits OOM at batch=32
  • With fp16:
    • FusedVisionTransformer is 1.1-1.3x faster than VisionTransformer (the speedup varies with batch size)
    • Comparing FusedVisionTransformer against VisionTransformer at rtol=5e-03, atol=1e-03:
      • ViT_large_patch16_224: 3%-5% of output elements mismatch
      • ViT_large_patch32_384: 1%-3% of output elements mismatch
    • FusedVisionTransformer can run inference up to batch=128, while VisionTransformer hits OOM at batch=64

Test records

paddle-bot bot commented Nov 4, 2023

Thanks for your contribution!

@DanGuge (Author) commented Nov 8, 2023

@cuicheng01 Hello, could you help review this? Also, this code cannot be merged into the main branch because of a Paddle version mismatch; could it be merged into a new feature branch instead, e.g. a new fused_vit branch?

@DanGuge (Author) commented Nov 23, 2023

@xiaoxiaohehe001 Please remember to review this.

@DanGuge DanGuge changed the base branch from release/2.5 to fused_vit November 24, 2023 10:54
@DanGuge DanGuge changed the base branch from fused_vit to release/2.5 November 24, 2023 10:56
@DanGuge DanGuge changed the base branch from release/2.5 to fused_vit November 25, 2023 03:53
@xiaoxiaohehe001 left a comment

LGTM


@@ -0,0 +1,216 @@
# Fused Vision Transformer High-Performance Inference Usage

Collaborator

The tests did not pass; please add the Paddle version number.

Author

My tests are based on paddle develop commit 5a3c593f38ed79662a91e0650e1b453f8b5a17d6.

@cuicheng01 cuicheng01 merged commit 1c7c773 into PaddlePaddle:fused_vit Jan 15, 2024
1 check passed