feat: add fused vision transformer #3034
Conversation
Thanks for your contribution!
@cuicheng01 hello, could you help review this? Also, my code can't be merged into the main branch because the Paddle versions don't match. Could it go into a new feature branch instead, e.g. a new fused_vit branch?
@xiaoxiaohehe001 Feng, please remember to review this.
LGTM
LGTM
@@ -0,0 +1,216 @@
# High-Performance Inference with the Fused Vision Transformer
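The rest of the new doc isn't shown in this hunk, but for readers who want to try the feature, here is a minimal usage sketch. The import path is assumed from PaddleClas' existing model zoo layout, and only the baseline `ViT_large_patch16_224` factory is known to exist; the fused variant added by this PR would be swapped in analogously once merged.

```python
# Minimal inference sketch (assumptions noted inline).
import paddle
from ppcls.arch.backbone import ViT_large_patch16_224  # baseline factory in PaddleClas

paddle.set_device("gpu")

model = ViT_large_patch16_224()  # swap in the fused variant from this PR once merged
model.eval()

# batch=6 with fp32, matching the benchmark configuration in the description
x = paddle.randn([6, 3, 224, 224], dtype="float32")
with paddle.no_grad():
    logits = model(x)
print(logits.shape)  # [6, 1000] class logits
```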
The tests didn't pass; please add the Paddle version number you used.
My tests are based on Paddle develop at commit 5a3c593f38ed79662a91e0650e1b453f8b5a17d6.
PR types
PR Changes
Description
Speedup of ViT_large_patch16_224 and ViT_large_patch32_384 at batch=6, with fp32 and fp16 data types.

Test data with batch starting from 1 and increasing until OOM, for ViT_large_patch16_224 and ViT_large_patch32_384 (a sketch of such a timing sweep follows below).
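The exact benchmark harness isn't included in the hunk above, so here is a hedged sketch of how such a batch sweep could be timed. The helper name and the sweep values are mine, not from the PR, and catching OOM as `MemoryError`/`RuntimeError` is an assumption about how Paddle surfaces allocator failures.

```python
import time
import paddle
from ppcls.arch.backbone import ViT_large_patch16_224  # path assumed, as above

paddle.set_device("gpu")
model = ViT_large_patch16_224()
model.eval()

def measure_latency(model, batch, warmup=10, iters=50):
    """Average seconds per forward pass (illustrative helper, not from the PR)."""
    x = paddle.randn([batch, 3, 224, 224], dtype="float32")
    with paddle.no_grad():
        for _ in range(warmup):                 # warm up kernels before timing
            model(x)
        paddle.device.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        paddle.device.cuda.synchronize()        # flush async GPU work
    return (time.perf_counter() - start) / iters

# Sweep batch from 1 upward until the allocator gives out, as in the test data.
for batch in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]:
    try:
        print(f"batch={batch}: {measure_latency(model, batch) * 1e3:.2f} ms")
    except (MemoryError, RuntimeError):         # assumed mapping of Paddle's OOM error
        print(f"OOM at batch={batch}")
        break
```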
Summary
- FusedVisionTransformer achieves the same accuracy as VisionTransformer at comparable time cost.
- FusedVisionTransformer can run inference up to batch=512, while VisionTransformer hits OOM at batch=32.
- FusedVisionTransformer is 1.1x-1.3x faster than VisionTransformer (the speedup ratio varies with batch size).
- Comparing FusedVisionTransformer against VisionTransformer under rtol=5e-03, atol=1e-03, ViT_large_patch16_224 has 3%-5% mismatched elements and ViT_large_patch32_384 has 1%-3%.
- FusedVisionTransformer can run inference up to batch=128, while VisionTransformer hits OOM at batch=64.

Test records
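For reference, a sketch of how the "mismatched elements" percentages above could be computed from the two models' outputs. The helper is illustrative (not from the PR), and the synthetic arrays below only stand in for real logits.

```python
import numpy as np

def mismatch_ratio(fused_out, ref_out, rtol=5e-3, atol=1e-3):
    """Fraction of elements outside the given tolerances (illustrative helper)."""
    close = np.isclose(fused_out, ref_out, rtol=rtol, atol=atol)
    return 1.0 - close.mean()

# With real model outputs this would be ~3%-5% for ViT_large_patch16_224
# and ~1%-3% for ViT_large_patch32_384, per the summary above.
fused = np.random.randn(6, 1000).astype("float32")             # stand-in for fused logits
ref = fused + 1e-3 * np.random.randn(6, 1000).astype("float32")  # small perturbation
print(f"{mismatch_ratio(fused, ref):.1%} of elements differ beyond tolerance")
```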