# ViT and DeiT Series


## Contents

- 1. Overview
- 2. Accuracy, FLOPS and Parameters
- 3. Inference Speed on V100 GPU

## 1. Overview

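The ViT (Vision Transformer) series was proposed by Google in 2020. The models use only the standard Transformer architecture and drop convolutions entirely: an image is split into patches, and the patches are fed to the Transformer. This demonstrated the potential of Transformers in computer vision. Paper link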
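The patch-splitting step described above can be sketched in plain Python. This is a minimal illustration, not the implementation used by these models: `split_into_patches` and `sequence_length` are hypothetical helper names, the image is assumed to be a nested `H x W x C` list, and one extra class token is assumed per sequence (the distilled DeiT variants are assumed to carry a second, distillation token).

```python
def split_into_patches(image, patch_size):
    """Split an H x W x C nested-list image into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch size")
    patches = []
    # Walk the image in row-major order, one patch_size x patch_size tile at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in image[top:top + patch_size]:
                for pixel in row[left:left + patch_size]:
                    patch.extend(pixel)  # flatten the channel values
            patches.append(patch)
    return patches


def sequence_length(image_size, patch_size, extra_tokens=1):
    """Number of tokens fed to the Transformer for a square image."""
    per_side = image_size // patch_size
    return per_side * per_side + extra_tokens


# A patch16_224 model sees 14 * 14 = 196 patches plus the class token.
print(sequence_length(224, 16))  # 197
print(sequence_length(384, 16))  # 577
print(sequence_length(384, 32))  # 145
```

The token count helps explain the tables that follow: for the same backbone, `patch16_384` processes roughly three times as many tokens as `patch16_224`, while `patch32_384` processes fewer than either.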

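The DeiT (Data-efficient Image Transformers) series was proposed by Facebook at the end of 2020. It addresses ViT's need for large-scale training datasets and reaches 83.1% Top-1 accuracy on ImageNet. With a convolutional model as the teacher for knowledge distillation, it reaches 85.2% Top-1 accuracy on ImageNet. Paper link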
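The distillation idea described above can be sketched as a loss function. This is a minimal sketch of the hard-label variant described in the DeiT paper, under assumptions: the student is assumed to produce two sets of logits (one from the class token, one from a distillation token), the two terms are weighted equally, and the function names are hypothetical and not taken from this repository.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample, computed with a numerically stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[target]


def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, label):
    """Average of the ground-truth loss and the loss against the teacher's prediction."""
    # The teacher's hard decision is its highest-scoring class.
    teacher_label = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    loss_true = cross_entropy(cls_logits, label)             # class token vs. true label
    loss_teacher = cross_entropy(dist_logits, teacher_label)  # distillation token vs. teacher
    return 0.5 * loss_true + 0.5 * loss_teacher


# Example with three classes: the teacher predicts class 2, the true label is 1.
loss = hard_distillation_loss(
    cls_logits=[0.1, 2.0, 0.3],
    dist_logits=[0.2, 0.5, 1.5],
    teacher_logits=[0.0, 1.0, 3.0],
    label=1,
)
print(round(loss, 4))
```

Using the teacher's predicted class rather than its full probability distribution keeps the sketch short; a soft-label variant would instead compare temperature-scaled distributions.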

## 2. Accuracy, FLOPS and Parameters

| Models | Top1 | Top5 | Reference top1 | Reference top5 | FLOPS (G) | Params (M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| ViT_small_patch16_224 | 0.7553 | 0.9211 | 0.7785 | 0.9342 | 9.41 | 48.60 |
| ViT_base_patch16_224 | 0.8187 | 0.9618 | 0.8178 | 0.9613 | 16.85 | 86.42 |
| ViT_base_patch16_384 | 0.8414 | 0.9717 | 0.8420 | 0.9722 | 49.35 | 86.42 |
| ViT_base_patch32_384 | 0.8176 | 0.9613 | 0.8166 | 0.9613 | 12.66 | 88.19 |
| ViT_large_patch16_224 | 0.8303 | 0.9655 | 0.8306 | 0.9644 | 59.65 | 304.12 |
| ViT_large_patch16_384 | 0.8513 | 0.9736 | 0.8517 | 0.9736 | 174.70 | 304.12 |
| ViT_large_patch32_384 | 0.8153 | 0.9608 | 0.815 | - | 44.24 | 306.48 |

| Models | Top1 | Top5 | Reference top1 | Reference top5 | FLOPS (G) | Params (M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| DeiT_tiny_patch16_224 | 0.7208 | 0.9112 | 0.722 | 0.911 | 1.07 | 5.68 |
| DeiT_small_patch16_224 | 0.7982 | 0.9495 | 0.799 | 0.950 | 4.24 | 21.97 |
| DeiT_base_patch16_224 | 0.8180 | 0.9558 | 0.818 | 0.956 | 16.85 | 86.42 |
| DeiT_base_patch16_384 | 0.8289 | 0.9624 | 0.829 | 0.972 | 49.35 | 86.42 |
| DeiT_tiny_distilled_patch16_224 | 0.7449 | 0.9192 | 0.745 | 0.919 | 1.08 | 5.87 |
| DeiT_small_distilled_patch16_224 | 0.8117 | 0.9538 | 0.812 | 0.954 | 4.26 | 22.36 |
| DeiT_base_distilled_patch16_224 | 0.8330 | 0.9647 | 0.834 | 0.965 | 16.93 | 87.18 |
| DeiT_base_distilled_patch16_384 | 0.8520 | 0.9720 | 0.852 | 0.972 | 49.43 | 87.18 |

## 3. Inference Speed on V100 GPU

| Models | Crop Size | Resize Short Size | FP32 Batch Size=1 (ms) | FP32 Batch Size=4 (ms) | FP32 Batch Size=8 (ms) |
|:--:|:--:|:--:|:--:|:--:|:--:|
| ViT_small_patch16_224 | 256 | 224 | 3.71 | 9.05 | 16.72 |
| ViT_base_patch16_224 | 256 | 224 | 6.12 | 14.84 | 28.51 |
| ViT_base_patch16_384 | 384 | 384 | 14.15 | 48.38 | 95.06 |
| ViT_base_patch32_384 | 384 | 384 | 4.94 | 13.43 | 24.08 |
| ViT_large_patch16_224 | 256 | 224 | 15.53 | 49.50 | 94.09 |
| ViT_large_patch16_384 | 384 | 384 | 39.51 | 152.46 | 304.06 |
| ViT_large_patch32_384 | 384 | 384 | 11.44 | 36.09 | 70.63 |

| Models | Crop Size | Resize Short Size | FP32 Batch Size=1 (ms) | FP32 Batch Size=4 (ms) | FP32 Batch Size=8 (ms) |
|:--:|:--:|:--:|:--:|:--:|:--:|
| DeiT_tiny_patch16_224 | 256 | 224 | 3.61 | 3.94 | 6.10 |
| DeiT_small_patch16_224 | 256 | 224 | 3.61 | 6.24 | 10.49 |
| DeiT_base_patch16_224 | 256 | 224 | 6.13 | 14.87 | 28.50 |
| DeiT_base_patch16_384 | 384 | 384 | 14.12 | 48.80 | 97.60 |
| DeiT_tiny_distilled_patch16_224 | 256 | 224 | 3.51 | 4.05 | 6.03 |
| DeiT_small_distilled_patch16_224 | 256 | 224 | 3.70 | 6.20 | 10.53 |
| DeiT_base_distilled_patch16_224 | 256 | 224 | 6.17 | 14.94 | 28.58 |
| DeiT_base_distilled_patch16_384 | 384 | 384 | 14.12 | 48.76 | 97.09 |
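
The latency figures in this section can be converted to throughput for easier comparison across batch sizes. The sketch below assumes each figure is the time in milliseconds to process one whole batch; the document does not state this explicitly, and `images_per_second` is a hypothetical helper name.

```python
def images_per_second(latency_ms, batch_size):
    """Convert a per-batch latency in milliseconds to images per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return batch_size * 1000.0 / latency_ms


# ViT_base_patch16_224 timings from the table above, keyed by batch size.
vit_base_224 = {1: 6.12, 4: 14.84, 8: 28.51}
for batch_size, latency_ms in vit_base_224.items():
    rate = images_per_second(latency_ms, batch_size)
    print(f"batch={batch_size}: {rate:.1f} images/s")
```

Under that assumption, larger batches raise throughput even though the per-batch latency grows, because the latency grows more slowly than the batch size.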