quick and dirty inference time benchmark for TFLite gles delegate
The TensorFlow team announced TFLite GPU delegate and published related docs [2][3] in Jan 2019. But except Mobilenet V1 classifier, there is no publicly available app to evaluate it, so I wrote a quick and dirty app to evaluate other models.
For the 4 public models mentioned in [1], I got the following numbers on Pixel 2.
model name | CPU 1 thread (ms) | CPU 4 threads (ms) | GPU (ms) |
---|---|---|---|
Mobilenet | 150 | 75 | 21 |
PoseNet | 183 | 96 | 40 |
DeepLab V3 | 219 | 131 | 91 |
Mobilenet SSD V2 COCO | 264 | 158 | 49 |
On Xiaomi Mi 9, I got
model name | CPU 1 thread (ms) | CPU 4 threads (ms) | GPU (ms) |
---|---|---|---|
Mobilenet | 39 | 35 | 15 |
PoseNet | 48 | 47 | 19 |
DeepLab V3 | 61 | 64 | 65 |
Mobilenet SSD V2 COCO | 69 | 75 | 36 |
On Pixel 3a, I got
model name | CPU 1 thread (ms) | CPU 4 threads (ms) | GPU (ms) |
---|---|---|---|
Mobilenet | 113 | 80 | 52 |
PoseNet | 138 | 96 | 78 |
DeepLab V3 | 173 | 132 | 144 |
Mobilenet SSD V2 COCO | 200 | 167 | 113 |
Check https://github.com/freedomtan/glDelegateBenchmark/ for iOS code
model name | CPU 1 thread (ms) | CPU 4 threads (ms) | GPU (ms) |
---|---|---|---|
Mobilenet | 117 | 37 | 20 |
PoseNet | 140 | 47 | 39 |
DeepLab V3 | 177 | 72 | 122 |
Mobilenet SSD V2 COCO | 202 | 75 | 60 |
model name | CPU 1 thread (ms) | CPU 4 threads (ms) | GPU (ms) |
---|---|---|---|
Mobilenet | 107 | 44 | 51 |
PoseNet | 131 | 57 | 77 |
DeepLab V3 | 164 | 82 | 145 |
Mobilenet SSD V2 COCO | 184 | 86 | 113 |
Update Dec 8, 2019, Dec for Pixel 3a came with DSP and GPU NNAPI 1.2 driver, so we can have NNAPI numbers on Pixel 3a
model name | CPU 1 thread (ms) | CPU 4 threads (ms) | GPU OpenCL (ms) | GPU GL Compute Shader (ms) |
---|---|---|---|---|
Mobilenet | 118 | 34 | 10 | 21 |
PoseNet | 142 | 43 | 14 | 41 |
DeepLab V3 | 174 | 75 | 21 | 69 |
Mobilenet SSD V2 COCO | 202 | 73 | 18 | 48 |
model name | CPU 1 thread (ms) | CPU 4 threads (ms) | GPU (ms) | NNPAI (ms) |
---|---|---|---|---|
Mobilenet | 107 | 44 | 28 | 25 |
PoseNet | 131 | 57 | 38 | 32 |
DeepLab V3 | 164 | 82 | 60 | 186 |
Mobilenet SSD V2 COCO | 184 | 86 | 54 | 249 |
model name | CPU 1 thread (ms) | CPU 4 threads (ms) | GPU Delegate (ms) | NNAPI (ms) |
---|---|---|---|---|
Mobilenet | 42 | 13 | 8 | 7 |
PoseNet | 52 | 15 | 11 | 11 |
DeepLab V3 | 66 | 25 | 20 | 98 |
Mobilenet SSD V2 COCO | 70 | 24 | 16 | 86 |
[2] https://www.tensorflow.org/lite/performance/gpu
[3] https://www.tensorflow.org/lite/performance/gpu_advanced