
# Benchmarking Edge Computing

Comparing Google, Intel, and NVIDIA accelerator hardware

We compared inferencing performance across Google’s Coral EdgeTPU-based hardware, Intel’s Movidius-based hardware, NVIDIA’s Jetson Nano, and an unaccelerated Raspberry Pi.

The initial benchmark run used the MobileNet v2 SSD and MobileNet v1 SSD models, both trained on the Common Objects in Context (COCO) dataset.

A single 3888×2916 pixel test image was used, containing two recognisable objects in the frame: a banana 🍌 and an apple 🍎.

The image was resized down to 300×300 pixels before presenting it to the model, and each model was run 10,000 times before an average inferencing time was taken.
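As a rough sketch, that preprocessing step looks something like the following in Python (a minimal illustration, assuming the standard 300×300 uint8 input the MobileNet SSD models take; the file name is hypothetical):

```python
import numpy as np
from PIL import Image

# Load the 3888×2916 test image and shrink it to the 300×300
# input size the MobileNet SSD models expect.
image = Image.open("fruit.jpg").convert("RGB")      # hypothetical file name
resized = image.resize((300, 300), Image.BILINEAR)

# The SSD models take a uint8 tensor of shape (1, 300, 300, 3).
input_tensor = np.expand_dims(np.asarray(resized, dtype=np.uint8), axis=0)
```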


## Part I — Benchmarking

A more in-depth analysis of the results

Benchmarking was done using TensorFlow or, for the hardware-accelerated platforms that do not support TensorFlow, their native framework, with the same models used on the other platforms converted to the appropriate native format. For the Coral EdgeTPU-based hardware we used TensorFlow Lite, and for Intel’s Movidius-based hardware we used their OpenVINO toolkit. We also benchmarked NVIDIA’s Jetson Nano both with ‘vanilla’ TensorFlow (with GPU support), and then again with the same TensorFlow model optimised using NVIDIA’s TensorRT framework.
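As an illustration of the Coral side of that, loading and invoking an Edge TPU-compiled model through TensorFlow Lite looks roughly like this (a sketch assuming the tflite_runtime package and the Edge TPU runtime are installed; the model file name is hypothetical):

```python
import numpy as np
import tflite_runtime.interpreter as tflite

# Load an Edge TPU-compiled TensorFlow Lite model via the libedgetpu delegate.
interpreter = tflite.Interpreter(
    model_path="mobilenet_ssd_v2_coco_edgetpu.tflite",  # hypothetical file name
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

# Dummy 300×300 uint8 input; in the benchmark this is the resized test image.
input_tensor = np.zeros((1, 300, 300, 3), dtype=np.uint8)
interpreter.set_tensor(interpreter.get_input_details()[0]["index"], input_tensor)
interpreter.invoke()
```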


Inferencing was carried out with the MobileNet v2 SSD and MobileNet v1 0.75 depth SSD models, both trained on the Common Objects in Context (COCO) dataset. The 3888×2916 pixel test image was resized down to 300×300 pixels before presenting it to the model, and each model was run 10,000 times before an average inferencing time was taken. The first inferencing run, which takes longer due to loading overheads, was discarded.
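The timing procedure itself amounts to something like the following sketch, where run_inference stands in for whichever framework’s inference call is being measured:

```python
import time

def benchmark(run_inference, input_tensor, runs=10_000):
    """Average inference time over `runs`, after a discarded warm-up
    run that absorbs the model-loading overheads."""
    run_inference(input_tensor)               # first run, discarded
    start = time.monotonic()
    for _ in range(runs):
        run_inference(input_tensor)
    return (time.monotonic() - start) / runs  # mean seconds per inference
```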

Benchmarks were carried out twice on the NVIDIA Jetson Nano, first using vanilla TensorFlow models, and a second time using those models after optimisation with NVIDIA’s TensorFlow with TensorRT (TF-TRT) library.
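For a TensorFlow 1.x frozen graph, that optimisation step looks roughly like this (a sketch using the TF-TRT API bundled with TensorFlow 1.x on the Nano’s JetPack image; the file and output node names are assumptions based on the standard SSD detection graphs):

```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # TF-TRT, bundled with TensorFlow 1.x

# Load the frozen TensorFlow graph for the model.
with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:  # assumed file name
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# Rewrite the graph so that supported subgraphs run as TensorRT engines.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=["detection_boxes", "detection_scores",   # standard SSD output
             "detection_classes", "num_detections"],  # node names (assumed)
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode="FP16",  # the Nano's GPU has fast FP16 support
)
```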


Unsurprisingly, the unaccelerated Raspberry Pi fares the worst of any of the platforms we benchmarked, managing to sustain inferencing at just 1 to 2 fps.


If you’re interested in pushing the performance of the Raspberry Pi you could try building TensorFlow Lite for the Raspberry Pi. Unfortunately there are currently no binary distributions available, so it can’t be installed using pip. If you want to try out TensorFlow Lite, you’re going to have to build it from source, either by cross-compiling or natively on the Raspberry Pi itself. I’m not going to go down that route right now, so instead we’ll drop the Raspberry Pi as an outlier and take a closer look at the other platforms.


Our results from the Jetson Nano are particularly interesting when compared against the benchmarking results released by NVIDIA for the board.


We’re seeing significantly slower inferencing in our own TensorFlow benchmarking than in the NVIDIA tests, around 3× slower with MobileNet v2 SSD. However, going back to their original code, which was written in C++ and uses native TensorRT for inferencing, and following their benchmarking instructions, I was able to successfully reproduce their published MobileNet v2 benchmark performance times.

While our models optimised using TensorRT run considerably faster on the Jetson Nano than the vanilla TensorFlow models, they still don’t run as fast as those in the original NVIDIA C++ benchmark tests. Talking with NVIDIA, they tell me that this isn’t just the difference between a compiled and an interpreted language, between C++ and Python.