
Slow Segmentation with MobileNetV3 #405

Closed
KevinWang905 opened this issue Oct 19, 2023 · 5 comments

@KevinWang905

Hi Tobias,

I'm trying to implement a segmentation model with MobileNetV3 (TensorFlow mobilenetv3_large, minimalistic variant) and an LR-ASPP segmentation head, which I trained in Python. After converting the model, the forward passes in Python take well under 1 s, but when I load it in C++, a forward pass takes about 8 s. I'm running Ubuntu 22.04 under WSL. I'm pretty new to C++ development, so I may have made some compilation mistakes, but I'd love your feedback on why this speed discrepancy exists. I've posted the model conversion and loading outputs below. I can send you the model JSON as well.

> python keras_export/convert_model.py mnv3_LRASPP_min_epoch1 fdeep_mnv3_min_e1.json

loading mnv3_LRASPP_min_epoch1
2023-10-18 17:13:07.254822: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE SSE2 SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
1/1 [==============================] - 0s 390ms/step
Forward pass took 0.430125 s.
1/1 [==============================] - 0s 29ms/step
Forward pass took 0.052999 s.
1/1 [==============================] - 0s 29ms/step
Forward pass took 0.058005 s.
Starting performance measurements.
1/1 [==============================] - 0s 29ms/step
Forward pass took 0.057001 s.
> g++ -O3 -DNDEBUG -march=native -msse -msse2 -msse3 -msse4.1 -msse4.2 -mavx -mavx2 main.cpp
> ./a.out

Loading json ... done. elapsed time: 0.036306 s
Building model ... done. elapsed time: 0.032201 s
Running test 1 of 1 ... done. elapsed time: 8.861812 s
Loading, constructing, testing of fdeep_mnv3_min_e1.json took 8.936081 s overall.
model loaded successfully

Appreciate the work you've put into this library. Thanks!
Kevin

@Dobiasd
Owner

Dobiasd commented Oct 20, 2023

Hi Kevin,

thanks for the good report.

Your C++ compiler invocation looks OK for speed: the important -O3 and -DNDEBUG are there.

Since you also have -march=native, you can remove all the other -m... flags, i.e., end up with:

g++ -O3 -DNDEBUG -march=native main.cpp

Can you give this a try?

If it does not help, could you upload your model (the not-yet converted version) for me to experiment with it and find the bottleneck?

@KevinWang905
Author

Thanks! I had already tried that, and the speed is the same. I've sent you an email with a link to my model and some testing code. Let me know if you need anything else.

@Dobiasd
Owner

Dobiasd commented Oct 21, 2023

Thank you. With the model you sent me, I just reproduced the performance problem locally. It's actually even worse on my machine, i.e.:

  • Forward pass using TensorFlow (no GPU, just one CPU core allowed): ~ 0.07 s
  • Forward pass using frugally-deep: ~ 21 s 😱

I'll investigate and get back to you here.

Dobiasd added a commit that referenced this issue Oct 21, 2023
@Dobiasd
Owner

Dobiasd commented Oct 21, 2023

Profiling (sysprof) showed that all the CPU time is burned exactly here.

In this MR, I had accidentally introduced this unnecessarily large (and very redundant) calculation. 😬


I just fixed it with this commit and released a new version.

Now, a forward pass with your model in frugally-deep is fast (~ 0.075 s on my machine). 🎉

Thanks a lot for reporting this and providing such a good explanation (plus the example model)! ❤️

@KevinWang905
Author

Thank you!
