
Slow Segmentation with MobileNetV3 #405

Closed
KevinWang905 opened this issue Oct 19, 2023 · 5 comments

@KevinWang905

Hi Tobias,

I'm trying to implement a segmentation model with MobileNetV3 (TensorFlow mobilenetv3_large, minimalistic variant) and an LR-ASPP segmentation head, which I trained in Python. After converting the model, the forward passes in Python take well under 1 s, but when I load it in C++, a forward pass takes about 8 s. I'm running Ubuntu 22.04 under WSL. I'm pretty new to C++ development, so I may have made some compilation mistakes, but I'd love your feedback on why this speed discrepancy exists. I've posted the model conversion and loading outputs below. I can send you the model JSON as well.

> python keras_export/convert_model.py mnv3_LRASPP_min_epoch1 fdeep_mnv3_min_e1.json

loading mnv3_LRASPP_min_epoch1
2023-10-18 17:13:07.254822: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE SSE2 SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
1/1 [==============================] - 0s 390ms/step
Forward pass took 0.430125 s.
1/1 [==============================] - 0s 29ms/step
Forward pass took 0.052999 s.
1/1 [==============================] - 0s 29ms/step
Forward pass took 0.058005 s.
Starting performance measurements.
1/1 [==============================] - 0s 29ms/step
Forward pass took 0.057001 s.
> g++ -O3 -DNDEBUG -march=native -msse -msse2 -msse3 -msse4.1 -msse4.2 -mavx -mavx2 main.cpp
> ./a.out

Loading json ... done. elapsed time: 0.036306 s
Building model ... done. elapsed time: 0.032201 s
Running test 1 of 1 ... done. elapsed time: 8.861812 s
Loading, constructing, testing of fdeep_mnv3_min_e1.json took 8.936081 s overall.
model loaded successfully

Appreciate the work you've put into this library. Thanks!
Kevin

@Dobiasd
Owner

Dobiasd commented Oct 20, 2023

Hi Kevin,

thanks for the good report.

Your C++ compiler invocation looks OK for speed: the important -O3 and -DNDEBUG are there.

Since you also have -march=native, you can remove all the other -m... flags, i.e., end up with:

g++ -O3 -DNDEBUG -march=native main.cpp

Can you give this a try?

If it does not help, could you upload your model (the not-yet converted version) for me to experiment with it and find the bottleneck?

@KevinWang905
Author

Thanks! I had already tried that, and the speed is the same. I've sent you an email with a link to my model and some testing code. Let me know if you need anything else.

@Dobiasd
Owner

Dobiasd commented Oct 21, 2023

Thank you. With the model you sent me, I just reproduced the performance problem locally. It's actually even worse on my machine, i.e.:

  • Forward pass using TensorFlow (no GPU, just one CPU core allowed): ~ 0.07 s
  • Forward pass using frugally-deep: ~ 21 s 😱

I'll investigate and get back to you here.

Dobiasd added a commit that referenced this issue Oct 21, 2023
@Dobiasd
Owner

Dobiasd commented Oct 21, 2023

Profiling (sysprof) showed that all the CPU time is burned exactly here.

In this MR, I had accidentally introduced this unnecessarily large (and very redundant) calculation. 😬


I just fixed it with this commit and released a new version.

Now, a forward pass with your model in frugally-deep is fast (~ 0.075 s on my machine). 🎉

Thanks a lot for reporting this and providing such a good explanation (plus the example model)! ❤️

@KevinWang905
Author

Thank you!
