[Speed] speed up python executor in fluid #8729
Related issue

profile with parallel_do

One card with parallel_do (profile timeline screenshot omitted).

Conclusion: parallel_do adds about 40% performance loss even with the program cache.

| without parallel_do | with parallel_do |
| --- | --- |
| (timeline screenshot omitted) | (timeline screenshot omitted) |
The existence of Feed and Fetch operators is a wrong design:
Background
problem
In our Python executor, every `executor.run` call clones the program and then adds feed and fetch ops to the cloned program. The profile below shows that `Program.clone` is very time-consuming. I added a simple cache to avoid the program clone, and the result is very impressive.
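Independent of the full timeline profile in the Experiment section, a rough way to see the clone cost in isolation is to time `Program.clone` directly. This is only a sketch: the network shape, depth, and iteration count are arbitrary, and it assumes the early `paddle.fluid` Python API (`fluid.layers.data`, `fluid.layers.fc`, `Program.clone`), not the benchmark used in this issue.

```python
import time

import paddle.fluid as fluid

# Build a network with a few stacked layers so the clone cost is visible;
# the sizes and depth here are arbitrary, not the real benchmark.
x = fluid.layers.data(name='x', shape=[784], dtype='float32')
hidden = x
for _ in range(20):
    hidden = fluid.layers.fc(input=hidden, size=256, act='relu')
pred = fluid.layers.fc(input=hidden, size=10)
loss = fluid.layers.mean(x=pred)

main_program = fluid.default_main_program()

# Every executor.run currently pays roughly this cost, since it clones the
# program before inserting the feed/fetch ops.
start = time.time()
for _ in range(100):
    main_program.clone()
print('average Program.clone time: %.4f s' % ((time.time() - start) / 100))
```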
solution
Avoid the clone: cache the cloned program with the feed and fetch ops already inserted, and reuse it across `executor.run` calls.
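A minimal sketch of the caching idea, not the actual change inside `Executor.run`. The cache key and the `add_feed_fetch_ops` helper below are assumptions made for illustration; in the real executor the feed/fetch insertion is done internally.

```python
# Sketch of a program cache keyed by the source program and the feed/fetch
# names, so that Program.clone and the feed/fetch op insertion happen once
# per distinct (program, feed, fetch) combination instead of on every run.

_program_cache = {}


def get_or_prepare_program(program, feed_names, fetch_names):
    key = (id(program), tuple(sorted(feed_names)), tuple(fetch_names))
    cached = _program_cache.get(key)
    if cached is None:
        cached = program.clone()  # the expensive call we want to amortize
        # Hypothetical helper: insert feed ops for feed_names and fetch ops
        # for fetch_names into the cloned program, as the executor does
        # today on every run.
        add_feed_fetch_ops(cached, feed_names, fetch_names)
        _program_cache[key] = cached
    return cached
```

With something like this in place, repeated runs with the same feed and fetch names reuse the cached program and skip the clone entirely.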
Experiment
profile script
#8674
| condition | timeline |
| --- | --- |
| before optimize | (timeline screenshot omitted) |
| after optimize | (timeline screenshot omitted) |