
Fluid performance tuning plan #6024

Closed
reyoung opened this issue Nov 29, 2017 · 5 comments
reyoung commented Nov 29, 2017

We plan to tune Fluid's performance in a loop of three steps:

  1. Profile: figure out which parts of Fluid are slow.
  2. Find problems & fix them: discuss and identify the problems based on the profiling results.
  3. Profile again: confirm the problems have been solved and the performance has improved.
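Step 3 of the loop can be sketched with Python's built-in timeit: measure the same workload before and after a fix and compare. A minimal sketch (the two functions below are placeholders for illustration, not Fluid code):

```python
import timeit

def baseline():
    # placeholder for the original (slow) implementation
    return sum(i * i for i in range(10000))

def optimized():
    # placeholder for the candidate fix; must produce the same result
    return sum(i * i for i in range(10000))

# measure both versions the same way, then compare
t_before = timeit.timeit(baseline, number=100)
t_after = timeit.timeit(optimized, number=100)
speedup = t_before / t_after
```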

There are several jobs for these three steps:

  1. Find a machine with docker and GPU for profiling. @jacquesqiao
  2. Neural network configurations for CNN, LSTM, etc. @qingqing01 @dzhwinter
  3. Setup an environment for profiling. @chengduoZH
    • Use cProfile for Python, yep for Python/C++, and nvprof for CUDA
  4. Find problems: All members together.
  5. Fix GPU problems: @jacquesqiao @qingqing01
  6. Fix CPU problems: @dzhwinter
  7. Fix Python problems: TODO

dzhwinter commented Nov 29, 2017

  1. Python code optimization
    We use cProfile to profile the Python code. Tool usage is documented at cpu_profiling. If you need to dig into the code, add the snippet below to the script to get accurate function call counts. This is useful to check whether redundant Variables or Operators are created.
import cProfile

pr = cProfile.Profile()
pr.enable()
# code under test goes here
pr.disable()
pr.print_stats(sort='cumulative')
  • Fine-tune the fit_a_line tests; check the time cost of each component, e.g. Program, Variable, Operator, Block.
  • Fine-tune one forward pass; check the total time cost on the Python side.
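As a concrete sketch of the call-count check described above, cProfile plus pstats can report how many times a given function was called (here a hypothetical `make_var` stands in for framework-side Variable/Operator creation):

```python
import cProfile
import io
import pstats

def make_var():
    # stand-in for framework-side Variable/Operator creation
    return object()

def build_program():
    # creates one object per step; redundant creation would show
    # up as an inflated call count for make_var in the report
    return [make_var() for _ in range(1000)]

pr = cProfile.Profile()
pr.enable()
build_program()
pr.disable()

# summarize by call count to spot redundant constructions
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats('ncalls').print_stats('make_var')
report = buf.getvalue()
```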
  2. C++ code optimization
    The yep + gperftools combination is excellent, but many Py_Eval/Py_Object function calls obscure the real hot spots. The method below retains only the necessary C++ calls:
#ifdef ENABLE_PROFILE
#include <gperftools/profiler.h>
#endif

ProfilerStart("cs.prof");
// code under test goes here
ProfilerStop();
ProfilerFlush();
  • Fine-tune the training forward/backward pass to find the bottleneck.
  3. CUDA code
    TBD


dzhwinter commented Nov 29, 2017

Also, we have created a benchmark Docker image and a benchmark repo so that everyone can reproduce the results.
The benchmark repo contains the TensorFlow and Paddle scripts.
https://github.com/dzhwinter/benchmark

The docker image contains all the profile tools.
https://hub.docker.com/r/dzhwinter/benchmark

Note that the Docker image should be run in host network mode; then you can check the profiler result in your browser. @tonyyang-svail

nvidia-docker run -it --rm --network=host -v $PWD:/paddle -v /root/.cache:/root/.cache  dzhwinter/benchmark /bin/bash  


jacquesqiao commented Nov 30, 2017

profile steps:

  1. Make a working directory for yourself.

  2. Check out the Paddle source code and the benchmark source code:
    https://github.com/dzhwinter/benchmark

  3. cd into the benchmark source code:

cd benchmark

  4. Start and enter the benchmark container:

nvidia-docker run -it --rm --network=host -v $PWD:/paddle -v /root/.cache:/root/.cache  dzhwinter/benchmark /bin/bash

  5. Build the source code inside the container and install it:

# use `RelWithDebInfo` if you want to debug with pprof;
# `Release` is enough if you only want to measure time
cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo  # or -DCMAKE_BUILD_TYPE=Release
make -j4 install

  6. Run the benchmark code:

# disable OpenMP threading so it does not skew the profile
export OMP_NUM_THREADS=1
python -m yep -v test_recognize_digits_conv.py

  7. Run pprof (NOTE: please choose a random port between 8000 and 9000):

pprof -http=0.0.0.0:8123 `which python` test_recognize_digits_conv.py.prof

  8. Visit the web page:
    Open your browser and go to ip:port, for example server_ip:8123.
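The "choose a random port" step can also be done programmatically. A small sketch (the helper name `pick_free_port` is an illustration, not part of any tool above): it tries random ports in the 8000-9000 range and binds a socket to verify the port is actually free before handing it to pprof's -http flag.

```python
import random
import socket

def pick_free_port(low=8000, high=9000, attempts=50):
    """Return an unused TCP port in [low, high], e.g. for pprof -http."""
    for _ in range(attempts):
        port = random.randint(low, high)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("0.0.0.0", port))
            except OSError:
                continue  # port already in use; try another
            return port  # socket is closed on exit, freeing the port
    raise RuntimeError("no free port found in range")

port = pick_free_port()
```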


reyoung commented Dec 6, 2017

Basic performance tuning has been done. Please refer to the project board https://github.com/PaddlePaddle/Paddle/projects/29#card-5921207 for the follow-up work.

reyoung closed this as completed Dec 6, 2017