
Fluid performance tuning plan #6024

Closed
reyoung opened this issue Nov 29, 2017 · 5 comments
reyoung commented Nov 29, 2017

We plan to tune Fluid's performance in a loop of three steps:

  1. Profile: figure out which parts of Fluid are slow.
  2. Find problems & fix them: discuss and identify the problems based on the profiling results.
  3. Profile again: confirm the problems have been solved and the performance has improved.
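Step 3 of the loop can be sketched with Python's built-in timeit: measure the same workload before and after a fix and compare. A minimal sketch (the two functions below are placeholders for illustration, not Fluid code):

```python
import timeit

def baseline():
    # placeholder for the original (slow) implementation
    return sum(i * i for i in range(10000))

def optimized():
    # placeholder for the candidate fix; must produce the same result
    return sum(i * i for i in range(10000))

# measure both versions the same way, then compare
t_before = timeit.timeit(baseline, number=100)
t_after = timeit.timeit(optimized, number=100)
speedup = t_before / t_after
```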

There are several jobs for these three steps:

  1. Find a machine with docker and GPU for profiling. @jacquesqiao
  2. Neural network configurations for CNN, LSTM, etc. @qingqing01 @dzhwinter
  3. Setup an environment for profiling. @chengduoZH
    • Use cProfile for Python, yep for Python/C++, and nvprof for CUDA
  4. Find problems: All members together.
  5. Fix GPU problems: @jacquesqiao @qingqing01
  6. Fix CPU problems: @dzhwinter
  7. Fix Python problems: TODO

dzhwinter commented Nov 29, 2017

  1. Python code optimization
    We use cProfile to profile the Python code. Tool usage is documented at cpu_profiling. If you need to dig into the code, add the snippet below to the script to get accurate function call counts. This is useful to check whether redundant Variables or Operators are created.
import cProfile

pr = cProfile.Profile()
pr.enable()
# code under test goes here
pr.disable()
pr.print_stats(sort='cumulative')
  • Fine-tune the fit_a_line tests; check the time cost of each component, e.g. Program, Variable, Operator, Block.
  • Fine-tune one forward pass; check the total time cost on the Python side.
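As a concrete sketch of the call-count check described above, cProfile plus pstats can report how many times a given function was called (here a hypothetical `make_var` stands in for framework-side Variable/Operator creation):

```python
import cProfile
import io
import pstats

def make_var():
    # stand-in for framework-side Variable/Operator creation
    return object()

def build_program():
    # creates one object per step; redundant creation would show
    # up as an inflated call count for make_var in the report
    return [make_var() for _ in range(1000)]

pr = cProfile.Profile()
pr.enable()
build_program()
pr.disable()

# summarize by call count to spot redundant constructions
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats('ncalls').print_stats('make_var')
report = buf.getvalue()
```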
  2. C++ code optimization
    The yep + gperftools combination is excellent, but many Py_Eval/Py_Object function calls obscure the real hot spots. The method below retains only the necessary C++ calls:
#ifdef ENABLE_PROFILE
#include <gperftools/profiler.h>
#endif

ProfilerStart("cs.prof");
// code under test goes here
ProfilerStop();
ProfilerFlush();
  • Fine-tune the training forward/backward pass to find the bottleneck.
  3. CUDA code
    TBD


dzhwinter commented Nov 29, 2017

Also, we have created a benchmark Docker image and a benchmark repo so that everyone can reproduce the results.
The benchmark repo contains the TensorFlow and Paddle scripts.
https://github.com/dzhwinter/benchmark

The docker image contains all the profile tools.
https://hub.docker.com/r/dzhwinter/benchmark

Note that the Docker image should be run in host network mode; then you can check the profiler result in your browser. @tonyyang-svail

nvidia-docker run -it --rm --network=host -v $PWD:/paddle -v /root/.cache:/root/.cache  dzhwinter/benchmark /bin/bash  


jacquesqiao commented Nov 30, 2017

profile steps:

  1. Make a working directory for yourself.

  2. Check out the Paddle source code and the benchmark source code:
    https://github.com/dzhwinter/benchmark

  3. cd into the benchmark source code:

cd benchmark

  4. Start and enter the benchmark container:

nvidia-docker run -it --rm --network=host -v $PWD:/paddle -v /root/.cache:/root/.cache  dzhwinter/benchmark /bin/bash

  5. Build the source code inside the container and install it:

# use `RelWithDebInfo` if you want to debug with pprof;
# `Release` is enough if you only want to measure time
cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo  # or -DCMAKE_BUILD_TYPE=Release
make -j4 install

  6. Run the benchmark code:

# disable OpenMP threading so it does not skew the profile
export OMP_NUM_THREADS=1
python -m yep -v test_recognize_digits_conv.py

  7. Run pprof (NOTE: please choose a random port between 8000 and 9000):

pprof -http=0.0.0.0:8123 `which python` test_recognize_digits_conv.py.prof

  8. Visit the web page:
    Open your browser and go to ip:port, for example server_ip:8123.
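The "choose a random port" step can also be done programmatically. A small sketch (the helper name `pick_free_port` is an illustration, not part of any tool above): it tries random ports in the 8000-9000 range and binds a socket to verify the port is actually free before handing it to pprof's -http flag.

```python
import random
import socket

def pick_free_port(low=8000, high=9000, attempts=50):
    """Return an unused TCP port in [low, high], e.g. for pprof -http."""
    for _ in range(attempts):
        port = random.randint(low, high)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("0.0.0.0", port))
            except OSError:
                continue  # port already in use; try another
            return port  # socket is closed on exit, freeing the port
    raise RuntimeError("no free port found in range")

port = pick_free_port()
```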


reyoung commented Dec 6, 2017

Basic performance tuning has been done. Please refer to the project board https://github.com/PaddlePaddle/Paddle/projects/29#card-5921207 for the follow-up work.

reyoung closed this as completed Dec 6, 2017