Commit

Align develop and v1.3 (PaddlePaddle#662)
* synchronize with develop (PaddlePaddle#642)

* update_commitid1.3 (PaddlePaddle#641)

* update inference c++ API doc (PaddlePaddle#634)

* update inference c++ API doc

* fix link

* thorough clean for doc (PaddlePaddle#644)

* thorough clean

* delete_DS_Store

* Cherrypick1.3 (PaddlePaddle#652)

* thorough clean

* delete_DS_Store

* [Don't merge now]update_install_doc (PaddlePaddle#643)

* update_install_doc

* follow_comments

* add maxdepth (PaddlePaddle#646)

* upload_md (PaddlePaddle#649)

* update_version (PaddlePaddle#650)

* Translation of 16 new apis (PaddlePaddle#651)

* fix_windows

* Final update 1.3 (PaddlePaddle#653)

* thorough clean

* delete_DS_Store

* update_1.3

* Deadlink fix (PaddlePaddle#654)

* fix_deadlinks

* update_docker

* Update release_note.rst

* Update index_cn.rst

* update_Paddle (PaddlePaddle#658)

* fix pic (PaddlePaddle#659)

* [to 1.3] cn api debug (PaddlePaddle#655) (PaddlePaddle#661)

* debug

* fix 2 -conv2d

* "锚" ==> anchor(s)
shanyi15 authored Feb 28, 2019
1 parent 2f6c97f commit 8e68fa0
Showing 24 changed files with 124 additions and 131 deletions.
6 changes: 3 additions & 3 deletions doc/fluid/advanced_usage/deploy/mobile/mobile_readme.md
Original file line number Diff line number Diff line change
@@ -8,7 +8,7 @@

## Features

- 高性能支持ARM CPU
- 高性能支持ARM CPU
- 支持Mali GPU
- 支持Adreno GPU
- 支持苹果设备的GPU Metal实现
@@ -55,7 +55,7 @@

### 2. Caffe转为Paddle Fluid模型

请参考这里[这里](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/caffe2fluid)
请参考[这里](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/caffe2fluid)

### 3. ONNX

@@ -78,5 +78,5 @@ Paddle-Mobile 提供相对宽松的Apache-2.0开源协议 [Apache-2.0 license](L


## 旧版 Mobile-Deep-Learning
原MDL(Mobile-Deep-Learning)工程被迁移到了这里 [Mobile-Deep-Learning](https://github.com/allonli/mobile-deep-learning)
原MDL(Mobile-Deep-Learning)工程被迁移到了这里 [Mobile-Deep-Learning](https://github.com/allonli/mobile-deep-learning)

6 changes: 3 additions & 3 deletions doc/fluid/advanced_usage/deploy/mobile/mobile_readme_en.md
@@ -8,7 +8,7 @@ Welcome to Paddle-Mobile GitHub project. Paddle-Mobile is a project of PaddlePad

## Features

- high performance in support of ARM CPU
- high performance in support of ARM CPU
- support Mali GPU
- support Adreno GPU
- support the realization of GPU Metal on Apple devices
@@ -50,7 +50,7 @@ At present Paddle-Mobile only supports models trained by Paddle fluid. Models ca
### 1. Use Paddle Fluid directly to train
It is the most reliable method and the recommended one
### 2. Transform Caffe to Paddle Fluid model
[https://github.com/PaddlePaddle/models/tree/develop/fluid/image_classification/caffe2fluid](https://github.com/PaddlePaddle/models/tree/develop/fluid/image_classification/caffe2fluid)
[https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/caffe2fluid](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/caffe2fluid)
### 3. ONNX
ONNX is the acronym of Open Neural Network Exchange. The project is aimed to make a full communication and usage among different neural network development frameworks.

@@ -76,4 +76,4 @@ Paddle-Mobile provides the relatively permissive Apache-2.0 open source license [Apa


## Old version Mobile-Deep-Learning
Original MDL(Mobile-Deep-Learning) project has been transferred to [Mobile-Deep-Learning](https://github.com/allonli/mobile-deep-learning)
Original MDL(Mobile-Deep-Learning) project has been transferred to [Mobile-Deep-Learning](https://github.com/allonli/mobile-deep-learning)
Original file line number Diff line number Diff line change
@@ -26,7 +26,7 @@

## 创建本地分支

Paddle 目前使用[Git流分支模型](http://nvie.com/posts/a-successful-git-branching-model/)进行开发,测试,发行和维护,具体请参考 [Paddle 分支规范](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/releasing_process.md#paddle-分支规范)
Paddle 目前使用[Git流分支模型](http://nvie.com/posts/a-successful-git-branching-model/)进行开发,测试,发行和维护,具体请参考 [Paddle 分支规范](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/others/releasing_process.md)

所有的 feature 和 bug fix 的开发工作都应该在一个新的分支上完成,一般从 `develop` 分支上创建新分支。
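上述流程可以用一个一次性仓库来演示。下面是一个最小示例(分支名 `my_feature` 与临时目录均为假设的演示值,并非实际规范要求的命名):

```shell
# 演示:在临时仓库中从 develop 分支新建一个功能分支。
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# 内联提供 user.name/user.email,避免依赖全局 git 配置
git -c user.name=dev -c user.email=dev@example.com commit -q --allow-empty -m "init"
git branch -M develop                   # 将当前分支重命名为 develop 作为基础分支
git checkout -q -b my_feature develop   # 新的开发工作在 my_feature 分支上进行
git branch --show-current               # 打印 my_feature
```

实际开发时,在克隆得到的 Paddle 仓库中执行最后一步 `git checkout -b` 即可。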

@@ -110,7 +110,7 @@ no changes added to commit (use "git add" and/or "git commit -a")
➜ docker run -it -v $(pwd):/paddle paddle:latest-dev bash -c "cd /paddle/build && ctest"
```

关于构建和测试的更多信息,请参见[使用Docker安装运行](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/v2/build_and_install/docker_install_cn.rst)
关于构建和测试的更多信息,请参见[使用Docker安装运行](../../../beginners_guide/install/install_Docker.html)

## 提交(commit)

Original file line number Diff line number Diff line change
@@ -3,7 +3,7 @@
You will learn how to develop programs in a local environment by following the guidelines in this document.

## Requirements of coding
- Please refer to the coding comment format of [Doxygen](http://www.stack.nl/~dimitri/doxygen/)
- Please refer to the coding comment format of [Doxygen](http://www.stack.nl/~dimitri/doxygen/)
- Make sure that the build option `WITH_STYLE_CHECK` is on and that the build passes the code style check.
- Unit tests are needed for all code.
- All unit tests must pass.
@@ -26,7 +26,7 @@ Clone remote git to local:

## Create local branch

At present [Git stream branch model](http://nvie.com/posts/a-successful-git-branching-model/) is applied to Paddle to undergo task of development,test,release and maintenance.Please refer to [branch regulation of Paddle](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/releasing_process.md#paddle-分支规范) about details。
At present, the [Git branching model](http://nvie.com/posts/a-successful-git-branching-model/) is applied to Paddle for development, testing, release, and maintenance. Please refer to the [branch regulation of Paddle](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/others/releasing_process.md) for details.

All development tasks of feature and bug fix should be finished in a new branch which is extended from `develop` branch.

@@ -80,7 +80,7 @@

Building the PaddlePaddle source code and generating documentation requires a variety of development tools. For convenience, our standard development procedure puts these tools together into a Docker image, called a *development image*, usually named `paddle:latest-dev` or `paddle:[version tag]-dev`, such as `paddle:0.11.0-dev`. Everything that needs `cmake && make`, such as IDE configuration, is then replaced by `docker run paddle:latest-dev`.

You need to bulid this development mirror under the root directory of source code directory tree
You need to build this development image under the root directory of the source code tree

```bash
➜ docker build -t paddle:latest-dev .
@@ -110,7 +110,7 @@ Run all unit tests with the following commands:
➜ docker run -it -v $(pwd):/paddle paddle:latest-dev bash -c "cd /paddle/build && ctest"
```

Please refer to [Installation and run with Docker](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/v2/build_and_install/docker_install_cn.rst) about more information of construction and test.
Please refer to [Installation and run with Docker](../../../beginners_guide/install/install_Docker.html) for more information about building and testing.

## Commit

5 changes: 1 addition & 4 deletions doc/fluid/advanced_usage/development/new_op/index_cn.rst
@@ -2,12 +2,9 @@
新增operator
#############

- `如何写新的operator <../../../advanced_usage/development/new_op.html>`_ :介绍如何在 Fluid 中添加新的 Operator

- `op相关的一些注意事项 <../../../advanced_usage/development/op_notes.html>`_ :介绍op相关的一些注意事项
- `op相关的一些注意事项 <../../../advanced_usage/development/new_op/op_notes.html>`_ :介绍op相关的一些注意事项

.. toctree::
:hidden:

new_op_cn.md
op_notes.md
10 changes: 5 additions & 5 deletions doc/fluid/advanced_usage/development/new_op/op_notes.md
@@ -8,10 +8,10 @@ Op的核心方法是Run,Run方法需要两方面的资源:数据资源和计

Fluid框架的设计理念是可以在多种设备及第三方库上运行,有些Op的实现可能会因为设备或者第三方库的不同而不同。为此,Fluid引入了OpKernel的方式,即一个Op可以有多个OpKernel,这类Op继承自`OperatorWithKernel`,这类Op的代表是conv,conv_op的OpKernel有:`GemmConvKernel`、`CUDNNConvOpKernel`、`ConvMKLDNNOpKernel`,且每个OpKernel都有double和float两种数据类型。不需要OpKernel的代表有`WhileOp`等。

Operator继承关系图:
Operator继承关系图:
![op_inheritance_relation_diagram](../../pics/op_inheritance_relation_diagram.png)

进一步了解可参考:[multi_devices](https://github.com/PaddlePaddle/FluidDoc/tree/develop/doc/fluid/design/multi_devices)[scope](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/concepts/scope.md)[Developer's_Guide_to_Paddle_Fluid](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/getstarted/Developer's_Guide_to_Paddle_Fluid.md)
进一步了解可参考:[multi_devices](https://github.com/PaddlePaddle/FluidDoc/tree/develop/doc/fluid/design/multi_devices)、[scope](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/concepts/scope.md)、[Developer's_Guide_to_Paddle_Fluid](https://github.com/PaddlePaddle/FluidDoc/blob/release/1.2/doc/fluid/getstarted/Developer's_Guide_to_Paddle_Fluid.md)

### 2.Op的注册逻辑
每个Operator的注册项包括:
@@ -75,15 +75,15 @@

通常Op注册时需要调用REGISTER_OPERATOR,即:
```
REGISTER_OPERATOR(op_type,
REGISTER_OPERATOR(op_type,
OperatorBase,
op_maker_and_checker_maker,
op_grad_opmaker,
op_infer_var_shape,
op_infer_var_type)
```

**注意:**
**注意:**

1. 对于所有Op,前三个参数是必须的,op_type指明op的名字,OperatorBase是该Op的对象,op_maker_and_checker_maker是op的maker和op中attr的checker。
2. 如果该Op有反向,则必须要有op_grad_opmaker,因为backward时会从正向的Op中获取反向Op的Maker。
@@ -139,7 +139,7 @@ The following device operations are asynchronous with respect to the host:
- 如果数据传输是从GPU端到非页锁定的CPU端,数据传输将是同步的,即使调用的是异步拷贝操作。
- 如果数据传输是从CPU端到CPU端,数据传输将是同步的,即使调用的是异步拷贝操作。

更多内容可参考:[Asynchronous Concurrent Execution](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#asynchronous-concurrent-execution)[API synchronization behavior](https://docs.nvidia.com/cuda/cuda-runtime-api/api-sync-behavior.html#api-sync-behavior)
更多内容可参考:[Asynchronous Concurrent Execution](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#asynchronous-concurrent-execution)、[API synchronization behavior](https://docs.nvidia.com/cuda/cuda-runtime-api/api-sync-behavior.html#api-sync-behavior)

## Op性能优化
### 1.第三方库的选择
6 changes: 3 additions & 3 deletions doc/fluid/advanced_usage/development/new_op/op_notes_en.md
@@ -11,7 +11,7 @@ The Fluid framework is designed to run on a variety of devices and third-party l
Operator inheritance diagram:
![op_inheritance_relation_diagram](../../pics/op_inheritance_relation_diagram.png)

For further information, please refer to: [multi_devices](https://github.com/PaddlePaddle/FluidDoc/tree/develop/doc/fluid/design/multi_devices) , [scope](https://github.com/PaddlePaddle/FluidDoc/Blob/develop/doc/fluid/design/concepts/scope.md) , [Developer's_Guide_to_Paddle_Fluid](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/getstarted/Developer's_Guide_to_Paddle_Fluid.md )
For further information, please refer to: [multi_devices](https://github.com/PaddlePaddle/FluidDoc/tree/develop/doc/fluid/design/multi_devices) , [scope](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/concepts/scope.md) , [Developer's_Guide_to_Paddle_Fluid](https://github.com/PaddlePaddle/FluidDoc/blob/release/1.2/doc/fluid/getstarted/Developer's_Guide_to_Paddle_Fluid.md)

### 2.Op's registration logic
The registration entries for each Operator include:
@@ -67,7 +67,7 @@
<tr>
<td>OpCreator </td>
<td>Functor </td>
<td>Create a new OperatorBase for each call </td>
<td>Create a new OperatorBase for each call </td>
<td>Call at runtime </td>
</tr>
</tbody>
@@ -150,7 +150,7 @@ The calculation speed of Op is related to the amount of data input. For some Op,

Since the call of CUDA Kernel has a certain overhead, multiple calls of the CUDA Kernel in Op may affect the execution speed of Op. For example, the previous sequence_expand_op contains many CUDA Kernels. Usually, these CUDA Kernels process a small amount of data, so frequent calls to such Kernels will affect the calculation speed of Op. In this case, it is better to combine these small CUDA Kernels into one. This idea is used in the optimization of the sequence_expand_op procedure (related PR[#9289](https://github.com/PaddlePaddle/Paddle/pull/9289)). The optimized sequence_expand_op is about twice as fast as the previous implementation, the relevant experiments are introduced in the PR ([#9289](https://github.com/PaddlePaddle/Paddle/pull/9289)).

Reduce the number of copy and sync operations between the CPU and the GPU. For example, the fetch operation will update the model parameters and get a loss after each iteration, and the copy of the data from the GPU to the Non-Pinned-Memory CPU is synchronous, so frequent fetching for multiple parameters will reduce the model training speed.
Reduce the number of copy and sync operations between the CPU and the GPU. For example, the fetch operation will update the model parameters and get a loss after each iteration, and the copy of the data from the GPU to the Non-Pinned-Memory CPU is synchronous, so frequent fetching for multiple parameters will reduce the model training speed.

## Op numerical stability
### 1. Some Ops have numerical stability problems
Original file line number Diff line number Diff line change
@@ -16,7 +16,7 @@ gperftool主要支持以下四个功能:
- heap-profiling using tcmalloc
- CPU profiler

Paddle也提供了基于gperftool的[CPU性能分析教程](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/howto/optimization/cpu_profiling_cn.md)
Paddle也提供了基于gperftool的[CPU性能分析教程](./cpu_profiling_cn.html)

对于堆内存的分析,主要用到thread-caching malloc和heap-profiling using tcmalloc。

@@ -29,7 +29,7 @@ Paddle也提供了基于gperftool的[CPU性能分析教程](https://github.com/P
- 安装google-perftools

```
apt-get install libunwind-dev
apt-get install libunwind-dev
apt-get install google-perftools
```

@@ -73,17 +73,17 @@ env HEAPPROFILE="./perf_log/test.log" HEAP_PROFILE_ALLOCATION_INTERVAL=209715200
pprof --pdf python test.log.0012.heap
```
上述命令会生成一个profile00x.pdf的文件,可以直接打开,例如:[memory_cpu_allocator](https://github.com/jacquesqiao/Paddle/blob/bd2ea0e1f84bb6522a66d44a072598153634cade/doc/fluid/howto/optimization/memory_cpu_allocator.pdf)。从下图可以看出,在CPU版本fluid的运行过程中,分配存储最多的模块是CPUAllocator。而别的模块相对而言分配内存较少,所以被忽略了,这对于分析内存泄漏是很不方便的,因为泄漏是一个缓慢的过程,在这种图中是无法看到的。

![result](https://user-images.githubusercontent.com/3048612/40964027-a54033e4-68dc-11e8-836a-144910c4bb8c.png)
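上面的堆分析器完全通过环境变量开启。下面用 `env` 演示把这两个变量只作用于单条命令的写法(内层的 `echo` 只是真实训练脚本 `python train.py` 的占位,路径与数值沿用上文示例):

```shell
# 演示:用 env 将 profiler 的环境变量限定在单条命令的作用域内;
# 内层命令是真实训练脚本的占位,这里仅打印变量以示意。
env HEAPPROFILE=./perf_log/test.log HEAP_PROFILE_ALLOCATION_INTERVAL=209715200 \
  sh -c 'echo "heap profile: $HEAPPROFILE, dump every $HEAP_PROFILE_ALLOCATION_INTERVAL bytes"'
```

这样变量不会残留在当前 shell 中,不影响之后不需要 profiling 的运行。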

- Diff模式。可以对两个时刻的heap做diff,把一些内存分配没有发生变化的模块去掉,而把增量部分显示出来。
```
pprof --pdf --base test.log.0010.heap python test.log.1045.heap
```
生成的结果为:[`memory_leak_protobuf`](https://github.com/jacquesqiao/Paddle/blob/bd2ea0e1f84bb6522a66d44a072598153634cade/doc/fluid/howto/optimization/memory_leak_protobuf.pdf)

从图中可以看出:ProgramDesc这个结构,在两个版本之间增长了200MB+,所以这里有很大的内存泄漏的可能性,最终结果也确实证明是这里造成了泄漏。

![result](https://user-images.githubusercontent.com/3048612/40964057-b434d5e4-68dc-11e8-894b-8ab62bcf26c2.png)
![result](https://user-images.githubusercontent.com/3048612/40964063-b7dbee44-68dc-11e8-9719-da279f86477f.png)

@@ -16,7 +16,7 @@ gperftool mainly supports four functions:
- heap-profiling using tcmalloc
- CPU profiler

Paddle also provides a [tutorial on CPU performance analysis](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/howto/optimization/cpu_profiling_en.md) based on gperftool.
Paddle also provides a [tutorial on CPU performance analysis](./cpu_profiling_en.html) based on gperftool.

For the analysis for heap, we mainly use thread-caching malloc and heap-profiling using tcmalloc.

@@ -29,7 +29,7 @@ This tutorial is based on the Docker development environment paddlepaddle/paddle
- Install google-perftools

```
apt-get install libunwind-dev
apt-get install libunwind-dev
apt-get install google-perftools
```

@@ -74,15 +74,15 @@ As the program runs, a lot of files will be generated in the perf_log folder as
```
The command above will generate a file profile00x.pdf, which can be opened directly, for example, [memory_cpu_allocator](https://github.com/jacquesqiao/Paddle/blob/bd2ea0e1f84bb6522a66d44a072598153634cade/doc/fluid/howto/optimization/memory_cpu_allocator.pdf). As the chart below demonstrates, during a run of the CPU version of fluid, the module allocated the most memory is CPUAllocator. Other modules allocate relatively little memory and are therefore ignored. This is very inconvenient for inspecting memory leaks, because a leak is a gradual process that cannot be seen in such a picture.
![result](https://user-images.githubusercontent.com/3048612/40964027-a54033e4-68dc-11e8-836a-144910c4bb8c.png)

- Diff mode. You can do diff on the heap at two moments, which removes some modules whose memory allocation has not changed, and displays the incremental part.
```
pprof --pdf --base test.log.0010.heap python test.log.1045.heap
```
The generated result: [`memory_leak_protobuf`](https://github.com/jacquesqiao/Paddle/blob/bd2ea0e1f84bb6522a66d44a072598153634cade/doc/fluid/howto/optimization/memory_leak_protobuf.pdf)

As shown from the figure: The structure of ProgramDesc has increased by 200MB+ between the two versions, so there is a large possibility that memory leak happens here, and the final result does prove a leak here.

![result](https://user-images.githubusercontent.com/3048612/40964057-b434d5e4-68dc-11e8-894b-8ab62bcf26c2.png)
![result](https://user-images.githubusercontent.com/3048612/40964063-b7dbee44-68dc-11e8-9719-da279f86477f.png)

2 changes: 0 additions & 2 deletions doc/fluid/advanced_usage/development/profiling/index_cn.rst
@@ -12,9 +12,7 @@

本模块介绍 Fluid 使用过程中的调优方法,包括:

- `如何进行基准测试 <benchmark.html>`_:介绍如何选择基准模型,从而验证模型的精度和性能
- `CPU性能调优 <cpu_profiling_cn.html>`_:介绍如何使用 cProfile 包、yep库、Google perftools 进行性能分析与调优
- `GPU性能调优 <gpu_profiling_cn.html>`_:介绍如何使用 Fluid 内置的定时工具、nvprof 或 nvvp 进行性能分析和调优
- `堆内存分析和优化 <host_memory_profiling_cn.html>`_:介绍如何使用 gperftool 进行堆内存分析和优化,以解决内存泄漏的问题
- `Timeline工具简介 <timeline_cn.html>`_ :介绍如何使用 Timeline 工具进行性能分析和调优

8 changes: 4 additions & 4 deletions doc/fluid/advanced_usage/development/profiling/timeline_cn.md
@@ -27,23 +27,23 @@ python Paddle/tools/timeline.py --profile_path=/tmp/profile --timeline_path=time

1. 打开chrome浏览器,访问<chrome://tracing/>,用`load`按钮来加载生成的`timeline`文件。

![chrome tracing](./tracing.jpeg)
![chrome tracing](../tracing.jpeg)

1. 结果如下图所示,可以放大来查看timeline的细节信息。

![chrome timeline](./timeline.jpeg)
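chrome://tracing 的 `load` 按钮加载的是 Chrome Trace Event 格式的 JSON 文件(这里假设 timeline.py 的输出也是这一格式)。下面手写一个最小示例以示意该格式本身,事件名 `forward` 与文件路径均为虚构,并非工具的真实输出:

```shell
# 演示:手写一个最小的 Chrome Trace Event 格式文件;
# "X" 表示一个完整事件,ts/dur 单位为微秒,事件内容为虚构。
cat > /tmp/timeline_demo.json <<'EOF'
[{"name": "forward", "ph": "X", "ts": 0, "dur": 100, "pid": 0, "tid": 0}]
EOF
cat /tmp/timeline_demo.json
```

把这个文件用 `load` 按钮载入 chrome://tracing,即可看到一个名为 forward 的事件条。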

## 分布式使用
一般来说,分布式的训练程序都会有两种程序:pserver和trainer。我们提供了把pserver和trainer的profile日志用timeline来显示的方式。
一般来说,分布式的训练程序都会有两种程序:pserver和trainer。我们提供了把pserver和trainer的profile日志用timeline来显示的方式。

1. trainer打开方式与[本地使用](#local)部分的第1步相同

1. pserver可以通过加两个环境变量打开profile,例如:
```
FLAGS_rpc_server_profile_period=10 FLAGS_rpc_server_profile_path=./tmp/pserver python train.py
```

3. 把pserver和trainer的profile文件生成一个timeline文件,例如:
3. 把pserver和trainer的profile文件生成一个timeline文件,例如:
```
python /paddle/tools/timeline.py
--profile_path trainer0=local_profile_10_pass0_0,trainer1=local_profile_10_pass0_1,pserver0=./pserver_0,pserver1=./pserver_1