Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/dynamic net doc #4

31 changes: 22 additions & 9 deletions doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# 动态神经网络的实现

动态网络是目前神经网络框架的前沿课题。动态神经网络的优势解决了普通神经网络框架的一个重要问题,**神经网络的定义和计算是分离的**。即普通神经网络框架的计算步骤是,先定义一个神经网络的计算图,再使用计算引擎计算这个计算图。而动态神经网络的特点是,直接对每个操作求值,隐式的定义计算图,从而再对这个隐式的计算图反向传播。
动态网络是目前神经网络框架的前沿课题。动态神经网络的优势解决了普通神经网络框架的一个重要问题,**神经网络的定义和计算是分离的**。即静态神经网络框架的计算步骤是,先定义一个神经网络的计算图,再使用计算引擎计算这个计算图。而动态神经网络的特点是,直接对每个操作求值,隐式的定义计算图,从而再对这个隐式的计算图反向传播。

常见的使用方式为:

Expand All @@ -12,7 +12,7 @@ x.fill([0.058, 0.548, ...])
y = paddle.dyn.data(type=Integer(10))
y.fill(9)

hidden = paddle.dyn.fc(input=y, size=200)
hidden = paddle.dyn.fc(input=x, size=200)

# You can use hidden.npvalue() to get this layer's value now.

Expand All @@ -31,18 +31,31 @@ parameters.update()

## 动态神经网络解决的问题

动态神经网络只有神经网络的计算步骤,而隐藏了神经网络的定义步骤。他解决的问题是:
动态神经网络只有神经网络的计算步骤,而隐藏了神经网络的定义步骤,用户可以为每一个sample或者batch定义一个不同的网络。相对于静态神经网络而言,动态神经网络解决了以下几个问题:

* 可以任意的在计算过程中添加非线性的操作,例如`if`。并且对于不同的数据,神经网络的计算图可以不同。例如 树形神经网络
* 可以任意的在计算过程中添加复杂的控制逻辑,例如迭代,递归,条件选择等,这些控制逻辑都可以由host language(C++/Python)来实现。
* 可以支持更复杂的数据类型,并且对于不同的数据,神经网络的计算图可以不同。
* 动态神经网络的执行过程就是其定义过程,用户可以对神经网络中的参数,中间结果等信息直接求值,方便debug的过程。

// TODO(qijun): Complete this docs

TBD

## 动态神经网络的实现思路

TBD
动态神经网络计算图的定义是隐式的,其设计哲学可以参考一些autograd库(例如https://github.com/HIPS/autograd)。具体实现思路如下:


1. 对于每一个sample,用户使用layer的组合来定义神经网络结构。每个sample都拥有一个graph结构来记录该sample的计算图。
2. graph中包含每一层layer的信息,包括输入数据来源,该层layer进行的操作,输出数据大小等。新连接上的layer的相关信息会被持续追加到graph中。
3. layer的求值操作是lazy的,直到用户显式的调用value()方法,graph中记录的计算图才会被execute engine真正执行,计算得到该层layer的输出结果。通常情况下执行forward()操作时会对网络进行求值。
4. 用户可以在组合layer的时候加入控制逻辑,被选择的分支信息也会记录到graph中。
5. 在进行backward()操作时,graph的execute engine会根据记录的计算图执行求导操作,计算梯度。




## 动态神经网络对神经网络框架的要求

TBD
* 最核心的要求就是构建计算图的过程要足够轻量,后端使用C++来实现,并且考虑设计特定的内存/显存 管理策略。前端的Python wrapper也要足够小,可以直接使用后端C++提供的接口。

* 考虑到layer的求值是lazy的,可以使用表达式模板对计算过程进行优化。

* 考虑对不同大小数据/不同网络结构 组batch进行训练。在动态网络中,每一个sample都拥有自己的计算图,相比于静态网络,在GPU上进行并行操作是比较困难的。
1 change: 1 addition & 0 deletions doc/templates/conf.py.cn.in
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ extensions = [
'sphinx.ext.napoleon',
'sphinx.ext.graphviz'
]
mathjax_path="https://cdn.bootcss.com/mathjax/2.7.0/MathJax.js"
table_styling_embed_css = True

autodoc_member_order = 'bysource'
Expand Down
2 changes: 1 addition & 1 deletion paddle/gserver/layers/HierarchicalSigmoidLayer.h
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@ namespace paddle {
* | |- 5
* |
* |-*- 0
* |- 1
* |- 1
* @endcode
*
* where * indicates an internal node, and each leaf node represents a class.
Expand Down
6 changes: 3 additions & 3 deletions paddle/scripts/docker/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,7 @@ docker build -t paddle:dev --build-arg UBUNTU_MIRROR=mirror://mirrors.ubuntu.com
Given the development image `paddle:dev`, the following command builds PaddlePaddle from the source tree on the development computer (host):

```bash
docker run -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" -e "WITH_TEST=OFF" -e "RUN_TEST=OFF" paddle:dev
docker run --rm -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" -e "WITH_TEST=OFF" -e "RUN_TEST=OFF" paddle:dev
```

This command mounts the source directory on the host into `/paddle` in the container, so the default entry point of `paddle:dev`, `build.sh`, could build the source code with possible local changes. When it writes to `/paddle/build` in the container, it writes to `$PWD/build` on the host indeed.
Expand All @@ -110,7 +110,7 @@ Users can specify the following Docker build arguments with either "ON" or "OFF"
- `WITH_AVX`: ***Required***. Set to "OFF" prevents from generating AVX instructions. If you don't know what is AVX, you might want to set "ON".
- `WITH_TEST`: ***Optional, default OFF***. Build unit tests binaries. Once you've built the unit tests, you can run these test manually by the following command:
```bash
docker run -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" paddle:dev sh -c "cd /paddle/build; make coverall"
docker run --rm -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" paddle:dev sh -c "cd /paddle/build; make coverall"
```
- `RUN_TEST`: ***Optional, default OFF***. Run unit tests after building. You can't run unit tests without building it.

Expand All @@ -129,7 +129,7 @@ This production image is minimal -- it includes binary `paddle`, the shared libr
Again the development happens on the host. Suppose that we have a simple application program in `a.py`, we can test and run it using the production image:

```bash
docker run -it -v $PWD:/work paddle /work/a.py
docker run --rm -it -v $PWD:/work paddle /work/a.py
```

But this works only if all dependencies of `a.py` are in the production image. If this is not the case, we need to build a new Docker image from the production image and with more dependencies installs.
Expand Down
13 changes: 13 additions & 0 deletions python/paddle/v2/dataset/wmt14.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,10 @@
wmt14 dataset
"""
import tarfile
import gzip

from paddle.v2.dataset.common import download
from paddle.v2.parameters import Parameters

__all__ = ['train', 'test', 'build_dict']

Expand All @@ -25,6 +27,9 @@
# this is a small set of data for test. The original data is too large and will be add later.
URL_TRAIN = 'http://paddlepaddle.cdn.bcebos.com/demo/wmt_shrinked_data/wmt14.tgz'
MD5_TRAIN = 'a755315dd01c2c35bde29a744ede23a6'
# this is the pretrained model, whose bleu = 26.92
URL_MODEL = 'http://paddlepaddle.bj.bcebos.com/demo/wmt_14/wmt14_model.tar.gz'
MD5_MODEL = '6b097d23e15654608c6f74923e975535'

START = "<s>"
END = "<e>"
Expand Down Expand Up @@ -103,5 +108,13 @@ def test(dict_size):
download(URL_TRAIN, 'wmt14', MD5_TRAIN), 'test/test', dict_size)


def model():
tar_file = download(URL_MODEL, 'wmt14', MD5_MODEL)
with gzip.open(tar_file, 'r') as f:
parameters = Parameters.from_tar(f)
return parameters


def fetch():
download(URL_TRAIN, 'wmt14', MD5_TRAIN)
download(URL_MODEL, 'wmt14', MD5_MODEL)
6 changes: 6 additions & 0 deletions python/paddle/v2/trainer.py
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,12 @@ def __init__(self, cost, parameters, update_equation):
self.__topology__ = topology
self.__parameters__ = parameters
self.__topology_in_proto__ = topology.proto()

# In local mode, disable sparse_remote_update.
for param in self.__topology_in_proto__.parameters:
if param.sparse_remote_update:
param.sparse_remote_update = False

self.__data_types__ = topology.data_type()
gm = api.GradientMachine.createFromConfigProto(
self.__topology_in_proto__, api.CREATE_MODE_NORMAL,
Expand Down