add new_layer_cn doc #1029
Conversation
Great! Very thorough!
Implementing a New Network Layer
================================

This tutorial guides you through implementing a custom network layer in PaddlePaddle. Here we use the fully connected layer as an example to walk you through the several steps needed to implement a new layer.
来指导你完成实现 ==> 来演示实现一个 ("to guide you through implementing" ==> "to demonstrate implementing one"); sounds a bit more polite?
- Derive the equations for the layer's forward and backward passes.
Since these are described as several steps, how about changing the - bullets into 1. numbered items?
COMMAND test_FCGrad)

实现python封装 (Implement the python wrapper)
python ==> Python; capitalize the first letter of proper nouns.
- Derive the equations for the layer's forward and backward passes.
- Implement the layer's C++ class.
- Write test units for gradient checking, to ensure the gradients are computed correctly.
In 写梯度检测的测试单元,以保证梯度的正确计算。: change 写 ("write") to 增加 ("add"), and 测试单元 ("test units") to 单元测试 ("unit tests")?
- Implement the layer's python wrapper.
Change 实现该层的python封装 ("implement the layer's python wrapper") to 封装python接口 ("wrap the python interface")?
Here we use the fully connected layer as an example to walk you through the several steps (几个步骤) needed to implement a new layer.
几个步骤 ("several steps") -> 四个步骤 ("four steps")?
- Derive the equations (方程) for the layer's forward and backward passes.
Which is more appropriate here, 方程 ("equation") or 公式 ("formula")?
Let's keep 方程 here. 方程, 等式, and 表达式 are consistent concepts, while 公式 carries the character 公 ("general/public"), and the equations here are not necessarily that general.
First we need to derive the equations of the layer's *forward pass* and *backward pass*. The forward pass computes the output given the input. The backward pass computes the gradients of the input and of the parameters given the gradient of the output.

The figure below is a diagram of a fully connected layer (全链接层). In a fully connected layer, each output node is connected to all of the input nodes.
Typo in 下图是一个全链接层的示意图: 全链接层 -> 全连接层 ("fully connected layer").
:scale: 60 %

The forward pass of a layer transforms the input into the corresponding output.
The fully connected layer takes a dense vector of dimension :math:`D_i` as input. It uses a transformation matrix :math:`W` of size :math:`D_i \times D_o` to map :math:`x` to a vector of dimension :math:`D_o`, and adds a bias vector :math:`b` of dimension :math:`D_o` on top of that.
Regarding 稠密的向量作为输入。其用: change it to 的稠密向量作为输入,使用, since it is the same subject, right? Or change 其 to a different pronoun. Also, 并在其上再 -> 并在乘积结果上 ("on top of that" -> "on the product")?
Yes, it is the same subject, but there was a period in between, so the subject was written again. It can be changed as suggested, using a comma.
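For reference, the forward pass discussed in this thread is the doc's equation below, written here with :math:`z` for the pre-activation; the dimension annotations just restate the quoted sentence:

.. math::

    z = W^T x + b, \qquad y = f(z)

with :math:`x \in \mathbb{R}^{D_i}`, :math:`W \in \mathbb{R}^{D_i \times D_o}`, and :math:`b, z, y \in \mathbb{R}^{D_o}`.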
where :math:`f(.)` is a nonlinear *activation function*, such as sigmoid, tanh, or Relu.

The transformation matrix :math:`W` and the bias vector :math:`b` are the *parameters* of the layer. The parameters of a layer are trained during *backpropagation*. The backward pass computes the gradients of the output function with respect to all the parameters and inputs. The optimizer then uses the chain rule to compute the gradient of the loss function with respect to each parameter.
"The backward pass computes the gradients of the output function with respect to all parameters and inputs." / 反向传播对所有的参数和输入都计算输出函数的梯度。 I don't quite understand this sentence.
This sentence does have a problem. I will leave the English unchanged, but the Chinese should be: 反向传播根据output的梯度,分别计算出每个参数的梯度,以及input的梯度。 (The backward pass uses the gradient of the output to compute the gradient of each parameter and the gradient of the input.)
Suppose our loss function is :math:`c(y)`, then
In 假设我们的损失函数是 ("suppose our loss function is"), drop 我们的 ("our")?
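For context, a sketch of the chain-rule step that connects the loss :math:`c(y)` to the derivative quoted below:

.. math::

    \frac{\partial c(y)}{\partial z} = \frac{\partial c(y)}{\partial y} \cdot \frac{\partial y}{\partial z}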
.. math::

    \frac{\partial y}{\partial z} = \frac{\partial f(z)}{\partial z}

Our base layer class can compute the above derivative automatically.
In 我们的base layer类可以自动计算上面的导数, change 我们的 ("our") to PaddlePaddle的 ("PaddlePaddle's").
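To make the rest of the backward pass concrete, a sketch of the remaining gradients for :math:`z = W^T x + b`, derived with the chain rule (added for context, not quoted from the doc):

.. math::

    \frac{\partial c}{\partial x} = W \frac{\partial c}{\partial z}, \qquad
    \frac{\partial c}{\partial W} = x \left( \frac{\partial c}{\partial z} \right)^T, \qquad
    \frac{\partial c}{\partial b} = \frac{\partial c}{\partial z}

The parameter gradients :math:`\partial c / \partial W` and :math:`\partial c / \partial b` go to the optimizer; :math:`\partial c / \partial x` is passed back to the previous layer.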
- The number of nonzero elements, only valid for sparse data.
- The format of the sparse data, only valid for sparse data.
+ For each input, :code:`config.layerConfig.add_inputs();` needs to be called once.
+ Call :code:`testLayerGrad` to perform the gradient check. It takes the following parameters (下面的参数).
下面的参数 -> 以下参数 (both "the following parameters"; the latter reads more smoothly).
- The configuration of the layer and its inputs (:code:`config` in the example).
- The type of the input (:code:`fc` in the example).
输入的类型 ("the type of the input") -> 网络层的类型 ("the type of the layer").
The original text has an error here.
- The batch size for the gradient check (100 in the example).
梯度检查的批次大小 ("the batch size of the gradient check") -> 输入数据的批次大小 ("the batch size of the input data")?
How about 梯度检查的输入数据的批次大小 ("the batch size of the input data for the gradient check")?
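For reference, the parameters above correspond to a gradient-check test along the following lines. This is a hypothetical sketch: only :code:`config.layerConfig.add_inputs()` and :code:`testLayerGrad` appear in this review; :code:`TestConfig`, :code:`set_type`, and the remaining arguments are assumptions.

.. code-block:: c++

    // Hypothetical sketch of a gradient-check test for the fc layer.
    void testFcGrad() {
      TestConfig config;                  // the layer and input configuration
      config.layerConfig.set_type("fc");  // the layer type (assumed setter)
      // ... input definitions for the layer would be set up here ...
      config.layerConfig.add_inputs();    // call once for each input
      for (auto useGpu : {false, true}) {
        // "fc" is the layer type; 100 is the batch size of the input
        // data used for the gradient check.
        testLayerGrad(config, "fc", 100, /* trans= */ false, useGpu);
      }
    }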
If you want to add a new file for testing, for example :code:`paddle/gserver/tests/testFCGrad.cpp`, you need to add the file to :code:`paddle/gserver/tests/CMakeLists.txt`; an example is given below. When you run the command :code:`make tests`, all the unit tests will be executed. Note that some layers may need high precision to make sure the gradient-check unit tests run correctly (保证梯度检查单侧正确执行); you need to set :code:`WITH_DOUBLE` to `ON` when configuring cmake.
保证梯度检查单侧正确执行 -> 保证梯度检查单测正确执行 (typo: 单侧 should be 单测, "unit test").
- All Python wrappers use a decorator like :code:`@config_layer('fc')`. The identifier of the layer is :code:`fc`.
- Implement the constructor :code:`__init__`.
  - It first calls the base constructor :code:`super(FCLayer, self).__init__(name, 'fc', size, inputs=inputs, **xargs)`. :code:`FCLayer` is the class name of the Python wrapper, and :code:`fc` is the identifier of the layer. These names must be written correctly for the wrapper to work.
  - After that, it computes the size and format (whether sparse) of the transformation matrix (转换矩阵).
转换矩阵 -> 变换矩阵 (both "transformation matrix"; use the latter for consistency with the earlier text).
.. math::

    y = f(W^T x + b)

where :math:`f(.)` is a nonlinear *activation function*, such as sigmoid, tanh, or Relu.

The transformation matrix :math:`W` and the bias vector :math:`b` are the *parameters* of the layer, trained during *backpropagation*. The revised sentence reads: 反向传根据输出的梯度,分别计算每个参数的梯度,以及输入的梯度。 (The backward pass computes the gradient of each parameter and the gradient of the input from the gradient of the output.) The optimizer then uses the chain rule to compute the gradient of the loss function with respect to each parameter.
反向传根据 -> 反向传播根据; the character 播 was dropped.
@wangkuiyi do you have any further comments on this PR?