# paddle.nn.FeatureAlphaDropout Design Document

| API name                 | paddle.nn.FeatureAlphaDropout                     |
| ------------------------ | ------------------------------------------------- |
| Author                   | megemini                                          |
| Submitted                | 2024-06-04                                        |
| Version                  | V1.0                                              |
| Target Paddle version    | develop                                           |
| File name                | 20240604_api_design_for_feature_alpha_dropout.md  |

# 1. Overview

## 1. Background

The paper [Self-Normalizing Neural Networks](https://arxiv.org/pdf/1706.02515) introduces Alpha Dropout, a dropout variant designed to be used together with self-normalizing activation functions such as SELU (Scaled Exponential Linear Unit), so that the self-normalizing property of the network is preserved. When the dropout targets whole channels instead of individual elements, the result is the `FeatureAlphaDropout` algorithm.

> Reference task: [NO.8 Add the FeatureAlphaDropout API to Paddle](https://github.com/PaddlePaddle/community/blob/master/hackathon/hackathon_6th/%E3%80%90Hackathon%206th%E3%80%91%E5%BC%80%E6%BA%90%E8%B4%A1%E7%8C%AE%E4%B8%AA%E4%BA%BA%E6%8C%91%E6%88%98%E8%B5%9B%E6%A1%86%E6%9E%B6%E5%BC%80%E5%8F%91%E4%BB%BB%E5%8A%A1%E5%90%88%E9%9B%86.md#no8-%E4%B8%BA-paddle-%E6%96%B0%E5%A2%9E-featurealphadropout-api)
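
For later reference, the alpha dropout transform used below can be summarized as follows (a restatement in our own notation; the constants match those in the implementations quoted later in this document):

$$
y = a\,\big(x \odot m + \alpha' (1 - m)\big) + b,
\qquad a = \big((1 - p)\,(1 + p\,{\alpha'}^{2})\big)^{-1/2},
\qquad b = -a\,\alpha' p
$$

where $m$ is a Bernoulli keep mask with $P(m = 1) = 1 - p$, and $\alpha' = -\lambda\alpha \approx -1.7581$ is the negative saturation value of the SELU activation. The affine parameters $a$ and $b$ are chosen so that the output keeps zero mean and unit variance. In `FeatureAlphaDropout`, the mask $m$ is sampled once per channel and broadcast over all remaining positions instead of being sampled per element.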
## 2. Goals

Implement `paddle.nn.functional.feature_alpha_dropout` as a standalone function.
Implement `paddle.nn.FeatureAlphaDropout` as a standalone Layer.

## 3. Significance

Enrich Paddle's set of dropout-related Layers and APIs.

# 2. Current Status in Paddle

Paddle already provides `paddle.nn.functional.alpha_dropout` and `paddle.nn.AlphaDropout`, but there is currently no `feature_alpha_dropout` interface that applies dropout to whole channels.
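
For reference, a minimal usage sketch of the existing element-wise API (illustrative only; input shape and probability are arbitrary):

``` python
import paddle
import paddle.nn.functional as F

x = paddle.randn([2, 3, 4])
# Existing API: alpha dropout decides independently for every element.
y = F.alpha_dropout(x, p=0.2, training=True)

# The channel-level counterpart proposed in this document would be called as
# F.feature_alpha_dropout(x, p=0.2, training=True), which does not exist yet.
```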

# 3. Survey of Industry Solutions

PyTorch provides the corresponding interfaces:

- [torch.nn.functional.feature_alpha_dropout](https://pytorch.org/docs/stable/generated/torch.nn.functional.feature_alpha_dropout.html#torch.nn.functional.feature_alpha_dropout)
- [torch.nn.FeatureAlphaDropout](https://pytorch.org/docs/stable/generated/torch.nn.FeatureAlphaDropout.html#torch.nn.FeatureAlphaDropout)

The concrete implementation lives in `aten/src/ATen/native/Dropout.cpp`:
``` c++
template<bool feature_dropout, bool alpha_dropout, bool inplace, typename T>
Ctype<inplace> _dropout_impl(T& input, double p, bool train) {
  TORCH_CHECK(p >= 0 && p <= 1, "dropout probability has to be between 0 and 1, but got ", p);
  if (p == 0 || !train || input.sym_numel() == 0) {
    return input;
  }

  if (p == 1) {
    return multiply<inplace>(input, at::zeros({}, input.options()));
  }

  at::Tensor b; // used for alpha_dropout only
  auto noise = feature_dropout ? make_feature_noise(input) : at::empty_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT);
  noise.bernoulli_(1 - p);
  if (alpha_dropout) {
    constexpr double alpha = 1.7580993408473766;
    double a = 1. / std::sqrt((alpha * alpha * p + 1) * (1 - p));
    b = noise.add(-1).mul_(alpha * a).add_(alpha * a * p);
    noise.mul_(a);
  } else {
    noise.div_(1 - p);
  }

  if (!alpha_dropout) {
    return multiply<inplace>(input, noise);
  } else {
    return multiply<inplace>(input, noise).add_(b);
  }
}
```

This single templated function actually covers four dropout variants (`dropout`, `feature_dropout`, `alpha_dropout`, `feature_alpha_dropout`), selected by the `feature_dropout` and `alpha_dropout` flags. `feature_alpha_dropout` can be viewed as `alpha_dropout` applied per channel: the only difference between the two branches is that for `feature_alpha_dropout` the `noise` tensor is created by `make_feature_noise`, instead of being an empty tensor with the same shape as the input:
``` c++
Tensor make_feature_noise(const Tensor& input) {
  auto input_sizes = input.sym_sizes();
  TORCH_CHECK(input.dim() >= 2, "Feature dropout requires at least 2 dimensions in the input");
  c10::SymDimVector sizes;
  sizes.reserve(input.dim());
  sizes.push_back(input_sizes[0]);
  sizes.push_back(input_sizes[1]);
  for (C10_UNUSED const auto i : c10::irange(2, input.dim())) {
    sizes.push_back(1);
  }
  return input.new_empty_symint(sizes);
}
```

As `make_feature_noise` shows, the noise tensor must be at least 2-dimensional: it keeps the first two dimensions of the input and fills every remaining dimension with `1`. For example:

- input shape `[2, 3, 4]`
- noise shape `[2, 3, 1]`

or:

- input shape `[2, 3, 4, 5, 6, 7]`
- noise shape `[2, 3, 1, 1, 1, 1]`

In other words, only the first two dimensions are preserved; the same mask value is then broadcast to every position within a channel, which is exactly the feature-level (per-channel) dropout behaviour.
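
The same shape rule can be written as a small Python helper (the function name is illustrative and not part of either framework's API):

``` python
def feature_noise_shape(input_shape):
    # Keep the batch and channel dimensions; broadcast over all remaining ones.
    if len(input_shape) < 2:
        raise ValueError(
            "Feature dropout requires at least 2 dimensions in the input"
        )
    return list(input_shape[:2]) + [1] * (len(input_shape) - 2)


assert feature_noise_shape([2, 3, 4]) == [2, 3, 1]
assert feature_noise_shape([2, 3, 4, 5, 6, 7]) == [2, 3, 1, 1, 1, 1]
```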

# 4. Comparison and Analysis

Paddle currently implements `alpha_dropout` in `python/paddle/nn/functional/common.py`:
``` python
def alpha_dropout(x, p=0.5, training=True, name=None):
    if not isinstance(p, (float, int)):
        raise TypeError("p argument should be a float or int")
    if p < 0 or p > 1:
        raise ValueError("p argument should between 0 and 1")

    if not in_dynamic_mode():
        check_variable_and_dtype(
            x, 'x', ['float16', 'uint16', 'float32', 'float64'], 'alpha_dropout'
        )

    if training:
        if p == 1:
            return paddle.scale(x, scale=0.0)
        # get transformation params
        alpha = 1.6732632423543772848170429916717
        scale = 1.0507009873554804934193349852946
        alpha_p = -alpha * scale
        a = ((1 - p) * (1 + p * alpha_p**2)) ** -0.5
        b = -a * alpha_p * p

        dtype = x.dtype
        input_shape = x.shape

        # get mask
        random_tensor = paddle.uniform(
            input_shape, dtype='float32', min=0.0, max=1.0
        )
        p = full(shape=input_shape, fill_value=p, dtype='float32')
        keep_mask = paddle.greater_equal(random_tensor, p)
        keep_mask = paddle.cast(keep_mask, dtype)
        drop_mask = paddle.subtract(
            full(shape=input_shape, fill_value=1.0, dtype=dtype), keep_mask
        )

        # apply mask
        b = full(shape=input_shape, fill_value=b, dtype=dtype)
        y = paddle.add(
            paddle.multiply(x, keep_mask),
            paddle.scale(drop_mask, scale=alpha_p),
        )
        res = paddle.add(paddle.scale(y, scale=a), b, name=name)
        return res
    else:  # test
        return x
```

Its computation logic matches PyTorch's, but since there is no separate handling of the `feature` case, only the basic `alpha_dropout` function is available.

# 5. Design Approach and Implementation Plan

Since `feature_alpha_dropout` differs from `alpha_dropout` only in how the dropout mask is shaped (per channel instead of per element), it can be implemented by reusing Paddle's existing `alpha_dropout` logic:

- Implement `paddle.nn.functional.feature_alpha_dropout` as a standalone function.
- Implement `paddle.nn.FeatureAlphaDropout` as a standalone Layer that calls the function above.

To share code with `alpha_dropout`, a common helper `_feature_alpha_dropout_impl` can be extracted:
``` python
def _feature_alpha_dropout_impl(
    x: paddle.Tensor,
    feature_dropout: bool,
    p: float | int,
    training: bool = True,
    name: str | None = None,
) -> paddle.Tensor: ...
```

Then both `alpha_dropout` and `feature_alpha_dropout` delegate to `_feature_alpha_dropout_impl`:
``` python
def alpha_dropout(x, p=0.5, training=True, name=None):
    return _feature_alpha_dropout_impl(
        x, feature_dropout=False, p=p, training=training, name=name
    )


def feature_alpha_dropout(x, p=0.5, training=True, name=None):
    return _feature_alpha_dropout_impl(
        x, feature_dropout=True, p=p, training=training, name=name
    )
```

The only difference between the two wrappers is the value of the `feature_dropout` argument.

## Naming and Parameter Design

### As functions
``` python
def _feature_alpha_dropout_impl(
    x: paddle.Tensor,
    feature_dropout: bool,
    p: float | int,
    training: bool = True,
    name: str | None = None,
) -> paddle.Tensor: ...


def alpha_dropout(x, p=0.5, training=True, name=None): ...
def feature_alpha_dropout(x, p=0.5, training=True, name=None): ...
```

where:

- x (Tensor): the input tensor
- feature_dropout (bool): whether dropout is applied to whole channels
- p (float | int): the probability of dropping a unit (a dropped unit is set to the SELU saturation value rather than zero)
- training (bool): whether the layer is in training mode
- name (str): the operation name (optional)

### As a Layer
``` python
class FeatureAlphaDropout(Layer): ...
```
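
A possible sketch of the Layer, modeled on the existing `paddle.nn.AlphaDropout` (the `extra_repr` and argument handling below are placeholders rather than the final implementation):

``` python
import paddle.nn.functional as F
from paddle.nn import Layer


class FeatureAlphaDropout(Layer):
    def __init__(self, p=0.5, name=None):
        super().__init__()
        self.p = p
        self.name = name

    def forward(self, input):
        # Delegates to the proposed functional API; `self.training` is toggled
        # by the usual train()/eval() calls on the Layer.
        return F.feature_alpha_dropout(
            input, p=self.p, training=self.training, name=self.name
        )

    def extra_repr(self):
        name_str = f', name={self.name}' if self.name else ''
        return f'p={self.p}{name_str}'
```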
|
||
## 底层OP设计 | ||
|
||
直接在 Python 层实现,不涉及底层算子。 | ||
|
||
## API实现方案 | ||
|
||
参考代码: | ||
|
||
``` python
def _feature_alpha_dropout_impl(
    x: paddle.Tensor,
    feature_dropout: bool,
    p: float | int,
    training: bool = True,
    name: str | None = None,
) -> paddle.Tensor:
    if not isinstance(p, (float, int)):
        raise TypeError("p argument should be a float or int")
    if p < 0 or p > 1:
        raise ValueError("p argument should between 0 and 1")

    if not in_dynamic_mode():
        check_variable_and_dtype(
            x, 'x', ['float16', 'uint16', 'float32', 'float64'], 'alpha_dropout'
        )

    if training:
        if p == 1:
            return paddle.scale(x, scale=0.0)
        # get transformation params
        alpha = 1.6732632423543772848170429916717
        scale = 1.0507009873554804934193349852946
        alpha_p = -alpha * scale
        a = ((1 - p) * (1 + p * alpha_p**2)) ** -0.5
        b = -a * alpha_p * p

        dtype = x.dtype

        # This is the only place that differs from the original `alpha_dropout`
        if not feature_dropout:
            input_shape = x.shape
        else:
            if x.ndim < 2:
                raise ValueError(
                    'Feature alpha dropout needs at least 2D input.'
                )
            input_shape = list(x.shape[:2]) + [1] * len(x.shape[2:])

        # get mask
        random_tensor = paddle.uniform(
            input_shape, dtype='float32', min=0.0, max=1.0
        )
        p = full(shape=input_shape, fill_value=p, dtype='float32')
        keep_mask = paddle.greater_equal(random_tensor, p)
        keep_mask = paddle.cast(keep_mask, dtype)
        drop_mask = paddle.subtract(
            full(shape=input_shape, fill_value=1.0, dtype=dtype), keep_mask
        )

        # apply mask
        b = full(shape=input_shape, fill_value=b, dtype=dtype)
        y = paddle.add(
            paddle.multiply(x, keep_mask),
            paddle.scale(drop_mask, scale=alpha_p),
        )
        res = paddle.add(paddle.scale(y, scale=a), b, name=name)
        return res
    else:  # test
        return x


def alpha_dropout(x, p=0.5, training=True, name=None):
    return _feature_alpha_dropout_impl(
        x, feature_dropout=False, p=p, training=training, name=name
    )


def feature_alpha_dropout(x, p=0.5, training=True, name=None):
    return _feature_alpha_dropout_impl(
        x, feature_dropout=True, p=p, training=training, name=name
    )
```

The only difference from the original `alpha_dropout` is how `input_shape` is computed:

``` python
# This is the only place that differs from the original `alpha_dropout`
if not feature_dropout:
    input_shape = x.shape
else:
    if x.ndim < 2:
        raise ValueError(
            'Feature alpha dropout needs at least 2D input.'
        )
    input_shape = list(x.shape[:2]) + [1] * len(x.shape[2:])
```
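
A quick behavioural check of the proposed function (a sketch assuming the functional API above is exposed as `paddle.nn.functional.feature_alpha_dropout`):

``` python
import paddle
import paddle.nn.functional as F

x = paddle.randn([2, 3, 4, 4])
y = F.feature_alpha_dropout(x, p=0.5, training=True)

# Every position of a dropped channel maps to the same constant (a * alpha_p + b),
# so the per-channel standard deviation is exactly 0 for dropped channels and
# positive for kept ones.
print(y.std(axis=[2, 3]))
```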

# 6. Testing and Acceptance Criteria

- **Programming paradigms**
  - Cover the usual dynamic-graph (and static-graph) test scenarios.
- **Hardware**
  - Cover both CPU and GPU test scenarios.
- **Input parameters**
  - Cover default parameters, common parameters, and invalid parameters.
  - Cover the common data types float16, float32, and float64 (and bfloat16; see the review notes below).

Review notes on this section:

> Reviewer: please also specify where the test file will be placed.
>
> Author: Paddle currently puts … ; I intend to handle it the same way here, is that acceptable?

> Reviewer: bfloat16 also needs to be supported.
>
> Author: a quick test suggests it already works:

``` python
In [14]: feature_alpha_dropout(i_paddle, 0.2, training=True)
Out[14]:
Tensor(shape=[4, 3, 3], dtype=bfloat16, place=Place(gpu:0), stop_gradient=True,
       [[[ 0.80859375,  1.07812500,  0.83203125],
         [ 0.73437500,  0.33398438,  0.96484375],
         [ 0.72656250,  1.16406250,  0.54687500]],
        [[ 0.71093750,  0.50781250,  1.15625000],
         [-1.23437500, -1.23437500, -1.23437500],
         [-1.23437500, -1.23437500, -1.23437500]],
        [[ 0.34960938,  0.34375000,  0.98046875],
         [ 0.78125000,  0.83203125,  0.98828125],
         [ 0.64843750,  1.11718750,  0.37109375]],
        [[-1.23437500, -1.23437500, -1.23437500],
         [ 0.48437500,  0.31250000,  0.60156250],
         [ 0.81640625,  1.00781250,  1.09375000]]])
```

# 7. Feasibility Analysis and Schedule

- Week 1: implement the code
- Week 2: unit tests and documentation
- Week 3: review

# 8. Impact

Enriches the Paddle API; no impact on other modules.

Review discussion on this design:

> Reviewer: The noise produced via `make_feature_noise` corresponds to `feature_dropout`, not `feature alpha dropout`; this part needs to be re-investigated. Also, saying that `feature alpha dropout` is a special case of `alpha dropout` is not quite accurate: `_dropout_impl` covers four kinds of dropout, and `alpha dropout` is itself a special case of dropout. Please take another look at the specific branch that `feature alpha dropout` takes.

> Author: My earlier description was too loose; I wrote it under the implicit assumption that we only care about implementing `feature_alpha_dropout`. Sorry about that; let me expand a bit, please check whether the following is correct.
>
> PyTorch's `feature_alpha_dropout` is one particular instantiation of `_dropout_impl` in `aten/src/ATen/native/Dropout.cpp`. In other words, `feature_alpha_dropout` can be viewed as `dropout + alpha_dropout + feature_dropout`, or equivalently as `alpha_dropout` (which already implies dropout) plus `feature_dropout`. Moreover, compared with `alpha_dropout`, `feature_alpha_dropout` only narrows the scope of the dropout: `alpha_dropout` acts on individual elements, while `feature_alpha_dropout` acts on whole channels. That is why I described `feature_alpha_dropout` as a special case of `alpha_dropout`.
>
> The reason I wrote that the noise of `feature alpha dropout` is generated by `make_feature_noise` is that this document only cares about `feature alpha dropout`, not `feature dropout`; I agree the wording was not rigorous.
>
> What I care about more is whether the implementation plan in this document is sound. Of the variants above, the plan only involves the latter two (`alpha_dropout` and `feature_alpha_dropout`) and directly reuses Paddle's `alpha_dropout`; the only difference is `input_shape`. I believe this should work, is there any problem with it? Thanks!