【Hackathon 6th No.8】Add FeatureAlphaDropout API to Paddle #913

Merged Jun 13, 2024 (3 commits) · Changes from 2 commits
345 additions, 0 deletions in `rfcs/APIs/20240604_api_design_for_feature_alpha_dropout.md`
# paddle.nn.FeatureAlphaDropout Design Document

| API name | paddle.nn.FeatureAlphaDropout |
| ------------ | ------------------------------------------------ |
| Author | megemini |
| Submitted on | 2024-06-04 |
| Version | V1.0 |
| Target Paddle version | develop |
| File name | 20240604_api_design_for_feature_alpha_dropout.md |


# 1. Overview

## 1.1 Background

The paper [Self-Normalizing Neural Networks](https://arxiv.org/pdf/1706.02515) introduces a new dropout variant, Alpha Dropout, designed to be used with self-normalizing activation functions such as SELU (Scaled Exponential Linear Unit) so that the network's self-normalizing property is preserved. When the dropout target is an entire channel rather than individual elements, the algorithm is `FeatureAlphaDropout`.

> Reference task: [NO.8 Add FeatureAlphaDropout API to Paddle](https://github.com/PaddlePaddle/community/blob/master/hackathon/hackathon_6th/%E3%80%90Hackathon%206th%E3%80%91%E5%BC%80%E6%BA%90%E8%B4%A1%E7%8C%AE%E4%B8%AA%E4%BA%BA%E6%8C%91%E6%88%98%E8%B5%9B%E6%A1%86%E6%9E%B6%E5%BC%80%E5%8F%91%E4%BB%BB%E5%8A%A1%E5%90%88%E9%9B%86.md#no8-%E4%B8%BA-paddle-%E6%96%B0%E5%A2%9E-featurealphadropout-api)


## 1.2 Goals

- Implement `paddle.nn.functional.feature_alpha_dropout` as a standalone function.
- Implement `paddle.nn.FeatureAlphaDropout` as a standalone Layer.

## 1.3 Significance

Enrich Paddle's family of dropout-related Layers and APIs.

# 2. Current Status in Paddle

Paddle already provides `paddle.nn.functional.alpha_dropout` and `paddle.nn.AlphaDropout`, but there is currently no `feature_alpha_dropout` interface that applies dropout to entire channels.

# 3. Survey of Existing Solutions

PyTorch provides the corresponding interfaces:

- [torch.nn.functional.feature_alpha_dropout](https://pytorch.org/docs/stable/generated/torch.nn.functional.feature_alpha_dropout.html#torch.nn.functional.feature_alpha_dropout)
- [torch.nn.FeatureAlphaDropout](https://pytorch.org/docs/stable/generated/torch.nn.FeatureAlphaDropout.html#torch.nn.FeatureAlphaDropout)
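
Before looking at the implementation, a small illustration of the surveyed behavior (a hedged sketch that simply calls the PyTorch API listed above):

``` python
# Illustration only: whole channels are either kept (rescaled copies of x)
# or replaced by a single constant saturation value.
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 4)                              # [N, C, L]
y = F.feature_alpha_dropout(x, p=0.5, training=True)
# A dropped channel collapses to one constant, so its std is ~0.
print(y.std(dim=-1) < 1e-6)                           # True marks dropped channels
```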

The concrete implementation lives in `aten/src/ATen/native/Dropout.cpp`:

``` c++
template<bool feature_dropout, bool alpha_dropout, bool inplace, typename T>
Ctype<inplace> _dropout_impl(T& input, double p, bool train) {
  TORCH_CHECK(p >= 0 && p <= 1, "dropout probability has to be between 0 and 1, but got ", p);
  if (p == 0 || !train || input.sym_numel() == 0) {
    return input;
  }

  if (p == 1) {
    return multiply<inplace>(input, at::zeros({}, input.options()));
  }

  at::Tensor b; // used for alpha_dropout only
  auto noise = feature_dropout ? make_feature_noise(input) : at::empty_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT);
  noise.bernoulli_(1 - p);
  if (alpha_dropout) {
    constexpr double alpha = 1.7580993408473766;
    double a = 1. / std::sqrt((alpha * alpha * p + 1) * (1 - p));
    b = noise.add(-1).mul_(alpha * a).add_(alpha * a * p);
    noise.mul_(a);
  } else {
    noise.div_(1 - p);
  }

  if (!alpha_dropout) {
    return multiply<inplace>(input, noise);
  } else {
    return multiply<inplace>(input, noise).add_(b);
  }
}
```

As can be seen, since `feature alpha dropout` can be viewed as a special case of `alpha dropout`, PyTorch implements both in the single function above. The only difference between the two is that the `noise` of `feature alpha dropout` is generated with `make_feature_noise` instead of, as in `alpha dropout`, an empty tensor with the same shape as the input.
> **@zxcd** (Contributor) commented on Jun 5, 2024:
>
> Generating the noise with `make_feature_noise` is what `feature_dropout` does, not feature alpha dropout specifically; please re-check this part. Also, saying that feature alpha dropout is a special case of alpha dropout is not accurate: `_dropout_impl` covers four kinds of dropout, and alpha dropout is the special case of dropout. Please look again at the branch that feature alpha dropout actually takes.

> **@megemini** (Author) replied on Jun 5, 2024:
>
> My earlier wording was too loose; I wrote it under the assumption that we only care about implementing `feature_alpha_dropout`. Sorry, let me expand on it here, please check whether this reads correctly.
>
> PyTorch's `feature_alpha_dropout` is one particular specialization of `_dropout_impl` in `aten/src/ATen/native/Dropout.cpp`:
>
> ``` c++
> #define ALIAS_SPECIALIZATION(ALIAS_NAME, IS_FEATURE, IS_ALPHA)                      \
> template <bool inplace, typename... Args>                                           \
> Ctype<inplace> ALIAS_NAME(Args&&... args) {                                         \
>   return _dropout_impl<IS_FEATURE, IS_ALPHA, inplace>(std::forward<Args>(args)...); \
> }
>
> ALIAS_SPECIALIZATION(_dropout,               false, false)
> ALIAS_SPECIALIZATION(_feature_dropout,       true,  false)
> ALIAS_SPECIALIZATION(_alpha_dropout,         false, true )
> ALIAS_SPECIALIZATION(_feature_alpha_dropout, true,  true )
> ```
>
> That is, `feature_alpha_dropout` can be read as dropout + alpha_dropout + feature_dropout, or as alpha_dropout (which already implies dropout) + feature_dropout. Compared to `alpha_dropout`, `feature_alpha_dropout` simply narrows the scope of the dropout (alpha_dropout drops individual elements, feature_alpha_dropout drops whole channels), which is why I described `feature_alpha_dropout` as a special case of `alpha_dropout`.
>
> I wrote that the noise of feature alpha dropout is generated with `make_feature_noise` because here I only care about feature alpha dropout, not feature dropout; that wording was indeed imprecise.
>
> What I care more about is whether the implementation plan in this document is sound. Of the four specializations above, the plan only touches the last two, directly reusing Paddle's `alpha_dropout`, with `input_shape` as the only difference. I believe this should work; is there any problem with it? Thanks!

For reference, `make_feature_noise` is implemented as:

``` c++
Tensor make_feature_noise(const Tensor& input) {
  auto input_sizes = input.sym_sizes();
  TORCH_CHECK(input.dim() >= 2, "Feature dropout requires at least 2 dimensions in the input");
  c10::SymDimVector sizes;
  sizes.reserve(input.dim());
  sizes.push_back(input_sizes[0]);
  sizes.push_back(input_sizes[1]);
  for (C10_UNUSED const auto i : c10::irange(2, input.dim())) {
    sizes.push_back(1);
  }
  return input.new_empty_symint(sizes);
}
```

As `make_feature_noise` shows, the noise tensor requires the input to have at least 2 dimensions. Its shape keeps the first two dimensions of the input and replaces the remaining ones with `1`, for example:

- input shape `[2, 3, 4]`
- noise tensor shape `[2, 3, 1]`

or:

- input shape `[2, 3, 4, 5, 6, 7]`
- noise tensor shape `[2, 3, 1, 1, 1, 1]`

In other words, only the first two dimensions are kept, so every position within a channel shares the same keep/drop decision, which is exactly the channel-wise (`feature`) dropout behavior.
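
The shape rule is easy to restate in Python (a small illustrative helper, not part of the proposed API):

``` python
def feature_noise_shape(input_shape):
    """Keep the first two dims (batch, channel); collapse the rest to 1.

    The reduced mask later broadcasts over the collapsed dims, so every
    element of a channel shares the same keep/drop decision.
    """
    assert len(input_shape) >= 2, "feature dropout needs at least 2 dims"
    return list(input_shape[:2]) + [1] * (len(input_shape) - 2)


print(feature_noise_shape([2, 3, 4]))           # [2, 3, 1]
print(feature_noise_shape([2, 3, 4, 5, 6, 7]))  # [2, 3, 1, 1, 1, 1]
```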

# 4. Comparative Analysis

Paddle currently implements `alpha_dropout` in `python/paddle/nn/functional/common.py`:

``` python
def alpha_dropout(x, p=0.5, training=True, name=None):
    if not isinstance(p, (float, int)):
        raise TypeError("p argument should be a float or int")
    if p < 0 or p > 1:
        raise ValueError("p argument should between 0 and 1")

    if not in_dynamic_mode():
        check_variable_and_dtype(
            x, 'x', ['float16', 'uint16', 'float32', 'float64'], 'alpha_dropout'
        )

    if training:
        if p == 1:
            return paddle.scale(x, scale=0.0)
        # get transformation params
        alpha = 1.6732632423543772848170429916717
        scale = 1.0507009873554804934193349852946
        alpha_p = -alpha * scale
        a = ((1 - p) * (1 + p * alpha_p**2)) ** -0.5
        b = -a * alpha_p * p

        dtype = x.dtype
        input_shape = x.shape

        # get mask
        random_tensor = paddle.uniform(
            input_shape, dtype='float32', min=0.0, max=1.0
        )
        p = full(shape=input_shape, fill_value=p, dtype='float32')
        keep_mask = paddle.greater_equal(random_tensor, p)
        keep_mask = paddle.cast(keep_mask, dtype)
        drop_mask = paddle.subtract(
            full(shape=input_shape, fill_value=1.0, dtype=dtype), keep_mask
        )

        # apply mask
        b = full(shape=input_shape, fill_value=b, dtype=dtype)
        y = paddle.add(
            paddle.multiply(x, keep_mask),
            paddle.scale(drop_mask, scale=alpha_p),
        )
        res = paddle.add(paddle.scale(y, scale=a), b, name=name)
        return res
    else:  # test
        return x
```

Its computation logic is consistent with PyTorch's, but since there is no special handling of whole features, only the basic element-wise `alpha_dropout` is available.
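
As a quick numerical cross-check (an illustrative NumPy sketch, not part of the RFC): PyTorch's single constant `1.7580993408473766` in `Dropout.cpp` is simply the product of the SELU `alpha` and `scale` constants used by Paddle, and the affine correction `(a, b)` keeps mean and variance approximately unchanged for standard-normal input:

``` python
import numpy as np

alpha = 1.6732632423543772848170429916717
scale = 1.0507009873554804934193349852946
print(alpha * scale)        # ~1.7580993408473766, the constant in Dropout.cpp

p = 0.3
alpha_p = -alpha * scale    # value a dropped unit is set to
a = ((1 - p) * (1 + p * alpha_p**2)) ** -0.5
b = -a * alpha_p * p

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
keep = (rng.random(x.shape) >= p).astype(x.dtype)
y = a * (x * keep + alpha_p * (1 - keep)) + b
print(y.mean(), y.var())    # both stay close to 0 and 1
```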

# 5. Design and Implementation

Since `feature_alpha_dropout` only differs from `alpha_dropout` in the shape of its dropout mask, Paddle's existing `alpha_dropout` logic can be reused:

- Implement `paddle.nn.functional.feature_alpha_dropout` as a standalone function.
- Implement `paddle.nn.FeatureAlphaDropout` as a standalone Layer that calls the function above.

To share the code with `alpha_dropout`, a common helper `_feature_alpha_dropout_impl` can be extracted:

``` python
def _feature_alpha_dropout_impl(
    x: paddle.Tensor,
    feature_dropout: bool,
    p: float | int,
    training: bool = True,
    name: str | None = None,
) -> paddle.Tensor: ...
```

Then `alpha_dropout` and `feature_alpha_dropout` both delegate to `_feature_alpha_dropout_impl`:

``` python
def alpha_dropout(x, p=0.5, training=True, name=None):
    return _feature_alpha_dropout_impl(
        x, feature_dropout=False, p=p, training=training, name=name
    )


def feature_alpha_dropout(x, p=0.5, training=True, name=None):
    return _feature_alpha_dropout_impl(
        x, feature_dropout=True, p=p, training=training, name=name
    )
```

The only difference between the two wrappers is the value passed for `feature_dropout`.

## Naming and Parameter Design

### As a function

``` python
def _feature_alpha_dropout_impl(
    x: paddle.Tensor,
    feature_dropout: bool,
    p: float | int,
    training: bool = True,
    name: str | None = None,
) -> paddle.Tensor: ...

def alpha_dropout(x, p=0.5, training=True, name=None): ...
def feature_alpha_dropout(x, p=0.5, training=True, name=None): ...
```

Where:

- x (Tensor): the input tensor.
- feature_dropout (bool): whether the dropout is applied to entire channels.
- p (float | int): the probability of an element (or channel) being dropped.
- training (bool): whether in training mode.
- name (str): name of the operation.
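
A usage sketch of the proposed functional API (hypothetical until the PR is merged; the call signature follows the design above):

``` python
import paddle
import paddle.nn.functional as F

x = paddle.randn([4, 3, 8, 8])                              # [N, C, H, W]
y = F.feature_alpha_dropout(x, p=0.5, training=True)        # proposed API
# Each [H, W] feature map is either kept (an affine copy of x) or
# replaced entirely by the SELU saturation value and rescaled.
y_eval = F.feature_alpha_dropout(x, p=0.5, training=False)  # identity at inference
```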

### As a Layer

``` python
class FeatureAlphaDropout(Layer): ...
```
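
A minimal sketch of the Layer, assuming it mirrors the existing `paddle.nn.AlphaDropout` pattern (attribute names and the `extra_repr` format here are illustrative, not final):

``` python
import paddle.nn.functional as F
from paddle.nn import Layer


class FeatureAlphaDropout(Layer):
    def __init__(self, p=0.5, name=None):
        super().__init__()
        self.p = p
        self.name = name

    def forward(self, input):
        # self.training is switched by Layer.train() / Layer.eval()
        return F.feature_alpha_dropout(
            input, p=self.p, training=self.training, name=self.name
        )

    def extra_repr(self):
        name_str = f', name={self.name}' if self.name else ''
        return f'p={self.p}{name_str}'
```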

## Low-level OP Design

Implemented purely at the Python level; no new low-level operators are required.

## API Implementation

Reference implementation:

``` python
def _feature_alpha_dropout_impl(
    x: paddle.Tensor,
    feature_dropout: bool,
    p: float | int,
    training: bool = True,
    name: str | None = None,
) -> paddle.Tensor:
    if not isinstance(p, (float, int)):
        raise TypeError("p argument should be a float or int")
    if p < 0 or p > 1:
        raise ValueError("p argument should between 0 and 1")

    if not in_dynamic_mode():
        check_variable_and_dtype(
            x, 'x', ['float16', 'uint16', 'float32', 'float64'], 'alpha_dropout'
        )

    if training:
        if p == 1:
            return paddle.scale(x, scale=0.0)
        # get transformation params
        alpha = 1.6732632423543772848170429916717
        scale = 1.0507009873554804934193349852946
        alpha_p = -alpha * scale
        a = ((1 - p) * (1 + p * alpha_p**2)) ** -0.5
        b = -a * alpha_p * p

        dtype = x.dtype

        # This is the only place that differs from the original `alpha_dropout`
        if not feature_dropout:
            input_shape = x.shape
        else:
            if x.ndim < 2:
                raise ValueError(
                    'Feature alpha dropout needs at least 2D input.'
                )
            input_shape = list(x.shape[:2]) + [1] * len(x.shape[2:])

        # get mask
        random_tensor = paddle.uniform(
            input_shape, dtype='float32', min=0.0, max=1.0
        )
        p = full(shape=input_shape, fill_value=p, dtype='float32')
        keep_mask = paddle.greater_equal(random_tensor, p)
        keep_mask = paddle.cast(keep_mask, dtype)
        drop_mask = paddle.subtract(
            full(shape=input_shape, fill_value=1.0, dtype=dtype), keep_mask
        )

        # apply mask
        b = full(shape=input_shape, fill_value=b, dtype=dtype)
        y = paddle.add(
            paddle.multiply(x, keep_mask),
            paddle.scale(drop_mask, scale=alpha_p),
        )
        res = paddle.add(paddle.scale(y, scale=a), b, name=name)
        return res
    else:  # test
        return x


def alpha_dropout(x, p=0.5, training=True, name=None):
    return _feature_alpha_dropout_impl(
        x, feature_dropout=False, p=p, training=training, name=name
    )


def feature_alpha_dropout(x, p=0.5, training=True, name=None):
    return _feature_alpha_dropout_impl(
        x, feature_dropout=True, p=p, training=training, name=name
    )
```

The only difference from the original `alpha_dropout` is how `input_shape` is computed:

``` python
# This is the only place that differs from the original `alpha_dropout`
if not feature_dropout:
    input_shape = x.shape
else:
    if x.ndim < 2:
        raise ValueError(
            'Feature alpha dropout needs at least 2D input.'
        )
    input_shape = list(x.shape[:2]) + [1] * len(x.shape[2:])
```
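
To see why the reduced `input_shape` is the only change needed, here is a small NumPy sketch (illustrative only) of how a `[N, C, 1]` mask broadcasts so that every element of a channel shares one keep/drop decision:

``` python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4))
mask_shape = list(x.shape[:2]) + [1] * (x.ndim - 2)    # [2, 3, 1]
keep = (rng.random(mask_shape) >= 0.5).astype(x.dtype)

masked = x * keep                    # broadcasts over the last dim
print(keep[..., 0])                  # one decision per (batch, channel)
print((masked == 0).all(axis=-1))    # dropped channels are zeroed as a whole;
                                     # alpha dropout then shifts those zeros to
                                     # the saturation value and rescales
```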

# 6. Testing and Acceptance Criteria

- **Programming paradigm scenarios**
  - Cover the regular dynamic-graph (and static-graph) test scenarios.

> **Review comment (Contributor):** Please add the planned location of the test file.
>
> **Author reply:** Paddle currently keeps `alpha_dropout` and the other dropout tests in `test/legacy_test/test_dropout_op.py`; I plan to do the same here. Is that acceptable?

- **Hardware scenarios**
  - Cover both CPU and GPU test scenarios.

- **Input parameters**
  - Cover default parameters, common parameters, and invalid parameters.
  - Common data types: float16, float32, float64.
> **Review comment (Contributor):** bfloat16 also needs to be supported.
>
> **Author reply:** A quick test suggests it already works:
>
> ```
> In [14]: feature_alpha_dropout(i_paddle, 0.2, training=True)
> Out[14]:
> Tensor(shape=[4, 3, 3], dtype=bfloat16, place=Place(gpu:0), stop_gradient=True,
>        [[[ 0.80859375,  1.07812500,  0.83203125],
>          [ 0.73437500,  0.33398438,  0.96484375],
>          [ 0.72656250,  1.16406250,  0.54687500]],
>
>         [[ 0.71093750,  0.50781250,  1.15625000],
>          [-1.23437500, -1.23437500, -1.23437500],
>          [-1.23437500, -1.23437500, -1.23437500]],
>
>         [[ 0.34960938,  0.34375000,  0.98046875],
>          [ 0.78125000,  0.83203125,  0.98828125],
>          [ 0.64843750,  1.11718750,  0.37109375]],
>
>         [[-1.23437500, -1.23437500, -1.23437500],
>          [ 0.48437500,  0.31250000,  0.60156250],
>          [ 0.81640625,  1.00781250,  1.09375000]]])
> ```
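
A hedged sketch of how the channel-wise behavior could be verified in the dropout tests (the test class and method names are illustrative; the RFC proposes adding the cases to `test/legacy_test/test_dropout_op.py`):

``` python
import unittest

import numpy as np
import paddle
import paddle.nn.functional as F


class TestFeatureAlphaDropout(unittest.TestCase):
    def test_eval_is_identity(self):
        x = paddle.randn([4, 8, 16])
        y = F.feature_alpha_dropout(x, p=0.5, training=False)
        np.testing.assert_allclose(x.numpy(), y.numpy())

    def test_whole_channels_share_one_fate(self):
        paddle.seed(2024)
        x = paddle.randn([16, 64, 32])
        y = F.feature_alpha_dropout(x, p=0.5, training=True).numpy()
        # A dropped channel collapses to one constant, a kept channel keeps
        # the variation of x, so the channel-wise std separates the two.
        dropped = y.std(axis=-1) < 1e-6
        kept_ratio = 1.0 - dropped.mean()
        self.assertTrue(0.3 < kept_ratio < 0.7)  # roughly 1 - p for p = 0.5
```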


# 7. Feasibility Analysis and Schedule

- Week 1: implement the code.
- Week 2: test cases and documentation.
- Week 3: review.

# 8. Impact

Enriches the Paddle API surface; no impact on other modules.