【Hackathon 5th No.26】Add the diagonal_scatter API to Paddle #669

Merged · 10 commits · Oct 19, 2023
rfcs/APIs/20230929_api_design_for_diagonal_scatter.md · 388 additions, 0 deletions

# paddle.diagonal_scatter Design Document

| API Name | paddle.diagonal_scatter |
| --------------------- | ------------------------------------------- |
| Author | DanGuge |
| Submission Date | 2023-09-29 |
| Version | V1.0 |
| Target Paddle Version | develop |
| File Name | 20230929_api_design_for_diagonal_scatter.md |


# I. Overview
## 1. Background
Enrich Paddle's Tensor-related APIs to support a wider variety of tensor operations.

## 2. Goals
Implement a diagonal_scatter API that embeds the values of tensor y into tensor x, distributing y along the diagonal of x specified by the two axes axis1 and axis2. It should be available in two forms (a usage sketch follows the list):

- paddle.diagonal_scatter, called as a standalone function
- Tensor.diagonal_scatter, used as a Tensor method
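
A sketch of the intended usage under the proposed signature (the API does not exist yet; the expected result mirrors torch.diagonal_scatter):

```python
import paddle

x = paddle.zeros([3, 3])
y = paddle.ones([3])
out = paddle.diagonal_scatter(x, y)  # proposed API, equivalently x.diagonal_scatter(y)
# Expected out:
# [[1., 0., 0.],
#  [0., 1., 0.],
#  [0., 0., 1.]]
```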

## 3. Significance
Enables Paddle to perform finer-grained operations on tensors.

# II. Current Status in Paddle
Paddle already provides an API with similar functionality, fill_diagonal_tensor, which can be reused directly; alternatively, the feature can be composed on top of paddle.diagonal.
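
For illustration, the existing API already covers the core behavior in dynamic graph mode (a small sketch; shapes chosen arbitrarily):

```python
import paddle

x = paddle.zeros([3, 3])
y = paddle.ones([3])
# fill_diagonal_tensor returns a new tensor with y written onto the chosen diagonal
out = x.fill_diagonal_tensor(y, offset=0, dim1=0, dim2=1)
# out:
# [[1., 0., 0.],
#  [0., 1., 0.],
#  [0., 0., 1.]]
```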

# III. Industry Research

## PyTorch
PyTorch provides the API `torch.diagonal_scatter(input, src, offset=0, dim1=0, dim2=1)`.

Its documentation describes the behavior as follows; the functionality matches Paddle's requirements:

```
Embeds the values of the src tensor into input along the diagonal elements of input, with respect to dim1 and dim2.
This function returns a tensor with fresh storage; it does not return a view.
The argument offset controls which diagonal to consider:

* If offset = 0, it is the main diagonal.
* If offset > 0, it is above the main diagonal.
* If offset < 0, it is below the main diagonal.
```
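
A short usage example consistent with these semantics:

```python
import torch

a = torch.zeros(3, 3)
# offset = 0: embed along the main diagonal
print(torch.diagonal_scatter(a, torch.ones(3), 0))
# tensor([[1., 0., 0.],
#         [0., 1., 0.],
#         [0., 0., 1.]])
# offset = 1: embed along the diagonal above the main one
print(torch.diagonal_scatter(a, torch.ones(2), 1))
# tensor([[0., 1., 0.],
#         [0., 0., 1.],
#         [0., 0., 0.]])
```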

### Implementation

PyTorch implements torch.diagonal_scatter by composing C++ APIs; the core code is shown below:

- The core functionality of diagonal_scatter relies on diagonal
- First, deep-copy input to obtain output
- Next, call diagonal to obtain the corresponding diagonal slice of output
- Finally, copy src into the diagonal slice and return output

```cpp
// pytorch/aten/src/ATen/native/TensorShape.cpp
at::Tensor diagonal_scatter(const at::Tensor& self, const at::Tensor& src, int64_t offset, int64_t dim1, int64_t dim2) {
  // See Note [*_scatter ops preserve strides]
  auto output = clone_preserve_strides(self);
  auto slice = output.diagonal(offset, dim1, dim2);
  TORCH_CHECK(slice.sizes() == src.sizes(), "expected src to have a size equal to the slice of self. src size = ", src.sizes(), ", slice size = ", slice.sizes());
  slice.copy_(src);
  return output;
}
```

```cpp
// pytorch/aten/src/ATen/native/TensorShape.cpp
Tensor diagonal(const Tensor& self, int64_t offset, int64_t dim1_, int64_t dim2_) {
  int64_t nDims = self.dim();
  int64_t dim1 = maybe_wrap_dim(dim1_, nDims);
  int64_t dim2 = maybe_wrap_dim(dim2_, nDims);
  TORCH_CHECK(dim1 != dim2, "diagonal dimensions cannot be identical ", dim1_, ", ", dim2_);
  auto outnames = namedinference::compute_diagonal_outnames(self, dim1, dim2);
  NoNamesGuard no_names_guard;

  // NOLINTNEXTLINE(cppcoreguidelines-init-variables)
  int64_t diag_size;
  int64_t storage_offset = self.storage_offset();
  // compute storage offset and size for the diagonal
  // for positive values of offset (above the main diagonal)
  // "leftmost columns" (along dim2) are dropped
  // for negative values of offset (below the main diagonal)
  // "topmost rows" (along dim1) are dropped.
  // Note that we invert +/- in the second to absorb the negative
  // sign in the offset.
  if (offset >= 0) {
    diag_size = std::max<int64_t>(std::min(self.size(dim1), self.size(dim2)-offset), 0);
  } else {
    diag_size = std::max<int64_t>(std::min(self.size(dim1)+offset, self.size(dim2)), 0);
  }

  // NumPy allows you to specify offsets "off the end"; let's just be careful not to
  // set a ridiculous storage_offset in that case (technically it shouldn't matter
  // because there are no elements in the tensor, but let's be kosher).
  if (diag_size == 0) {
    // skip
  } else if (offset >= 0) {
    storage_offset += offset * self.stride(dim2);
  } else {
    storage_offset -= offset * self.stride(dim1);
  }

  // construct new size and stride: we drop dim1 and dim2 (maximum first for not changing the index of the minimum)
  // the new ("joint") dimension is appended to the end of the shape / stride to match numpy semantics
  DimVector sizes(self.sizes().begin(), self.sizes().end());
  DimVector strides(self.strides().begin(), self.strides().end());
  sizes.erase(sizes.begin() + std::max(dim1, dim2));
  strides.erase(strides.begin() + std::max(dim1, dim2));
  sizes.erase(sizes.begin() + std::min(dim1, dim2));
  strides.erase(strides.begin() + std::min(dim1, dim2));
  sizes.push_back(diag_size);
  strides.push_back(self.stride(dim1)+self.stride(dim2));

  // return view with new parameters
  auto result = self.as_strided(sizes, strides, storage_offset);

  no_names_guard.reset();
  namedinference::propagate_names_if_nonempty(result, outnames);
  return result;
}
```

```cpp
// Clones a tensor by cloning the underlying storage that it came from,
// which allows us to replicate the exact strides/storage_offset in the cloned tensor.
// Note [*_scatter ops preserve strides]
// In order for functionalization to preserve stride correctness, the *_scatter
// operators that it calls must preserve the striding behavior of their inputs.
// Specifically, the output of *_scatter(base, mutated_view, ...)
// should have identical size/stride/storage_offset to "base".
at::Tensor clone_preserve_strides(const at::Tensor& self) {
  TORCH_INTERNAL_ASSERT(self.has_storage());
  // In cases where the input tensor has internal memory overlap, we cannot actually
  // preserve the strides/storage_offset of the input tensor, because
  // *_scatter ops will try to copy_() into the cloned tensor.
  // However, this should **never** show up in functionalized user code;
  // most aten ops that try to mutate a tensor with internal memory overlap would error anyway.
  //
  // The one place that this does come up is in autograd - if there's a select_scatter
  // in the forward, then autograd will generate one for the backward.
  // If the input to the select_scatter is grad_output, then this could be an expanded tensor
  // with internal overlap.
  if (at::has_internal_overlap(self) == at::MemOverlap::Yes) {
    return self.clone();
  }
  auto dtype_size = self.dtype().itemsize();
  auto nbytes = self.storage().sym_nbytes();
  TORCH_INTERNAL_ASSERT(nbytes % dtype_size == 0);
  auto numel = nbytes / dtype_size;
  auto self_full_size = self.as_strided_symint({std::move(numel)}, {1}, 0);
  auto clone = self_full_size.clone();
  auto out = clone.as_strided_symint(self.sym_sizes(), self.sym_strides(), self.sym_storage_offset());
  return out;
}
```

* In the new PyTorch 2.0 compiler, the default backend, Inductor, also implements diagonal_scatter:

```python
@register_lowering(aten.diagonal_scatter, type_promotion_kind=None)
def diagonal_scatter(input, src, offset: int = 0, dim1: int = 0, dim2: int = 1):
    output = clone(input)
    target = diagonal(output, offset, dim1, dim2)
    mutate_to(target, src)
    return output


@register_lowering(aten.diagonal, type_promotion_kind=None)
def diagonal(input, offset: int = 0, dim1: int = 0, dim2: int = 1):
    original_shape = input.get_size()
    num_dims = len(original_shape)
    dim1 = canonicalize_dim(idx=dim1, rank=num_dims)
    dim2 = canonicalize_dim(idx=dim2, rank=num_dims)

    check(
        dim1 != dim2, lambda: f"diagonal dimensions cannot be identical {dim1}, {dim2}"
    )

    offset_negative = V.graph.sizevars.evaluate_expr(sympy.Lt(offset, 0))
    if offset_negative:
        diag_size = max(min(original_shape[dim1] + offset, original_shape[dim2]), 0)
    else:
        diag_size = max(min(original_shape[dim1], original_shape[dim2] - offset), 0)

    base_idx = (0, 0)
    if offset_negative:
        base_idx = (-offset, 0)
    else:
        base_idx = (0, offset)

    sizes = [s for i, s in enumerate(original_shape) if i not in (dim1, dim2)]
    sizes.append(diag_size)

    def reindexer(idx):
        diag_idx = idx[-1]
        original_idx = [0] * len(original_shape)
        cur_dim = 0
        for d in range(num_dims):
            if d == dim1:
                original_idx[d] = diag_idx + base_idx[0]
            elif d == dim2:
                original_idx[d] = diag_idx + base_idx[1]
            else:
                original_idx[d] = idx[cur_dim]
                cur_dim += 1

        assert cur_dim == len(original_shape) - 2
        return original_idx

    return TensorBox(ir.GenericView.create(input, sizes, reindexer))
```

## TensorFlow

TensorFlow has no diagonal_scatter API, but the core function tf.linalg.diag can be composed to implement the same logic.
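
A minimal sketch of such a composition for the 2-D case, assuming float inputs (the helper name diagonal_scatter_2d is hypothetical, not a TensorFlow API):

```python
import tensorflow as tf

def diagonal_scatter_2d(x, src, offset=0):
    rows, cols = x.shape
    # A 0/1 mask marking the target diagonal, built from an all-ones vector
    mask = tf.linalg.diag(tf.ones_like(src), k=offset, num_rows=rows, num_cols=cols)
    # src placed on the target diagonal of a zero matrix
    embed = tf.linalg.diag(src, k=offset, num_rows=rows, num_cols=cols)
    # Zero out the target diagonal of x, then add the embedded src values
    return x * (1.0 - mask) + embed

print(diagonal_scatter_2d(tf.zeros([3, 3]), tf.ones([2]), offset=1))
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]
```
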
## NumPy

NumPy has no diagonal_scatter API, but the core function numpy.diagonal can be composed to implement the same logic.
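
One caveat: numpy.diagonal returns a read-only view (since NumPy 1.9), so a direct write to it fails; a composition sketch therefore writes through index arrays instead (diagonal_scatter_2d is a hypothetical helper for the 2-D case):

```python
import numpy as np

def diagonal_scatter_2d(x, src, offset=0):
    # np.diagonal(x, offset) is read-only, so compute the diagonal's
    # row/column indices explicitly and assign through fancy indexing
    out = x.copy()
    idx = np.arange(len(src))
    rows = idx - min(offset, 0)  # shift rows down for negative offsets
    cols = idx + max(offset, 0)  # shift columns right for positive offsets
    out[rows, cols] = src
    return out

print(diagonal_scatter_2d(np.zeros((3, 3)), np.ones(2), offset=-1))
# [[0. 0. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
```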

## MindSpore

MindSpore has a corresponding implementation, but its behavior is inconsistent with torch's: torch supports diagonal_scatter on any tensor with 2 or more dimensions, while MindSpore only supports diagonal_scatter on square matrices.

* This is because MindSpore's implementation constructs an all-ones tensor embed with the same shape as src, expands embed into a diagonal matrix, and multiplies it with input to extract input's diagonal elements (so they can be subtracted out); if input is not a square matrix, this step raises an error

```python
@_primexpr
def _check_diagonal_scatter_shape(diag_shape, src_shape):
    if diag_shape != src_shape:
        raise ValueError(f"For diagonal_scatter, the shape of src should equal to the shape of input diagonal,"
                         f"but got src.shape {src_shape} and diagonal shape {diag_shape}.")


def diagonal_scatter(input, src, offset=0, dim1=0, dim2=1):
    """
    `dim1` and `dim2` specify the two dimensions of `input`,
    the elements in these two dimensions will be treated as elements of a matrix,
    and `src` is embedded on the diagonal of the matrix.

    Args:
        input (Tensor): Input Tensor, whose dimension is larger than 1.
        src (Tensor): The source Tensor to embed.
        offset (int, optional): `offset` controls which diagonal to choose. Default: ``0`` .

            - When `offset` is zero, the diagonal chosen is the main diagonal.
            - When `offset` is a positive integer, the diagonal chosen is up the main diagonal.
            - When `offset` is a negative integer, the diagonal chosen is down the main diagonal.

        dim1 (int, optional): Axis to be used as the first axis of the 2-D
            sub-arrays from which the diagonals should be taken. Default: ``0`` .
        dim2 (int, optional): Axis to be used as the second axis of the 2-D
            sub-arrays from which the diagonals should be taken. Default: ``1`` .

    Returns:
        Tensor after embedding, has the same shape and dtype as `input`.

    Raises:
        TypeError: If `input` or `src` is not a Tensor.
        TypeError: If `offset` , `dim1` or `dim2` is not an integer.

    Supported Platforms:
        ``Ascend`` ``GPU`` ``CPU``

    Examples:
        >>> import mindspore as ms
        >>> input = ms.ops.zeros((3,3))
        >>> src = ms.ops.ones(2)
        >>> out = ms.ops.diagonal_scatter(input, src, 1, dim1=1, dim2=0)
        >>> print(out)
        [[0. 0. 0.]
         [1. 0. 0.]
         [0. 1. 0.]]
    """
    _check_is_tensor("input", input, "diagonal_scatter")
    _check_is_tensor("src", src, "diagonal_scatter")
    _check_is_int(offset, "offset", "diagonal_scatter")
    _check_is_int(dim1, "dim1", "diagonal_scatter")
    _check_is_int(dim2, "dim2", "diagonal_scatter")
    input_diag = input.diagonal(offset, dim1, dim2)
    _check_diagonal_scatter_shape(input_diag.shape, src.shape)
    embed = ones_like(src)
    embed = ops.diag_embed(embed, offset, dim1, dim2)
    embed = input * embed
    src = ops.diag_embed(src, offset, dim1, dim2)
    return input + src - embed
```

# IV. Comparative Analysis
- PyTorch implements the feature on top of its C++ APIs, with Python calling into the corresponding C++ interfaces
- TensorFlow and NumPy have no corresponding API
- MindSpore has a corresponding API, but it only supports square matrices


# V. Design and Implementation Plan

## Naming and Parameter Design

paddle.diagonal_scatter

```python
paddle.diagonal_scatter(x, y, offset=0, axis1=0, axis2=1, name=None)
```
Parameters:

- `x (Tensor)`: the input tensor; it must have at least 2 dimensions
- `y (Tensor)`: the embedding tensor, which will be embedded into the input tensor
- `offset (int, optional)`: which diagonal to embed along; default 0
  - offset = 0: embed along the main diagonal
  - offset > 0: embed above the main diagonal
  - offset < 0: embed below the main diagonal
- `axis1 (int, optional)`: the first axis of the diagonal; default 0
- `axis2 (int, optional)`: the second axis of the diagonal; default 1
- `name (str, optional)`: see [Name](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_guides/low_level/program.html#api-guide-name) for usage; usually left unset; default None


Tensor.diagonal_scatter

```python
Tensor.diagonal_scatter(y, offset=0, axis1=0, axis2=1, name=None)
```
Parameters:

- `y (Tensor)`: the embedding tensor, which will be embedded into the input tensor
- `offset (int, optional)`: which diagonal to embed along; default 0
  - offset = 0: embed along the main diagonal
  - offset > 0: embed above the main diagonal
  - offset < 0: embed below the main diagonal
- `axis1 (int, optional)`: the first axis of the diagonal; default 0
- `axis2 (int, optional)`: the second axis of the diagonal; default 1
- `name (str, optional)`: see [Name](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_guides/low_level/program.html#api-guide-name) for usage; usually left unset; default None

## Low-Level OP Design

Build on the existing APIs (fill_diagonal_tensor or diagonal); no new low-level OP is required.

## API Implementation Plan

Add a diagonal_scatter function in python/paddle/tensor/manipulation.py.

- Dynamic graph (a minimal sketch follows this list)

  1. Clone the input tensor to obtain the output tensor
  2. Call diagonal to obtain diagonal_slice, a tensor view of the corresponding positions in output
  3. Overwrite the elements of diagonal_slice with the embedding tensor via tensor indexing

- Static graph (cannot be implemented by modifying Python code alone)

  - Option 1: implement the logic via `fill_diagonal_tensor`, but that API can only be used in dynamic graph mode
  - Option 2: call `paddle.static.setitem` to overwrite the elements of diagonal_slice; however, in static graph mode this call only returns a new tensor rather than writing the elements of the embedding tensor y into diagonal_slice in place
    - To modify the diagonal elements of the input tensor by index via `paddle.static.setitem(x, index, y)`, there is no existing way to obtain the indices of the diagonal elements
  - Option 3: implement the operator logic in C++, similar to the torch approach
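
A minimal dynamic-graph sketch of the three steps above; it assumes, as stated in the plan, that diagonal returns a writable view in dynamic mode, and it omits parameter checks:

```python
import paddle

def diagonal_scatter(x, y, offset=0, axis1=0, axis2=1, name=None):
    # Step 1: clone the input so the original tensor is not mutated
    output = x.clone()
    # Step 2: take the diagonal slice of output (assumed to be a view in dynamic mode)
    diagonal_slice = output.diagonal(offset=offset, axis1=axis1, axis2=axis2)
    # Step 3: overwrite the slice with the embedding tensor via indexing
    diagonal_slice[:] = y
    return output
```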

## File Paths

API implementation: python/paddle/tensor/manipulation.py

Unit tests: test/legacy_test/test_diagonal_scatter.py

# VI. Testing and Acceptance Criteria

The tests should cover the following cases (a minimal test sketch follows the list):

- Verify the correctness of diagonal_scatter's results by comparing against torch.diagonal_scatter

- Validate the parameters, e.g. whether the data types are supported, and whether invalid offset/axis1/axis2 settings raise errors

- Check that input has at least 2 dimensions

- Check that the diagonal slice of input and src have equal shapes, so the overwrite is possible

- Test a variety of offset/axis1/axis2 settings
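
A minimal test sketch for the first kind of case, checking the proposed API against a NumPy reference value:

```python
import unittest

import numpy as np
import paddle

class TestDiagonalScatter(unittest.TestCase):
    def test_main_diagonal(self):
        x = paddle.zeros([3, 3])
        y = paddle.ones([3])
        out = paddle.diagonal_scatter(x, y)  # proposed API, not yet available
        np.testing.assert_allclose(out.numpy(), np.eye(3, dtype="float32"))

if __name__ == "__main__":
    unittest.main()
```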

# VII. Feasibility and Schedule

The implementation difficulty is manageable, and development can be completed within the current release cycle.


# VIII. Impact

This is an enhancement of existing APIs and has no impact on other modules.


# Glossary


# Appendix and References

[torch.diagonal_scatter](https://pytorch.org/docs/stable/generated/torch.diagonal_scatter.html)