LLama convert docs #6754

Merged Jul 18, 2024 (13 commits). Changes shown from 6 commits.
## [No parameters] torch.distributed.is_initialized

### [torch.distributed.is_initialized](https://pytorch.org/docs/stable/distributed.html#torch.distributed.is_initialized)

```python
torch.distributed.is_initialized()
```

### [paddle.distributed.is_initialized](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/distributed/is_initialized_cn.html#is-initialized)

```python
paddle.distributed.is_initialized()
```

Both are functionally identical; neither takes parameters.

### Conversion example
```python
# PyTorch
torch.distributed.is_initialized()

# Paddle
paddle.distributed.is_initialized()
```
## [Only parameter names differ] torch.inference_mode

### [torch.inference_mode](https://pytorch.org/docs/stable/generated/torch.inference_mode.html)

```python
torch.inference_mode(mode=True)
```

### [paddle.no_grad](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/no_grad_cn.html)

```python
paddle.no_grad(func=None)
```

inference_mode additionally disables view tracking and version counters to improve inference performance; otherwise the two behave the same. The mode parameter also accepts a bool, as follows:

### Parameter mapping

| PyTorch | PaddlePaddle | Notes |
| ----------- | ------------ | ----------------------------------------------------------------------------------------- |
| mode | func | When mode is a function, only the parameter name differs. When used as a context manager, mode=True means the parameter can simply be dropped; when mode=False, the call should be removed (replaced with an empty decorator). |

### Conversion example

```python
# PyTorch
@torch.inference_mode()
def doubler(x):
    return x * 2

# Paddle
@paddle.no_grad()
def doubler(x):
    return x * 2

# PyTorch
@torch.inference_mode(False)
def doubler(x):
    return x * 2

# Paddle
@paddle_aux.empty_decorator
def doubler(x):
    return x * 2
```
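The mapping rule above can be sketched as a small helper that picks the Paddle replacement for a given mode argument. This is a hypothetical illustration; the function name and the returned strings are assumptions, not part of any real converter's API:

```python
def convert_inference_mode(mode=True):
    """Pick the Paddle replacement for torch.inference_mode(mode).

    Hypothetical helper illustrating the mapping table above.
    """
    if callable(mode):
        # inference_mode was handed a function directly: paddle.no_grad(func)
        return "paddle.no_grad"
    if mode:
        # mode=True (the default): the argument can simply be dropped
        return "paddle.no_grad()"
    # mode=False: gradients stay enabled, so replace with an empty decorator
    return "paddle_aux.empty_decorator"
```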
## [Input parameter usage differs] torch.set_default_tensor_type

### [torch.set_default_tensor_type](https://pytorch.org/docs/stable/generated/torch.set_default_tensor_type.html#torch-set-default-tensor-type)

```python
torch.set_default_tensor_type(d)
```

### [paddle.set_default_dtype](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/set_default_dtype_cn.html)

```python
paddle.set_default_dtype(d)
```

Both are functionally identical and support the same set of types, but the parameter is used differently: d must be converted to a type Paddle recognizes, as follows:

### Parameter mapping

| PyTorch | PaddlePaddle | Notes |
| ----------- | ------------ | -------------------------------------------------------------------------------------- |
| d | d | Global default data type; both support all floating-point types, but the accepted argument forms differ and need rewriting. |
> **Reviewer:** When the input data types differ, the doc should state torch's input type and paddle's input type and mark the parameter as needing rewriting. If that category already exists, edit the existing entry directly; the classification [Input parameter type differs] fits better here.

### Conversion example
```python
# PyTorch
torch.set_default_tensor_type(torch.HalfTensor)
torch.set_default_tensor_type('torch.HalfTensor')
torch.set_default_tensor_type(torch.FloatTensor)
torch.set_default_tensor_type('torch.FloatTensor')
torch.set_default_tensor_type(torch.DoubleTensor)
torch.set_default_tensor_type('torch.DoubleTensor')

# Paddle
paddle.set_default_dtype('float16')
paddle.set_default_dtype('float16')
paddle.set_default_dtype('float32')
paddle.set_default_dtype('float32')
paddle.set_default_dtype('float64')
paddle.set_default_dtype('float64')
```
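The six rewrites above all follow one lookup. A minimal sketch (the helper name and table are illustrative assumptions, not a real converter API):

```python
# Map each torch default-tensor-type spelling to the Paddle dtype string.
_TENSOR_TYPE_TO_DTYPE = {
    "torch.HalfTensor": "float16",
    "torch.FloatTensor": "float32",
    "torch.DoubleTensor": "float64",
}


def to_paddle_dtype(d):
    """Accept the string form or the tensor-type class and return the dtype name."""
    name = d if isinstance(d, str) else "torch." + d.__name__
    return _TENSOR_TYPE_TO_DTYPE[name]
```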
## [No parameters] fairscale.nn.model_parallel.initialize.get_model_parallel_rank

### [fairscale.nn.model_parallel.initialize.get_model_parallel_rank](https://github.com/facebookresearch/fairscale/blob/164cc0f3170b4a3951dd84dda29c3e1504ac4d6e/fairscale/nn/model_parallel/initialize.py#L155)

```python
fairscale.nn.model_parallel.initialize.get_model_parallel_rank()
```

### [paddle.distributed.fleet.base.topology._HYBRID_PARALLEL_GROUP.get_model_parallel_rank](https://github.com/PaddlePaddle/Paddle/blob/ddac1b431483ddc0f1ee600e799aa31fc0a75961/python/paddle/distributed/fleet/base/topology.py#L463)

```python
paddle.distributed.fleet.base.topology._HYBRID_PARALLEL_GROUP.get_model_parallel_rank()
```

Both are functionally identical; neither takes parameters.

### Conversion example
```python
# PyTorch
fairscale.nn.model_parallel.initialize.get_model_parallel_rank()

# Paddle
paddle.distributed.fleet.base.topology._HYBRID_PARALLEL_GROUP.get_model_parallel_rank()
```
## [No parameters] fairscale.nn.model_parallel.initialize.get_model_parallel_world_size

### [fairscale.nn.model_parallel.initialize.get_model_parallel_world_size](https://github.com/facebookresearch/fairscale/blob/164cc0f3170b4a3951dd84dda29c3e1504ac4d6e/fairscale/nn/model_parallel/initialize.py#L150)

```python
fairscale.nn.model_parallel.initialize.get_model_parallel_world_size()
```

### [paddle.distributed.fleet.base.topology._HYBRID_PARALLEL_GROUP._mp_degree](https://github.com/PaddlePaddle/Paddle/blob/ddac1b431483ddc0f1ee600e799aa31fc0a75961/python/paddle/distributed/fleet/base/topology.py#L185)

```python
paddle.distributed.fleet.base.topology._HYBRID_PARALLEL_GROUP._mp_degree
```
Both are functionally identical; neither takes parameters (the Paddle counterpart is an attribute rather than a call).

### Conversion example
```python
# PyTorch
fairscale.nn.model_parallel.initialize.get_model_parallel_world_size()

# Paddle
paddle.distributed.fleet.base.topology._HYBRID_PARALLEL_GROUP._mp_degree
```
## [Composite replacement] fairscale.nn.model_parallel.initialize.initialize_model_parallel

### [fairscale.nn.model_parallel.initialize.initialize_model_parallel](https://github.com/facebookresearch/fairscale/blob/164cc0f3170b4a3951dd84dda29c3e1504ac4d6e/fairscale/nn/model_parallel/initialize.py#L41)

```python
fairscale.nn.model_parallel.initialize.initialize_model_parallel()
```

Initializes the model-parallel setup. Paddle has no equivalent API; a composite implementation is required.

### Parameter mapping

| fairscale | PaddlePaddle | Notes |
| --------- | ------------ | -------- |
| model_parallel_size_ | | Model-parallel degree |
| pipeline_length | | Pipeline-parallel degree |
| model_parallel_backend | | Model-parallel communication backend |
| pipeline_backend | | Pipeline-parallel communication backend |
| ddp_backend | | Data-parallel communication backend |

### Conversion example

```python
# PyTorch
fairscale.nn.model_parallel.initialize.initialize_model_parallel(model_parallel_size_=model_parallel_size_, pipeline_length=pipeline_length)

# Paddle
world_size = paddle.distributed.get_world_size()
model_parallel_size = int(min(world_size, model_parallel_size_))
data_parallel_size = int(world_size / (model_parallel_size * pipeline_length))
strategy = paddle.distributed.fleet.DistributedStrategy()
strategy.hybrid_configs = {
    "dp_degree": data_parallel_size,
    "mp_degree": model_parallel_size,
    "pp_degree": pipeline_length,
}
paddle.distributed.fleet.init(is_collective=True, strategy=strategy)
```
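The degree arithmetic used in the Paddle rewrite can be checked in isolation; this framework-free sketch repeats the same computation (the function name is illustrative):

```python
def parallel_degrees(world_size, model_parallel_size_, pipeline_length=1):
    """Compute the hybrid_configs degrees from the total world size."""
    mp = int(min(world_size, model_parallel_size_))
    dp = int(world_size / (mp * pipeline_length))
    return {"dp_degree": dp, "mp_degree": mp, "pp_degree": pipeline_length}
```

For example, 8 GPUs with model-parallel degree 2 and pipeline length 2 leave a data-parallel degree of 2.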
## [Composite replacement] fairscale.nn.model_parallel.initialize.model_parallel_is_initialized

### [fairscale.nn.model_parallel.initialize.model_parallel_is_initialized](https://github.com/facebookresearch/fairscale/blob/164cc0f3170b4a3951dd84dda29c3e1504ac4d6e/fairscale/nn/model_parallel/initialize.py#L119)

```python
fairscale.nn.model_parallel.initialize.model_parallel_is_initialized()
```

Returns whether model-parallel initialization has completed. Paddle has no equivalent API; a composite implementation is required.

### Conversion example

```python
# PyTorch
fairscale.nn.model_parallel.initialize.model_parallel_is_initialized()

# Paddle
paddle.distributed.fleet.base.topology._HYBRID_PARALLEL_GROUP is not None
```
## [More parameters in torch] fairscale.nn.model_parallel.layers.ColumnParallelLinear

### [fairscale.nn.model_parallel.layers.ColumnParallelLinear](https://github.com/facebookresearch/fairscale/blob/164cc0f3170b4a3951dd84dda29c3e1504ac4d6e/fairscale/nn/model_parallel/layers.py#L218)

```python
fairscale.nn.model_parallel.layers.ColumnParallelLinear(in_features, out_features, bias, gather_output, init_method, stride, keep_master_weight_for_test)
```

### [paddle.distributed.meta_parallel.parallel_layers.mp_layers.ColumnParallelLinear](https://github.com/PaddlePaddle/Paddle/blob/016766cc89fabc10181453ce70b701dd8ed019f6/python/paddle/distributed/fleet/layers/mpu/mp_layers.py#L153)

```python
paddle.distributed.meta_parallel.parallel_layers.mp_layers.ColumnParallelLinear(in_features, out_features, weight_attr, has_bias, gather_output, fuse_matmul_bias, mp_group, name)
```

The two are largely equivalent in function; torch has additional parameters.
> **Reviewer:** "Largely equivalent"? Please state explicitly which points differ and which are not yet confirmed to be equivalent.
>
> **Author:** The parameter mapping has been expanded accordingly.

### Parameter mapping

| fairscale | PaddlePaddle | Notes |
| --------- | ------------ | -------- |
| in_features | in_features | Number of input features |
| out_features | out_features | Number of output features |
| bias | has_bias | Whether to add a bias |
| gather_output | gather_output | Whether to all-gather each rank's output |
| init_method | | Parameter initialization method |
| | weight_attr | Layer parameter attributes |
| stride | | Linear layer stride |
| keep_master_weight_for_test | | Return the master weight for testing |
| | fuse_matmul_bias | Whether to fuse the matmul and bias-add |
| | mp_group | Model-parallel group |
| | name | Layer name |

### Conversion example

> **Reviewer:** The conversion example should state which parameters are being rewritten; see the template at https://github.com/PaddlePaddle/docs/blob/develop/docs/guides/model_convert/convert_from_pytorch/api_difference/pytorch_api_mapping_format_cn.md
>
> **Author (xuxinyi389, Jul 17, 2024):** Given their purposes, the extra torch parameters can all simply be dropped, so per-parameter conversion examples are not needed.

```python
# PyTorch
fairscale.nn.model_parallel.layers.ColumnParallelLinear(in_features=in_features,
    out_features=out_features, bias=False, gather_output=False)

# Paddle
paddle.distributed.meta_parallel.parallel_layers.mp_layers.ColumnParallelLinear(in_features=in_features,
    out_features=out_features, has_bias=False, gather_output=False)

# PyTorch
fairscale.nn.model_parallel.layers.ColumnParallelLinear(in_features=in_features,
    out_features=out_features)

# Paddle
paddle.distributed.meta_parallel.parallel_layers.mp_layers.ColumnParallelLinear(in_features=in_features,
    out_features=out_features, has_bias=True)
```
## [More parameters in torch] fairscale.nn.model_parallel.layers.ParallelEmbedding

### [fairscale.nn.model_parallel.layers.ParallelEmbedding](https://github.com/facebookresearch/fairscale/blob/164cc0f3170b4a3951dd84dda29c3e1504ac4d6e/fairscale/nn/model_parallel/layers.py#L152)

```python
fairscale.nn.model_parallel.layers.ParallelEmbedding(num_embeddings, embedding_dim, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse, init_method, keep_master_weight_for_test)
```

### [paddle.distributed.meta_parallel.parallel_layers.mp_layers.VocabParallelEmbedding](https://github.com/PaddlePaddle/Paddle/blob/016766cc89fabc10181453ce70b701dd8ed019f6/python/paddle/distributed/fleet/layers/mpu/mp_layers.py#L37)

```python
paddle.distributed.meta_parallel.parallel_layers.mp_layers.VocabParallelEmbedding(num_embeddings, embedding_dim, weight_attr, mp_group, name)
```

The two are largely equivalent in function, but their internals differ: ParallelEmbedding splits the weight along the embedding dimension, while VocabParallelEmbedding splits it along the vocab (vocabulary) dimension.

### Parameter mapping

| fairscale | PaddlePaddle | Notes |
| --------- | ------------ | -------- |
| num_embeddings | num_embeddings | Vocabulary size |
| embedding_dim | embedding_dim | Embedding dimension |
| padding_idx | | Entries at this index contribute nothing to the gradient |
| max_norm | | Values with a norm greater than max_norm are clipped to max_norm |
| norm_type | | Which p-norm to use |
| sparse | | Whether to use a sparse representation |
| scale_grad_by_freq | | Whether to scale gradients by the inverse frequency of each word within the batch |
| init_method | | Parameter initialization method |
| keep_master_weight_for_test | | Return the master weight for testing |
| | mp_group | Model-parallel group |
| | name | Layer name |


### Conversion example

```python
# PyTorch
fairscale.nn.model_parallel.layers.ParallelEmbedding(num_embeddings=num_embeddings,
    embedding_dim=embedding_dim)

# Paddle
paddle.distributed.meta_parallel.parallel_layers.mp_layers.VocabParallelEmbedding(num_embeddings=num_embeddings,
    embedding_dim=embedding_dim)
```
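The split-direction difference shows up directly in the per-rank weight shapes. A framework-free sketch (the function name is illustrative, and it assumes both sizes divide evenly by the model-parallel degree):

```python
def shard_shapes(num_embeddings, embedding_dim, mp_degree):
    """Per-rank embedding weight shapes under the two split schemes.

    fairscale's ParallelEmbedding splits along embedding_dim;
    Paddle's VocabParallelEmbedding splits along the vocab axis.
    """
    fairscale_shard = (num_embeddings, embedding_dim // mp_degree)
    paddle_shard = (num_embeddings // mp_degree, embedding_dim)
    return fairscale_shard, paddle_shard
```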
## [More parameters in torch] fairscale.nn.model_parallel.layers.RowParallelLinear

### [fairscale.nn.model_parallel.layers.RowParallelLinear](https://github.com/facebookresearch/fairscale/blob/164cc0f3170b4a3951dd84dda29c3e1504ac4d6e/fairscale/nn/model_parallel/layers.py#L299)

```python
fairscale.nn.model_parallel.layers.RowParallelLinear(in_features, out_features, bias, input_is_parallel, init_method, stride, keep_master_weight_for_test)
```

### [paddle.distributed.meta_parallel.parallel_layers.mp_layers.RowParallelLinear](https://github.com/PaddlePaddle/Paddle/blob/016766cc89fabc10181453ce70b701dd8ed019f6/python/paddle/distributed/fleet/layers/mpu/mp_layers.py#L291)

```python
paddle.distributed.meta_parallel.parallel_layers.mp_layers.RowParallelLinear(in_features, out_features, weight_attr, has_bias, input_is_parallel, fuse_matmul_bias, mp_group, name)
```

The two are largely equivalent in function, but their parameters differ.

### Parameter mapping

| fairscale | PaddlePaddle | Notes |
| --------- | ------------ | -------- |
| in_features | in_features | Number of input features |
| out_features | out_features | Number of output features |
| bias | has_bias | Whether to add a bias |
| input_is_parallel | input_is_parallel | Whether the input is already split across GPUs; if so, it is not split again |
| init_method | | Parameter initialization method |
| | weight_attr | Layer parameter attributes |
| stride | | Linear layer stride |
| keep_master_weight_for_test | | Return the master weight for testing |
| | fuse_matmul_bias | Whether to fuse the matmul and bias-add |
| | mp_group | Model-parallel group |
| | name | Layer name |

### Conversion example

```python
# PyTorch
fairscale.nn.model_parallel.layers.RowParallelLinear(in_features=in_features,
    out_features=out_features, bias=False, input_is_parallel=False)

# Paddle
paddle.distributed.meta_parallel.parallel_layers.mp_layers.RowParallelLinear(in_features=in_features,
    out_features=out_features, has_bias=False, input_is_parallel=False)

# PyTorch
fairscale.nn.model_parallel.layers.RowParallelLinear(in_features=in_features,
    out_features=out_features)

# Paddle
paddle.distributed.meta_parallel.parallel_layers.mp_layers.RowParallelLinear(in_features=in_features,
    out_features=out_features, has_bias=True)
```