[AutoParallel] Add paddle.distributed.shard layer api #57604

Merged: 29 commits merged into PaddlePaddle:develop on Sep 26, 2023

Conversation

@chenwhql (Contributor) commented Sep 21, 2023

PR types

New features

PR changes

APIs

Description

Pcard-73145

[AutoParallel] Add paddle.distributed.shard layer api

def shard_layer(
    layer: nn.Layer,
    process_mesh: dist.ProcessMesh,
    shard_fn: Callable = None,
    input_fn: Callable = None,
    output_fn: Callable = None,
) -> nn.Layer:
    """
    Converts all of the layer's parameters to DistTensor parameters according
    to the specified `shard_fn`. It can also control the conversion of the
    layer's inputs or outputs by specifying `input_fn` and `output_fn`
    (i.e. convert the inputs to `paddle.Tensor` with DistTensor, and convert
    the outputs back to `paddle.Tensor` with DenseTensor).

    The `shard_fn` should have the following signature:

        def shard_fn(layer_name, layer, process_mesh) -> None

    The `input_fn` should have the following signature:

        def input_fn(inputs, process_mesh) -> list(paddle.Tensor)

    In general, the type of `input_fn` return value is paddle.Tensor with DistTensor.

    The `output_fn` should have the following signature:

        def output_fn(outputs, process_mesh) -> list(paddle.Tensor)

    In general, the type of `output_fn` return value is paddle.Tensor with DenseTensor.

    Args:
        layer (paddle.nn.Layer): The Layer object to be sharded.
        process_mesh (paddle.distributed.ProcessMesh): The `ProcessMesh` information
            on which to place the input `layer`.
        shard_fn (Callable): The function to shard layer parameters across
            the `process_mesh`. If not specified, by default we replicate
            all parameters of the layer across the `process_mesh`.
        input_fn (Callable): Specify how the input of the layer is sharded.
            The `input_fn` will be registered for the Layer as a `forward pre-hook`.
            By default we do not shard the input.
        output_fn (Callable): Specify how the output of the layer is sharded or
            convert it back to `paddle.Tensor` with DenseTensor.
            The `output_fn` will be registered for the Layer as `forward post-hook`.
            By default we do not shard or convert the output.

    Returns:
        Layer: A layer that contains parameters/buffers
            that are all `paddle.Tensor` with DistTensor.
    """

The cn doc: PaddlePaddle/docs#6201
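
For reference, a minimal usage sketch (assuming the placements-style `dist.shard_tensor` API with `dist.Shard`/`dist.Replicate`; the exact distributed-attribute interface available at the time of this PR may differ):

import paddle
import paddle.distributed as dist

mesh = dist.ProcessMesh([0, 1], dim_names=["x"])

class MLP(paddle.nn.Layer):
    def __init__(self):
        super().__init__()
        self.fc1 = paddle.nn.Linear(8, 8)
        self.fc2 = paddle.nn.Linear(8, 8)

    def forward(self, x):
        return self.fc2(self.fc1(x))

def shard_fn(layer_name, layer, process_mesh):
    # Shard fc1's weight along the mesh's "x" dimension; every other
    # parameter stays replicated by default.
    if layer_name == "fc1":
        layer.weight = dist.shard_tensor(
            layer.weight, process_mesh, [dist.Shard(0)]
        )

layer = MLP()
sharded_layer = dist.shard_layer(layer, mesh, shard_fn)
print(sharded_layer)

Here `input_fn` and `output_fn` are omitted; they would be registered as the forward pre-hook and post-hook described above.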

@paddle-bot commented Sep 21, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.


def shard_layer(
    layer: nn.Layer,
    process_mesh: dist.ProcessMesh,
Contributor: Does shard_layer need to support the cross-mesh case? If not, should replicate_layer_params_and_buffers check whether the tensor's ProcessMesh is valid?

Contributor Author: In principle, cross-mesh usage should not be restricted. However, when a user calls shard_layer across meshes, they no longer need to call reshard, so the expected user behavior here still needs discussion.

Also, there is no need to check the concrete mesh state here: shard_layer is a wrapper around shard_tensor, so the check is better placed inside shard_tensor, which raises an error if the mesh is invalid.

Contributor: Got it!

    layer: nn.Layer, mesh: dist.ProcessMesh
) -> None:
    for key, param in layer._parameters.items():
        if param is not None and not param.is_dist():
Contributor: If a parameter is not a dist_tensor, it is converted to a replicated one here. If we later add the to-replicated conversion to the execution mechanism, will this still be needed?

Contributor Author: If the downstream APIs all convert to replicated in place, this could be removed and the effect would be the same.

That said, the two do not conflict. Converting up front is logically cleaner, since converting in place still implicitly modifies the user's input.
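
For illustration, a minimal sketch of the up-front replication being discussed (assuming the placements-style `dist.shard_tensor` API with `dist.Replicate`; the helper name and the exact internals in the PR may differ):

import paddle.distributed as dist
from paddle import nn

def replicate_layer_params(layer: nn.Layer, mesh: dist.ProcessMesh) -> None:
    # Convert only dense parameters; parameters that are already
    # DistTensors are left untouched.
    for key, param in layer._parameters.items():
        if param is not None and not param.is_dist():
            placements = [dist.Replicate() for _ in range(len(mesh.shape))]
            layer.add_parameter(key, dist.shard_tensor(param, mesh, placements))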


class TestShardLayer(unittest.TestCase):
    def setUp(self):
        self.mesh = dist.ProcessMesh([0, 1], dim_names=["x"])
Contributor: This unit test is not started with launch; can it still use two cards?

Contributor Author: The mesh is not actually used for splitting here; the specs below are all None, so at runtime all tensors are replicated and the splitting is skipped. It can therefore run on a single card; the test mainly exercises the overall flow.
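
As an illustration, a hypothetical single-card version of that flow (the real unit test in this PR is more thorough): with no `shard_fn`, every parameter is simply replicated, so no multi-card launch is needed.

import unittest
import paddle
import paddle.distributed as dist

class TestShardLayerFlow(unittest.TestCase):
    def setUp(self):
        self.mesh = dist.ProcessMesh([0, 1], dim_names=["x"])

    def test_default_replicate(self):
        layer = paddle.nn.Linear(8, 8)
        # No shard_fn: all parameters are converted to replicated DistTensors.
        sharded = dist.shard_layer(layer, self.mesh)
        for param in sharded.parameters():
            self.assertTrue(param.is_dist())

if __name__ == "__main__":
    unittest.main()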

@sunzhongkai588 (Contributor) left a comment:

`.. code-block:: python` is missing a space.

python/paddle/distributed/auto_parallel/api.py (outdated; resolved)
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
@GhostScreaming (Contributor) left a comment:

LGTM

@LiYuRio (Contributor) left a comment:

LGTM

@sunzhongkai588 (Contributor) left a comment:

lgtm for docs

@zhiqiu (Contributor) left a comment:

LGTM

@XiaoguangHu01 (Contributor) left a comment:

LGTM

@chenwhql chenwhql merged commit 6c0f338 into PaddlePaddle:develop Sep 26, 2023
Frida-a pushed a commit to Frida-a/Paddle that referenced this pull request Oct 14, 2023
…7604)

* def dtensor_from_fn first edition

* dtensor_from_fn first edition

* shard_layer api and utest(temporarily unavailable)

* shard_layer API and unit test preliminary complete

* complete the sample code modification according to ZhongKai's suggestion

* modify according to the review

* modify according to LiangGe's review

* Not approved yet, temporarily stored

* waiting for tensor to param

* Complete the modifications according to Weihang's review

* polish shard_layer api impl and doc

* add shard layer test

* rewrite unittest

* revert needless change

* polish doc

* add unittest for coverage

* add static branch and test

* polish en doc

* polish test details

* verify doc test demo

* Update python/paddle/distributed/auto_parallel/api.py

Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>

---------

Co-authored-by: yangxiaoyu14 <yangxiaoyu14@baidu.com>
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
jiahy0825 pushed a commit to jiahy0825/Paddle that referenced this pull request Oct 16, 2023

danleifeng pushed a commit to danleifeng/Paddle that referenced this pull request Nov 14, 2023