[AutoParallel] Add paddle.distributed.shard layer api #57604

Merged: 29 commits merged into PaddlePaddle:develop on Sep 26, 2023

Conversation

@chenwhql (Contributor) commented Sep 21, 2023

PR types

New features

PR changes

APIs

Description

Pcard-73145

[AutoParallel] Add paddle.distributed.shard layer api

def shard_layer(
    layer: nn.Layer,
    process_mesh: dist.ProcessMesh,
    shard_fn: Callable = None,
    input_fn: Callable = None,
    output_fn: Callable = None,
) -> nn.Layer:
    """
    Converts all of the layer's parameters to DistTensor parameters according
    to the specified `shard_fn`. It can also control the conversion of the
    layer's inputs or outputs by specifying `input_fn` and `output_fn`
    (i.e. convert the inputs to `paddle.Tensor` with DistTensor, and convert
    the outputs back to `paddle.Tensor` with DenseTensor).

    The `shard_fn` should have the following signature:

        def shard_fn(layer_name, layer, process_mesh) -> None

    The `input_fn` should have the following signature:

        def input_fn(inputs, process_mesh) -> list(paddle.Tensor)

    In general, the type of `input_fn` return value is paddle.Tensor with DistTensor.

    The `output_fn` should have the following signature:

        def output_fn(outputs, process_mesh) -> list(paddle.Tensor)

    In general, the type of `output_fn` return value is paddle.Tensor with DenseTensor.

    Args:
        layer (paddle.nn.Layer): The Layer object to be sharded.
        process_mesh (paddle.distributed.ProcessMesh): The `ProcessMesh` information
            on which to place the input `layer`.
        shard_fn (Callable): The function to shard layer parameters across
            the `process_mesh`. If not specified, by default we replicate
            all parameters of the layer across the `process_mesh`.
        input_fn (Callable): Specify how the input of the layer is sharded.
            The `input_fn` will be registered for the Layer as a `forward pre-hook`.
            By default we do not shard the input.
        output_fn (Callable): Specify how the output of the layer is sharded or
            convert it back to `paddle.Tensor` with DenseTensor.
            The `output_fn` will be registered for the Layer as `forward post-hook`.
            By default we do not shard or convert the output.

    Returns:
        Layer: A layer that contains parameters/buffers
            that are all `paddle.Tensor` with DistTensor.
    """

The cn doc: PaddlePaddle/docs#6201
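
For reference, a minimal usage sketch (assuming the placements-style `dist.shard_tensor` API with `dist.Shard`/`dist.Replicate`; the exact distributed-attribute interface available at the time of this PR may differ):

import paddle
import paddle.distributed as dist

mesh = dist.ProcessMesh([0, 1], dim_names=["x"])

class MLP(paddle.nn.Layer):
    def __init__(self):
        super().__init__()
        self.fc1 = paddle.nn.Linear(8, 8)
        self.fc2 = paddle.nn.Linear(8, 8)

    def forward(self, x):
        return self.fc2(self.fc1(x))

def shard_fn(layer_name, layer, process_mesh):
    # Shard fc1's weight along the mesh's "x" dimension; every other
    # parameter stays replicated by default.
    if layer_name == "fc1":
        layer.weight = dist.shard_tensor(
            layer.weight, process_mesh, [dist.Shard(0)]
        )

layer = MLP()
sharded_layer = dist.shard_layer(layer, mesh, shard_fn)
print(sharded_layer)

Here `input_fn` and `output_fn` are omitted; they would be registered as the forward pre-hook and post-hook described above.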

@paddle-bot commented Sep 21, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.


def shard_layer(
    layer: nn.Layer,
    process_mesh: dist.ProcessMesh,
Contributor: Does shard_layer need to support the cross-mesh case? If not, should replicate_layer_params_and_buffers check whether the tensor's ProcessMesh is valid?

Contributor Author: In principle, cross-mesh usage should not be restricted. However, when a user calls shard_layer across meshes, they no longer need to call reshard, so the expected user behavior here still needs discussion.

Also, there is no need to check the concrete mesh state here: shard_layer is a wrapper around shard_tensor, so the check is better placed inside shard_tensor, which raises an error if the mesh is invalid.

Contributor: Got it!

    layer: nn.Layer, mesh: dist.ProcessMesh
) -> None:
    for key, param in layer._parameters.items():
        if param is not None and not param.is_dist():
Contributor: If a parameter is not a dist_tensor, it is converted to a replicated one here. If we later add the to-replicated conversion to the execution mechanism, will this still be needed?

Contributor Author: If the downstream APIs all convert to replicated in place, this could be removed and the effect would be the same.

That said, the two do not conflict. Converting up front is logically cleaner, since converting in place still implicitly modifies the user's input.
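
For illustration, a minimal sketch of the up-front replication being discussed (assuming the placements-style `dist.shard_tensor` API with `dist.Replicate`; the helper name and the exact internals in the PR may differ):

import paddle.distributed as dist
from paddle import nn

def replicate_layer_params(layer: nn.Layer, mesh: dist.ProcessMesh) -> None:
    # Convert only dense parameters; parameters that are already
    # DistTensors are left untouched.
    for key, param in layer._parameters.items():
        if param is not None and not param.is_dist():
            placements = [dist.Replicate() for _ in range(len(mesh.shape))]
            layer.add_parameter(key, dist.shard_tensor(param, mesh, placements))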


class TestShardLayer(unittest.TestCase):
    def setUp(self):
        self.mesh = dist.ProcessMesh([0, 1], dim_names=["x"])
Contributor: This unit test is not started with launch; can it still use two cards?

Contributor Author: The mesh is not actually used for splitting here; the specs below are all None, so at runtime all tensors are replicated and the splitting is skipped. It can therefore run on a single card; the test mainly exercises the overall flow.
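
As an illustration, a hypothetical single-card version of that flow (the real unit test in this PR is more thorough): with no `shard_fn`, every parameter is simply replicated, so no multi-card launch is needed.

import unittest
import paddle
import paddle.distributed as dist

class TestShardLayerFlow(unittest.TestCase):
    def setUp(self):
        self.mesh = dist.ProcessMesh([0, 1], dim_names=["x"])

    def test_default_replicate(self):
        layer = paddle.nn.Linear(8, 8)
        # No shard_fn: all parameters are converted to replicated DistTensors.
        sharded = dist.shard_layer(layer, self.mesh)
        for param in sharded.parameters():
            self.assertTrue(param.is_dist())

if __name__ == "__main__":
    unittest.main()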

@sunzhongkai588 (Contributor) left a comment:

`.. code-block:: python` is missing a space.

python/paddle/distributed/auto_parallel/api.py (outdated; resolved)
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
@GhostScreaming (Contributor) left a comment:

LGTM

@LiYuRio (Contributor) left a comment:

LGTM

@sunzhongkai588 (Contributor) left a comment:

lgtm for docs

@zhiqiu (Contributor) left a comment:

LGTM

@XiaoguangHu01 (Contributor) left a comment:

LGTM

@chenwhql chenwhql merged commit 6c0f338 into PaddlePaddle:develop Sep 26, 2023
Frida-a pushed a commit to Frida-a/Paddle that referenced this pull request Oct 14, 2023
…7604)

* def dtensor_from_fn first edition

* dtensor_from_fn first edition

* shard_layer api and utest(temporarily unavailable)

* shard_layer API and unit test preliminary complete

* complete the sample code modification according to ZhongKai's suggestion

* modify according to the review

* modify according to LiangGe's review

* Not approved yet, temporarily stored

* waiting for tensor to param

* Complete the modifications according to Weihang's review

* polish shard_layer api impl and doc

* add shard layer test

* rewrite unittest

* revert needless change

* polish doc

* add unittest for coverage

* add static branch and test

* polish en doc

* polish test details

* verify doc test demo

* Update python/paddle/distributed/auto_parallel/api.py

Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>

---------

Co-authored-by: yangxiaoyu14 <yangxiaoyu14@baidu.com>
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
jiahy0825 pushed a commit to jiahy0825/Paddle that referenced this pull request Oct 16, 2023

danleifeng pushed a commit to danleifeng/Paddle that referenced this pull request Nov 14, 2023