Add auto completion module for auto parallel #34813
Conversation
Thanks for your contribution!
"spawn", | ||
|
||
__all__ = [ # noqa | ||
"spawn", |
why modify this file?
Typo. It will be corrected.
def set_process_mesh(self, process_mesh):
    self._process_mesh = process_mesh

def get_dims_mapping(self):
get_dim_mapping?
Lots of implementations rely on dims_mapping. Since it isn't exposed to users, I left it as is. Besides, I think we should try to convince @XiaoguangHu01 to adopt dims_mapping, as TensorFlow does.
self._owner_op = owner_op
if owner_context is None:
    self._owner_context = owner_context
else:
Is the if/else statement necessary?
No, it's unnecessary and will be removed.
result = cls.__new__(cls)
memo[id(self)] = result
for k, v in self.__dict__.items():
    # No need to copy the owner tensor and context
copy the owner op?
It will be corrected in the next commit.
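For illustration, a minimal sketch of the deepcopy pattern in the diff above (the class and field names are reduced stand-ins, not the exact code in attributes.py): owner references are shared rather than deep-copied, so copying a distributed attribute does not clone the owning op/tensor or the whole context.

```python
import copy


class TensorDistributedAttribute:
    def __init__(self, owner_tensor=None, owner_context=None):
        self._owner_tensor = owner_tensor
        self._owner_context = owner_context
        self._dims_mapping = None

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            # No need to copy the owner tensor and context; share them instead.
            if k in ("_owner_tensor", "_owner_context"):
                setattr(result, k, v)
            else:
                setattr(result, k, copy.deepcopy(v, memo))
        return result
```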
op_dist_attr.mark_as_annotated("process_mesh")
for tensor_name in op.input_arg_names:
    # There may be a better way to find the tensor by name
    tensor = op.block._var_recursive(tensor_name)
try op.block.vars[tensor_name]?
This is to make sure the tensor can be found from the current block up to the top-level block (not strictly necessary for now).
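A rough illustration of why a recursive lookup helps (the helper below is hypothetical and only mimics what _var_recursive does; it is not Paddle's implementation): block.vars only holds variables declared in that block, so a name created in an enclosing block would be missed.

```python
def find_var_up_to_top(block, name):
    """Search the current block first, then walk up through enclosing blocks."""
    current = block
    while current is not None:
        var = current.vars.get(name)
        if var is not None:
            return var
        # 'parent' is a stand-in for Paddle's parent-block lookup.
        current = getattr(current, "parent", None)
    return None
```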
@@ -0,0 +1,97 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
"auto_parallel/operators/embedding.py" rename to "auto_parallel/operators/dist_embedding.py" ?
Done!
# RowParallel
class DistributedEmbeddingImpl0(DistributedOperatorImpl):
DistributedEmbeddingRowParallelImpl ?
Done!
// Giving each variable an identity can help us map related properties to it.
// For example, the identity can be used as a key for referring to its
// distributed attribute.
uint64_t Id() { return id_; }
Might this be named dist_attr_id, since for now it is only used to determine the dist_attr identity?
The dist_attr_id is obsolete in the new code because it cannot work well across different distributed contexts.
I also have this question. Maybe the comment could state that id_ is only used for determining the dist_attr identity in auto_parallel for now, to avoid confusing developers who read the code.
from .utils import append_distributed_attr_suffix

# There always exists a default context for user. And user can set it to another one.
DEFAULT_DISTRIBUTED_CONTEXT = None
What is the reason that there always exists a default context? Unlike paddle.framework.program, where we normally have just one startup program and one train program and therefore a default for each of them, the auto-search scenario is supposed to have multiple DistributedContext instances. What is the relationship between the default one and those created by auto search?
Yes, the default distributed context should be removed. For now, it is only used in the __str__ of Variable and Operator for debugging, because we don't pass a distributed context to those __str__ functions.
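For context, a simplified sketch of the default-context pattern being discussed (DistributedContext is reduced to a stub here; the real container lives in context.py):

```python
DEFAULT_DISTRIBUTED_CONTEXT = None


class DistributedContext:
    """Stub standing in for the real container of distributed attributes."""
    pass


def get_default_distributed_context():
    global DEFAULT_DISTRIBUTED_CONTEXT
    if DEFAULT_DISTRIBUTED_CONTEXT is None:
        DEFAULT_DISTRIBUTED_CONTEXT = DistributedContext()
    return DEFAULT_DISTRIBUTED_CONTEXT


def set_default_distributed_context(dist_context):
    # Mirrors "user can set it to another one" from the comment in the diff.
    global DEFAULT_DISTRIBUTED_CONTEXT
    DEFAULT_DISTRIBUTED_CONTEXT = dist_context
```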
dims_mapping = attr.get_dims_mapping()
process_mesh_shape = attr.get_process_mesh().topology
# If the dimension of tensor is less than the sharding dimension of process mesh,
# we just amend the dimension mapping to -1. (Is this really OK?)
Should it raise an error directly in this case? Or do we need to define an uneven sharding rule that allows only some processes to hold a real shard of the tensor while the others hold an empty shard?
Good point. This part should be implemented in a better way, but it is not an error; it is a design choice.
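One plausible reading of the amendment, as a hedged sketch (this is not the exact Paddle code; the condition just follows the comment in the diff): a tensor dimension that cannot actually be sharded over the mesh dimension it is mapped to falls back to -1, i.e. it is replicated.

```python
def amend_dims_mapping(dims_mapping, tensor_shape, process_mesh_shape):
    amended = list(dims_mapping)
    for i, mesh_dim in enumerate(dims_mapping):
        # -1 already means this tensor dimension is not sharded.
        if mesh_dim != -1 and tensor_shape[i] < process_mesh_shape[mesh_dim]:
            amended[i] = -1
    return amended


# Example: a [2, 8] tensor mapped as [0, 1] onto a 4x2 mesh becomes [-1, 1],
# because axis 0 (size 2) cannot be split across 4 processes.
print(amend_dims_mapping([0, 1], [2, 8], [4, 2]))  # [-1, 1]
```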
def __init__(self):
    self._name = None

def forward(self, serial_op):
Having the serial_op object as the only input might not be sufficient. At least one more input argument is needed to pass context information (such as the varname mapping in the program or graph).
Done!
register_distributed_operator_impl("reshape2",
                                   DistributedReshapeImpl0("add_one_dim_back"))
add_one_dim_back --> add_one_dim? The collapsed dimension might not be the last dimension in some cases. Will there be an implementation for each case (add_one_dim_second_to_last, add_one_dim_third_to_last, add_one_dim_front, ...)?
The distributed reshape here only takes care of the situation where the added dimension is the last one; it is just the reverse of another implementation. Besides, the naming of distributed operators and their implementations should be improved. Python doesn't have a preprocessor, so it cannot easily generate names automatically.
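A small numeric illustration of the "add_one_dim_back" case (the values are made up): the output dims_mapping is the input dims_mapping with a -1 appended for the new trailing axis, and the reverse implementation simply drops that entry.

```python
x_dims_mapping = [0, -1]                  # input sharded on axis 0 along mesh dim 0
out_dims_mapping = x_dims_mapping + [-1]  # the appended trailing axis is replicated
print(out_dims_mapping)                   # [0, -1, -1]
```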
register_distributed_operator("softmax", DistributedSoftmax("softmax")) | ||
|
||
|
||
class DistributedSoftmaxImpl(DistributedOperatorImpl): |
Just to confirm: we add a distributed op impl for each op here. Are these ops only used in static mode, not in dynamic mode?
The static and dynamic modes of auto parallel should converge at the dist ops, since they play the same role as ops do in the serial code and are the lowest level, and we will try to keep them using the same interface. In dynamic mode, users will use them to construct networks directly, and we can also provide dist layers or models to give users a higher-level abstraction. The distributed op interface and register mechanism may be improved and implemented in C++ once everything is stable.
@@ -0,0 +1,157 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
2020 -> 2021
Done.
@@ -1199,6 +1199,14 @@ def _to_readable_code(self):
        if self.persistable:
            var_str = "persist " + var_str

        from paddle.distributed.auto_parallel.context import get_default_distributed_context
Why not import at the beginning of the file?
It should be corrected, but this may be removed in the future.
@@ -2359,6 +2367,13 @@ def _to_readable_code(self, skip_op_callstack=True):
        if i != len(attr_names) - 1:
            attrs_str += ", "

        from paddle.distributed.auto_parallel.context import get_default_distributed_context
Same as above.
* Fix bugs caused by shallow copy in attributes.py
* Improve amend_distributed_attr_for_program in context.py
* Other changes for weihang's comments
LGTM for framework.py, *_desc.h
# ColumnParallel
class DistributedMatmulImpl0(DistributedOperatorImpl):
Will this API be exposed publicly? Why not mark it with something like Column? Impl0, Impl1, Impl2 are hard to tell apart.
It won't be exposed publicly. Only the interfaces in the interface module are exposed to users; everything in this PR is internal implementation.
DistributedMatmulImpl2("replicate_parallel"))


class DistributedMatmulV2(DistributedOperator):
Will this API be exposed to users? Why does the name need a version number here (MatmulV2)?
The v2 is the implementation version number; it is not exposed to users and may be upgraded to v3 in the future.
Same as above; only the interfaces in interface.py are exposed to users.
self._name = name


register_distributed_operator("reshape2", DistributedReshape2("reshape2"))
The reshape2 here also exposes the internal implementation.
Same as above; only the interfaces in interface.py are exposed to users.
LGTM
PR types
New features
PR changes
Others
Describe
Note that this PR is not exposed to users and only covers the implementation of completing the distributed attributes. For the public user-facing APIs, please refer to the other PR (add the basic apis for auto_parallel #33804).
complete_annotation(program, dist_context=None): this function uses a data-flow analysis algorithm to complete all distributed attributes of the input program. During completion, it also takes the distributed operators' implementations into account.
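A hedged usage sketch (the import path follows this PR's file layout; ProcessMesh and shard_tensor come from the companion API PR and their exact names and signatures are assumptions here, as is the return value of complete_annotation):

```python
import paddle
import paddle.distributed as dist
from paddle.distributed.auto_parallel.completion import complete_annotation

paddle.enable_static()
mesh = dist.ProcessMesh([[0, 1], [2, 3]])  # assumed interface from PR #33804

main_program = paddle.static.Program()
startup_program = paddle.static.Program()
with paddle.static.program_guard(main_program, startup_program):
    x = paddle.static.data(name="x", shape=[8, 1024], dtype="float32")
    w = paddle.static.create_parameter(shape=[1024, 4096], dtype="float32")
    # Annotate only part of the program; completion fills in the rest.
    dist.shard_tensor(w, mesh, dim_mapping=[0, 1])  # assumed signature
    y = paddle.matmul(x, w)

# Complete the distributed attributes of every tensor and operator.
completed_program = complete_annotation(main_program)
```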
DistributedContext: this class can be seen as a container that stores distributed information related to the program, such as the distributed attributes. One partially annotated program may have multiple distributed contexts, each of which represents a different parallel strategy for the whole program. There always exists a default distributed context. The reason we need this container is that the auto searcher will explore different strategies simultaneously, and each strategy should have its own container to store its corresponding distributed attributes before we select the best one.
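Continuing the sketch above, the relationship might look roughly like this (the search loop itself is hypothetical; the context module path matches the imports shown earlier in this PR):

```python
from paddle.distributed.auto_parallel.context import DistributedContext

# One partially annotated main_program, several candidate parallel strategies:
# each candidate gets its own context so its completed attributes do not
# overwrite the others before the best strategy is selected.
candidate_contexts = [DistributedContext() for _ in range(4)]
for ctx in candidate_contexts:
    complete_annotation(main_program, dist_context=ctx)
# An auto searcher would then evaluate each context and keep the best one.
```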
TensorDistributedAttribute and OperatorDistributedAttribute: these two classes are wrappers for organizing the raw distributed attributes from VarDesc and OpDesc. Since the distributed attributes are updated multiple times during completion, the two abstractions are much more convenient than manipulating the raw ones directly. Furthermore, a tensor or an operator may have different distributed attributes at the same time during the future auto-search process.
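A toy mock of the wrapper idea (the method names echo the diff snippets above; the per-input mapping layout and everything else here are simplifications, not the exact classes in attributes.py): annotated fields are pinned by the user, while unannotated ones can be rewritten many times during completion.

```python
class OperatorDistributedAttribute:
    def __init__(self, owner_op=None, owner_context=None):
        self._owner_op = owner_op
        self._owner_context = owner_context
        self._process_mesh = None
        self._inputs_dims_mapping = {}  # input arg name -> dims_mapping
        self._annotated = set()

    def set_process_mesh(self, process_mesh):
        self._process_mesh = process_mesh

    def set_input_dims_mapping(self, name, dims_mapping):
        # Rewritten repeatedly by the completion pass, unlike raw OpDesc attrs.
        self._inputs_dims_mapping[name] = dims_mapping

    def get_input_dims_mapping(self, name):
        return self._inputs_dims_mapping.get(name)

    def mark_as_annotated(self, attr_name):
        # User-annotated attributes are pinned; completion only fills the rest.
        self._annotated.add(attr_name)

    def is_annotated(self, attr_name):
        return attr_name in self._annotated
```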
Distributed Operators: the distributed operators are the underlying engine of auto parallel. Like operators and their kernels, each distributed operator has multiple implementations according to the parallel strategy. For example, the matmul operator has a corresponding distributed operator with row- and column-parallel implementations. A distributed operator takes care of the communication inside its implementations and can be used to construct networks directly for dynamic graphs in the future. Ideally, every operator would have its own general distributed operator with implementations for every parallel strategy. For now, there are only a few of them, and each has only a few implementations. Note that this PR does not include the implementations of the added distributed operators (they will be provided by @JZ-LIANG); only the compatibility rules are provided.
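A hedged sketch of the register mechanism referenced in the diffs above (register_distributed_operator and register_distributed_operator_impl appear in the diffs; the base classes and the compatibility hook are simplified here, and the impl name "column_parallel" is only an assumption):

```python
_g_distributed_operators = {}


class DistributedOperator:
    def __init__(self, name):
        self._name = name
        self._impls = []

    def register_impl(self, impl):
        self._impls.append(impl)


class DistributedOperatorImpl:
    def __init__(self, name):
        self._name = name

    def is_compatible(self, op_dist_attr):
        # Each implementation decides whether a given set of distributed
        # attributes (process mesh plus dims mappings) matches its scheme.
        raise NotImplementedError


def register_distributed_operator(op_type, dist_op):
    _g_distributed_operators[op_type] = dist_op


def register_distributed_operator_impl(op_type, dist_impl):
    _g_distributed_operators[op_type].register_impl(dist_impl)


# Mirrors the registrations shown in the diffs (compatibility rules omitted).
class DistributedMatmul(DistributedOperator):
    pass


class DistributedMatmulImpl0(DistributedOperatorImpl):
    def is_compatible(self, op_dist_attr):
        return True  # placeholder


register_distributed_operator("matmul", DistributedMatmul("matmul"))
register_distributed_operator_impl("matmul", DistributedMatmulImpl0("column_parallel"))
```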