[Auto Parallel]: Support std::vector<phi::Tensor> input and output for DistTensor. #56602
Conversation
Concat forward and backward are verified.
phi::distributed::DistTensor* dist_tensor =
    static_cast<phi::distributed::DistTensor*>(tensor.impl().get());
intermidiate_tensor_.set_impl(
    std::make_shared<phi::distributed::DistTensor>(
Constructing a new tensor here will trigger a reshard. We still need to discuss this; if it passes as written, we can keep it this way for now.
It has been verified on concat so far and there is no problem. Shall we add a TODO for now?
No problem, I will handle it later.
Minor issues like this can be fixed in a follow-up PR.
    const TransformFlag& transform_flag,
    bool is_stride_kernel) {
  std::vector<std::shared_ptr<phi::distributed::DistTensor>> out;
  for (auto x : input) {
This can be changed to auto& to avoid copies.
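For context, a minimal standalone sketch (not the PR's code, and using std::shared_ptr<int> as a stand-in for the element type) of the difference the reviewer points out: iterating by value copies each element on every iteration, while a const reference avoids the copy.

```cpp
#include <memory>
#include <vector>

void ByValue(const std::vector<std::shared_ptr<int>>& input) {
  for (auto x : input) {  // copies the shared_ptr each iteration (refcount bump)
    (void)x;
  }
}

void ByReference(const std::vector<std::shared_ptr<int>>& input) {
  for (const auto& x : input) {  // no copy, just a reference to the element
    (void)x;
  }
}
```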
          dense_tensor.meta().is_contiguous()))) {
    out.push_back(
        std::static_pointer_cast<phi::distributed::DistTensor>(tensor_in));
    continue;
In theory this could go into an else branch instead of using continue.
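A small standalone sketch of the restructuring suggested here, using placeholder types and a hypothetical NeedsTransform predicate rather than the PR's actual tensor logic; the two forms are equivalent, the if/else version just avoids the early jump.

```cpp
#include <vector>

// Hypothetical stand-in for the "tensor needs data transform" check.
bool NeedsTransform(int value) { return value % 2 != 0; }

std::vector<int> WithContinue(const std::vector<int>& input) {
  std::vector<int> out;
  for (const auto& x : input) {
    if (!NeedsTransform(x)) {
      out.push_back(x);
      continue;  // skip the transform path below
    }
    out.push_back(x * 10);  // stand-in for the transform path
  }
  return out;
}

std::vector<int> WithElse(const std::vector<int>& input) {
  std::vector<int> out;
  for (const auto& x : input) {
    if (!NeedsTransform(x)) {
      out.push_back(x);
    } else {
      out.push_back(x * 10);  // same transform path, no continue needed
    }
  }
  return out;
}
```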
…r DistTensor. (PaddlePaddle#56602)
* [WIP] Support std::vector<phi::Tensor> input and output for DistTensor. Concat forward and backward are verified.
* Polish code for new dist tensor implementation.
* Fix bug of DistTensor upgrade. Add support functions for std::vector<Tensor> -> std::vector<Tensor>.
* Add support for DistTensor type of std::vector<phi::Tensor> as input or output of operators. Following testcases are passed. 1. concat: std::vector<phi::Tensor> -> phi::Tensor 2. unbind: phi::Tensor -> std::vector<phi::Tensor> 3. broadcast_tensors: std::vector<phi::Tensor> -> std::vector<phi::Tensor>
* Polish code. Remove useless comments.
* Add update_loss_scaling in skip_op_lists.
* Polish code.
PR types: Others
PR changes: Others
Description
Pcard-73145
Support std::vector<phi::Tensor> input and output for DistTensor. Meanwhile, the no_need_buffer type is also supported. Concat, Broadcast_Tensors, and Unbind forward and backward are verified. The following operators (along with their backward ops, if any) still need to be supported later: check_finite_and_unscale, coalesce_tensor, meshgrid, update_loss_scaling, einsum. Forward operators whose output is a std::tuple<...> are not supported yet.
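To make the vector-in/vector-out pattern concrete, here is a minimal, hypothetical sketch, not the code generated by this PR, of how a std::vector of Tensors whose impls are DistTensors can be gathered before dispatching an operator such as concat. The helper name CollectDistInputs and the assumption that every impl already holds a DistTensor are illustrative only; the cast mirrors the diff excerpt above.

```cpp
#include <memory>
#include <vector>

#include "paddle/phi/api/include/tensor.h"
#include "paddle/phi/core/distributed/auto_parallel/dist_tensor.h"

// Hypothetical helper: collect the DistTensor impls of a vector input so the
// auto-parallel branch of an API like concat can dispatch on them.
std::vector<std::shared_ptr<phi::distributed::DistTensor>> CollectDistInputs(
    const std::vector<paddle::Tensor>& inputs) {
  std::vector<std::shared_ptr<phi::distributed::DistTensor>> out;
  out.reserve(inputs.size());
  for (const auto& x : inputs) {
    // Assumes each Tensor holds a DistTensor impl (the auto-parallel case).
    out.push_back(
        std::static_pointer_cast<phi::distributed::DistTensor>(x.impl()));
  }
  return out;
}
```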