[Reshard] Support create shard tensor and non-zero dim reshard #56553

Merged: 3 commits, Aug 25, 2023

@LiYuRio LiYuRio commented Aug 23, 2023

PR types

New features

PR changes

Others

Description

Pcard-73145

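Extends the Reshard module:

  • Add reshard logic to the DistTensor constructor so that a DistTensor can be created directly in the shard state
  • Add a registration and selection mechanism for Reshard algorithms
  • Strengthen the static check for the shard-to-replicate conversion: the tensor's sharded dimension must be evenly divisible by the number of processes in the corresponding group
  • Extend the shard-to-replicate conversion to support inputs sharded along a non-zero dimension, which requires a split and concat after all_gather
  • Adjust the reshard function signature to reduce copy overhead

Expose the tensor's _local_shape attribute, which returns the shape of the physical DenseTensor actually stored inside the DistTensor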
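
The shard-to-replicate path for a non-zero dimension can be illustrated with a small, single-process sketch. It is not Paddle's implementation: `Matrix`, `CheckDivisible`, `ShardDim1`, `AllGatherDim0`, `SplitAndConcatDim1` and `RoundTrip` are hypothetical names, the tensor is assumed to be a row-major 2-D buffer sharded along dim 1, and the collective is simulated by stacking the per-rank buffers in rank order, which is the layout an all_gather typically produces.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a dense 2-D tensor (row-major); not a Paddle type.
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<float> data;
};

// Static check from the description: the sharded dimension must be evenly
// divisible by the number of processes in the group.
void CheckDivisible(std::size_t dim_size, std::size_t nranks) {
  if (nranks == 0 || dim_size % nranks != 0) {
    throw std::invalid_argument("sharded dim not divisible by process count");
  }
}

// Extract the shard that `rank` would hold when `full` is split along dim 1.
Matrix ShardDim1(const Matrix& full, std::size_t nranks, std::size_t rank) {
  CheckDivisible(full.cols, nranks);
  const std::size_t local_cols = full.cols / nranks;
  Matrix out{full.rows, local_cols, {}};
  out.data.reserve(full.rows * local_cols);
  for (std::size_t r = 0; r < full.rows; ++r) {
    for (std::size_t c = 0; c < local_cols; ++c) {
      out.data.push_back(full.data[r * full.cols + rank * local_cols + c]);
    }
  }
  return out;
}

// Simulated all_gather: per-rank buffers are stacked in rank order, so the
// result looks like a concat along dim 0 regardless of the sharded dim.
Matrix AllGatherDim0(const std::vector<Matrix>& shards) {
  if (shards.empty()) throw std::invalid_argument("no shards to gather");
  Matrix out{0, shards.front().cols, {}};
  for (const Matrix& s : shards) {
    out.rows += s.rows;
    out.data.insert(out.data.end(), s.data.begin(), s.data.end());
  }
  return out;
}

// Undo the dim-0 stacking: split into `nranks` row blocks, then place each
// block side by side along dim 1 (the split + concat step).
Matrix SplitAndConcatDim1(const Matrix& gathered, std::size_t nranks) {
  CheckDivisible(gathered.rows, nranks);
  const std::size_t rows = gathered.rows / nranks;
  const std::size_t local_cols = gathered.cols;
  Matrix out{rows, local_cols * nranks,
             std::vector<float>(rows * local_cols * nranks)};
  for (std::size_t rank = 0; rank < nranks; ++rank) {
    for (std::size_t r = 0; r < rows; ++r) {
      for (std::size_t c = 0; c < local_cols; ++c) {
        out.data[r * out.cols + rank * local_cols + c] =
            gathered.data[(rank * rows + r) * local_cols + c];
      }
    }
  }
  return out;
}

// Usage: shard along dim 1, gather, then restore the replicated tensor.
Matrix RoundTrip(const Matrix& full, std::size_t nranks) {
  std::vector<Matrix> shards;
  for (std::size_t rank = 0; rank < nranks; ++rank) {
    shards.push_back(ShardDim1(full, nranks, rank));
  }
  return SplitAndConcatDim1(AllGatherDim0(shards), nranks);
}
```

When the tensor is sharded along dim 0, the gathered buffer is already in the replicated layout, so the extra split and concat step is only needed for non-zero dimensions.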
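
As a rough illustration of how a local (physical) shape relates to the global shape, the sketch below divides each sharded dimension by the size of the mesh dimension it is mapped to. The function name `InferLocalShape` and its parameters are hypothetical, and the sketch assumes an even split and a dims-mapping convention in which `-1` marks a replicated dimension; it does not reproduce how `_local_shape` is computed in Paddle.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: derive the per-process shape from the global shape.
// dims_mapping[i] == -1 means tensor dim i is replicated; otherwise it is the
// index of the mesh dimension that tensor dim i is sharded over (assumption).
std::vector<std::int64_t> InferLocalShape(
    const std::vector<std::int64_t>& global_shape,
    const std::vector<std::int64_t>& dims_mapping,
    const std::vector<std::int64_t>& mesh_shape) {
  if (global_shape.size() != dims_mapping.size()) {
    throw std::invalid_argument("dims_mapping must have one entry per dim");
  }
  std::vector<std::int64_t> local = global_shape;
  for (std::size_t i = 0; i < local.size(); ++i) {
    const std::int64_t mesh_dim = dims_mapping[i];
    if (mesh_dim == -1) continue;  // replicated: local size equals global
    if (mesh_dim < 0 ||
        static_cast<std::size_t>(mesh_dim) >= mesh_shape.size()) {
      throw std::out_of_range("dims_mapping refers to a missing mesh dim");
    }
    const std::int64_t nranks = mesh_shape[static_cast<std::size_t>(mesh_dim)];
    // Same divisibility requirement as the static check in the description.
    if (nranks <= 0 || local[i] % nranks != 0) {
      throw std::invalid_argument("sharded dim not divisible by process count");
    }
    local[i] /= nranks;
  }
  return local;
}
```

For example, under these assumptions a global shape of `{4, 6}` sharded along dim 1 over a one-dimensional mesh of 2 processes gives a local shape of `{4, 3}`.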

@LiYuRio LiYuRio force-pushed the dev_shard_tensor branch 6 times, most recently from 1894497 to 227a9ea Compare August 23, 2023 03:40
@LiYuRio LiYuRio force-pushed the dev_shard_tensor branch 4 times, most recently from 0c8316e to 1ccfa44 Compare August 23, 2023 10:04
chenwhql previously approved these changes Aug 24, 2023
@chenwhql chenwhql left a comment

LGTM

@chenwhql chenwhql left a comment

LGTM

return out;
}

void ReshardFunction::set_dist_props(DistTensor* tensor,

Contributor:

CamelCase would be fine for this name; lowercase names are generally used only for the set and get functions of the current class's member variables.

Contributor Author:

OK, I'll change it in the next PR.

@LiYuRio LiYuRio merged commit 99795a1 into PaddlePaddle:develop Aug 25, 2023
@LiYuRio LiYuRio deleted the dev_shard_tensor branch August 25, 2023 08:07
BeingGod pushed a commit to BeingGod/Paddle that referenced this pull request Sep 9, 2023
…ePaddle#56553)

* support create shard dist tensor

* support non-zero shard to replicated

* change reshard signature