New IR Python API Adaptation Upgrade (Phase 3) #62618
Labels
HappyOpenSource: Happy Open Source activity issues and PRs
PFCC: Paddle Framework Contributor Club, https://github.com/PaddlePaddle/community/tree/master/pfcc
status/close: closed
Comments
YuanRisheng added the PFCC label on Mar 11, 2024
[Sign-up]: 1, 33-35
[Sign-up]: 4
[Sign-up]: 3
[Sign-up]: 15
[Sign-up]: 27, 28, 36
[Sign-up]: 29-32
[Sign-up]: 5-8
github-project-automation (bot) moved this from In Progress to Done in Call for Contributions on Jul 23, 2024
1. Background 📚
For the task background, the changes each task requires, and sample submissions, see the previously published task: #58067
2. Task 📚
#62988
#62988
#62988
#63472
Task statistics
Contributor list
Distributed API adaptation guide
Tip
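Testing the distributed APIs requires building with -DWITH_DISTRIBUTE=ON and an environment with at least two GPUs (the unit tests need two cards).
Distributed API adaptation can follow PR #62694 and consists of two parts: API adaptation and unit-test verification.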
API adaptation
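The API adaptation part is the same as in earlier phases of this task, such as #58067: adapt the static-graph branch of the API so that, in PIR mode, it dispatches to the _C_ops graph-building API for PIR. See, for example, the changes #62694 makes to python/paddle/distributed/communication/stream/all_reduce.py (which adapts all_reduce).
Note that if the old-IR static-graph branch performs an inplace operation, the inplace op should be used. For example, the old IR uses c_allreduce_sum with the same variable as both input and output, so under PIR the corresponding inplace op, c_allreduce_sum_, should be used directly.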
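The dispatch rule above can be illustrated without Paddle installed. The following is only a sketch under stated assumptions: FakeTensor, fake_c_allreduce_sum_, fake_old_ir_append_op, and all_reduce_static are hypothetical stand-ins, not Paddle APIs, and the "reduction" simply adds a second rank's values locally. Only the op names c_allreduce_sum and c_allreduce_sum_ come from the text; the real change is the one in #62694.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a framework tensor."""
    values: list


dispatched = []  # names of the stand-in ops each call dispatched to


def _sum_into(tensor, peer_values):
    # Simulates a two-rank sum by adding a peer's values into the same buffer.
    tensor.values = [a + b for a, b in zip(tensor.values, peer_values)]
    return tensor


def fake_c_allreduce_sum_(tensor, peer_values):
    """Stand-in for the inplace PIR op: the result overwrites `tensor`."""
    dispatched.append("c_allreduce_sum_")
    return _sum_into(tensor, peer_values)


def fake_old_ir_append_op(tensor, peer_values):
    """Stand-in for the old-IR branch: op c_allreduce_sum with input == output."""
    dispatched.append("c_allreduce_sum")
    return _sum_into(tensor, peer_values)


def all_reduce_static(tensor, peer_values, pir_mode):
    """Static-graph branch of a hypothetical all_reduce wrapper."""
    if pir_mode:
        # The old-IR branch reuses its input as output, so pick the inplace op.
        return fake_c_allreduce_sum_(tensor, peer_values)
    return fake_old_ir_append_op(tensor, peer_values)


t = FakeTensor([1.0, 2.0])
out = all_reduce_static(t, [10.0, 20.0], pir_mode=True)
print(out is t, t.values, dispatched)
```

The point of the sketch is the branch selection: because the old-IR branch already behaved as inplace, the PIR branch returns the same object it received rather than a fresh output.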
Unit-test verification
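PIR distributed APIs can be verified by adding new cases to test/collective/process_group_nccl_pir.py. The corresponding dynamic-graph unit test, test/collective/process_group_nccl.py, serves as the overall reference, and new cases should preferably be added in the same order as in the dynamic-graph test. If the dynamic-graph unit test has no case for the API, determine the API's semantics from its documentation and add a suitable case. After adding the case, run the corresponding unit test to verify that the adaptation succeeded.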
Debugging tips
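The redirection technique this section describes can be tried outside the test suite with the standard library alone. The sketch below is not the modification to test/legacy_test/test_parallel_dygraph_dataparallel.py; launch_with_logs, read_log, and the log_dir parameter are illustrative names, and the child commands are trivial Python one-liners rather than real trainers.

```python
import os
import subprocess
import sys
import tempfile


def launch_with_logs(commands, log_dir):
    """Start one child per command, sending its stdout/stderr to per-rank files."""
    children = []
    for rank, cmd in enumerate(commands):
        out = open(os.path.join(log_dir, f"out_{rank}.log"), "w")
        err = open(os.path.join(log_dir, f"err_{rank}.log"), "w")
        proc = subprocess.Popen(cmd, stdout=out, stderr=err)
        children.append((proc, out, err))

    exit_codes = []
    for proc, out, err in children:
        exit_codes.append(proc.wait())  # wait before closing the log files
        out.close()
        err.close()
    return exit_codes


def read_log(log_dir, name):
    """Return the full contents of one log file."""
    with open(os.path.join(log_dir, name)) as f:
        return f.read()


# Two fake "ranks": each prints to stdout and to stderr.
log_dir = tempfile.mkdtemp()
commands = [
    [sys.executable, "-c",
     f"import sys; print('rank {rank} ok'); print('rank {rank} warning', file=sys.stderr)"]
    for rank in range(2)
]
print(launch_with_logs(commands, log_dir))
print(read_log(log_dir, "err_1.log").strip())
```

Passing a fixed directory such as /tmp as log_dir would produce the same out_0.log, out_1.log, err_0.log, and err_1.log layout that this section mentions.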
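You can redirect the child processes' stdout and stderr to files to make debugging easier, for example by modifying test/legacy_test/test_parallel_dygraph_dataparallel.py. After that, the detailed output and error messages of each child process are available in /tmp/out_0.log, /tmp/out_1.log, /tmp/err_0.log, and /tmp/err_1.log.
Quantization API adaptation guide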
TODO