
New IR Python API Adaptation Upgrade (Phase 3) #62618

Closed

YuanRisheng opened this issue Mar 11, 2024 · 7 comments

Assignees
Labels

HappyOpenSource (Happy Open Source activity issues & PRs), PFCC (Paddle Framework Contributor Club, https://github.com/PaddlePaddle/community/tree/master/pfcc), status/close (closed)

Comments

@YuanRisheng
Contributor

YuanRisheng commented Mar 11, 2024

1. Background 📚

For the task background, the scope of changes, and submission examples, see the previously published task: #58067.

2. Task 📚

| No. | Python API | File | Priority | Unit test coverage | Assignee | PR |
|---|---|---|---|---|---|---|
| 🚧 1 | wait | python/paddle/distributed/communication/group.py | p1 | | 🚧 @zrr1999 | #62974 |
| 🚧 2 | barrier | python/paddle/distributed/communication/group.py | p1 | | 🚧 @zrr1999 | #62974 |
| 🙋 3 | all_gather | python/paddle/distributed/communication/stream/all_gather.py | p1 | | 🙋 @Eacient | |
| ✅ 4 | all_reduce | python/paddle/distributed/communication/stream/all_reduce.py | p1 | | @SigureMo | #62694 |
| 🙋 5 | alltoall | python/paddle/distributed/communication/stream/all_to_all.py | p1 | | 🙋 @ooooo-create | |
| 🙋 6 | broadcast | python/paddle/distributed/communication/stream/broadcast.py | p1 | | 🙋 @ooooo-create | |
| 🙋 7 | recv | python/paddle/distributed/communication/stream/recv.py | p1 | | 🙋 @ooooo-create | |
| 🙋 8 | reduce_scatter | python/paddle/distributed/communication/stream/reduce_scatter.py | p1 | | 🙋 @ooooo-create | |
| 🔵 9 | reduce | python/paddle/distributed/communication/stream/reduce.py | p1 | | | |
| 🔵 10 | scatter | python/paddle/distributed/communication/stream/scatter.py | p1 | | | |
| 🔵 11 | send | python/paddle/distributed/communication/stream/send.py | p1 | | | |
| 🔵 12 | reshard | python/paddle/distributed/auto_parallel/api.py | p1 | | | |
| 🔵 13 | split | python/paddle/distributed/fleet/layers/mpu/mp_ops.py | p1 | | | |
| 🔵 14 | dropout | python/paddle/distributed/fleet/layers/mpu/random.py | p1 | | | |
| 🙋 15 | RandomHorizontalFlip | python/paddle/vision/transforms/transforms.py | p1 | | 🙋 @jshh0401 | |
| 🔵 16 | RandomVerticalFlip | python/paddle/vision/transforms/transforms.py | p1 | | | |
| 🔵 17 | RandomErasing | python/paddle/vision/transforms/transforms.py | p1 | | | |
| 🔵 18 | frame | python/paddle/signal.py | p1 | | | |
| 🔵 19 | overlap_add | python/paddle/signal.py | p1 | | | |
| 🔵 20 | fused_dot_product_attention | python/paddle/incubate/nn/functional/fused_dot_product_attention.py | p1 | | | |
| 🔵 21 | flash_attn_unpadded | python/paddle/nn/functional/flash_attention.py | p1 | | | |
| 🔵 22 | edit_distance | python/paddle/nn/functional/loss.py | p1 | | | |
| 🔵 23 | l2_norm | python/paddle/nn/utils/weight_norm_hook.py | p1 | | | |
| 🔵 24 | conv1d_transpose | python/paddle/nn/functional/conv.py | p1 | | | |
| 🔵 25 | Adadelta | python/paddle/optimizer/adadelta.py | p1 | | | |
| 🔵 26 | Dirichlet | python/paddle/distribution/dirichlet.py | p1 | | | |
| 🙋 27 | LinearQuanter | python/paddle/nn/quant/format.py | p1 | | 🙋 @zrr1999 | |
| 🙋 28 | LinearDeQuanter | python/paddle/nn/quant/format.py | p1 | | 🙋 @zrr1999 | |
| 🙋 29 | FakeQuantAbsMax | python/paddle/nn/quant/quant_layers.py | p1 | | 🙋 @zrr1999 | |
| 🙋 30 | FakeQuantMovingAverageAbsMax | python/paddle/nn/quant/quant_layers.py | p1 | | 🙋 @zrr1999 | |
| 🙋 31 | FakeQuantChannelWiseAbsMax | python/paddle/nn/quant/quant_layers.py | p1 | | 🙋 @zrr1999 | |
| 🙋 32 | MovingAverageAbsMaxScale | python/paddle/nn/quant/quant_layers.py | p1 | | 🙋 @zrr1999 | |
| ✅ 33 | weight_quantize | python/paddle/nn/quant/quantized_linear.py | p1 | | @zrr1999 | #62988 |
| ✅ 34 | weight_dequantize | python/paddle/nn/quant/quantized_linear.py | p1 | | @zrr1999 | #62988 |
| ✅ 35 | weight_only_linear | python/paddle/nn/quant/quantized_linear.py | p1 | | @zrr1999 | #62988 |
| ✅ 36 | apply_per_channel_scale | python/paddle/nn/quant/quantized_linear.py | p1 | | @zrr1999 | #63472 |
| 🔵 37 | FakeQuanterWithAbsMaxObserver | python/paddle/quantization/quanters/abs_max.py | p1 | | | |

Task statistics

| Total tasks | 🔵 Unclaimed | 🙋 Claimed | 🚧 In progress | 🟢 Pending merge | ✅ Done | 🟡 Next phase | 🏁 Completion rate |
|---|---|---|---|---|---|---|---|
| 37 | 18 | 12 | 2 | 0 | 5 | 0 | 13.5% |

Contributors

In no particular order: @SigureMo (1), @zrr1999 (4)

Distributed API Adaptation Guide

Tip

Testing distributed APIs requires a build with -DWITH_DISTRIBUTE=ON and an environment with at least two GPUs (the unit tests need two GPUs).

Distributed API adaptation can follow PR #62694 and consists of two parts: API adaptation and unit-test verification.

API adaptation

The API adaptation follows the same pattern as previous phases (e.g. #58067): adapt the static-graph branch of the API so that, in PIR mode, it dispatches to the corresponding PIR _C_ops graph-building API. See, for example, the changes to python/paddle/distributed/communication/stream/all_reduce.py (adapting all_reduce) in #62694.
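The dispatch pattern can be sketched roughly as below. The mode-check helpers and the op signature are simplified stand-ins invented for illustration (the real code uses Paddle's mode checks and paddle._C_ops), so treat this as a sketch of the branch structure, not the actual implementation:

```python
# Stand-ins for Paddle's mode checks; in real code these come from the framework.
def in_dynamic_mode():
    return False

def in_pir_mode():
    return True  # pretend we are building a PIR program

class _C_ops:  # stand-in for paddle._C_ops
    @staticmethod
    def all_reduce(tensor, ring_id, reduce_type):
        return ("pir_all_reduce", tensor, ring_id, reduce_type)

def legacy_static_all_reduce(tensor, ring_id, reduce_type):
    # Legacy static-graph branch (append_op based), kept as-is by this task.
    return ("legacy_all_reduce", tensor, ring_id, reduce_type)

def all_reduce(tensor, ring_id=0, reduce_type="sum"):
    if in_dynamic_mode():
        raise NotImplementedError  # dygraph path, untouched by this task
    if in_pir_mode():
        # New PIR branch: dispatch to the _C_ops graph-building API.
        return _C_ops.all_reduce(tensor, ring_id, reduce_type)
    return legacy_static_all_reduce(tensor, ring_id, reduce_type)

print(all_reduce([1.0, 2.0])[0])  # -> pir_all_reduce
```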

Note that if the legacy-IR static-graph branch is an inplace operation, we should use the inplace op under PIR, e.g. c_allreduce_sum_. Legacy IR used c_allreduce_sum with the same variable as both input and output; under PIR, the corresponding inplace op c_allreduce_sum_ should be called directly.
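The distinction can be illustrated in plain Python. The two functions below are toy stand-ins for the real c_allreduce_sum ops (a simulated 2-rank sum just doubles each element); the point is the trailing-underscore convention marking the variant that mutates its input, which is what the PIR branch should call:

```python
def c_allreduce_sum(values):
    # Non-inplace: returns a fresh list. Legacy IR called an op like this
    # but wired the output back to the input variable, making it
    # effectively inplace at the graph level.
    return [v * 2 for v in values]

def c_allreduce_sum_(values):
    # Inplace variant (note the trailing underscore): mutates its argument
    # and returns it. Under PIR the adapted branch dispatches to this one.
    for i, v in enumerate(values):
        values[i] = v * 2
    return values

x = [1.0, 2.0]
y = c_allreduce_sum(x)   # x is unchanged, y is a new list
z = c_allreduce_sum_(x)  # x is mutated, z is the same object as x
print(x == [2.0, 4.0], z is x)  # -> True True
```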

Unit-test verification

A PIR distributed API can be verified by adding a new case to test/collective/process_group_nccl_pir.py. Use the corresponding dygraph test, test/collective/process_group_nccl.py, as the overall reference, and keep the order of new cases consistent with the dygraph tests where possible. If the dygraph tests have no case for the API, determine its semantics from the documentation and add a corresponding case.
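A new case would follow roughly the shape below. FakeGroup is a stand-in we invented so the sketch runs without NCCL or two GPUs; the real test case would instead build the program under PIR and exercise an actual two-rank process group:

```python
class FakeGroup:
    """Simulates a 2-rank all_reduce entirely in-process."""

    def __init__(self, rank_tensors):
        # one tensor (list of floats) per rank
        self.rank_tensors = rank_tensors

    def all_reduce(self):
        # every rank ends up with the elementwise sum across ranks
        summed = [sum(vals) for vals in zip(*self.rank_tensors)]
        return [list(summed) for _ in self.rank_tensors]

def test_all_reduce_sum():
    group = FakeGroup([[1.0, 2.0], [3.0, 4.0]])
    out = group.all_reduce()
    # both ranks must observe the same reduced result
    assert out == [[4.0, 6.0], [4.0, 6.0]]

test_all_reduce_sum()
print("ok")
```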

After adding a case, run

ctest -R test_collective_process_group_pir

to verify that the adaptation works.

Debugging tips

You can redirect each subprocess's stdout and stderr to files for easier debugging, e.g. by modifying test/legacy_test/test_parallel_dygraph_dataparallel.py as follows:

     procs = []
-    for t in pod.trainers:
+    for i, t in enumerate(pod.trainers):
         ...
-        proc = subprocess.Popen(cmd.split(" "), env=current_env)
+        proc = subprocess.Popen(cmd.split(" "), env=current_env, stdout=open(f"/tmp/out_{i}.log", "wb"), stderr=open(f"/tmp/err_{i}.log", "wb"))

Afterwards, the detailed output and error messages of each subprocess can be found in /tmp/out_0.log, /tmp/out_1.log, /tmp/err_0.log, and /tmp/err_1.log.

Quantization API Adaptation Guide

TODO

@YuanRisheng YuanRisheng added the PFCC (Paddle Framework Contributor Club, https://github.com/PaddlePaddle/community/tree/master/pfcc) label Mar 11, 2024
@SigureMo SigureMo assigned zrr1999 and unassigned JZ-LIANG Mar 12, 2024
@zrr1999
Member

zrr1999 commented Mar 12, 2024

[Sign-up]: 1, 33-35

@SigureMo
Member

[Sign-up]: 4

@luotao1 luotao1 moved this to In Progress in Call for Contributions Mar 15, 2024
@luotao1 luotao1 added the HappyOpenSource (Happy Open Source activity issues & PRs) label Mar 15, 2024
@luotao1 luotao1 self-assigned this Mar 15, 2024
@Eacient

Eacient commented Mar 18, 2024

[Sign-up]: 3

@jshh0401

[Sign-up]: 15

@zrr1999
Member

zrr1999 commented Mar 27, 2024

[Sign-up]: 27, 28, 36

@zrr1999
Member

zrr1999 commented Mar 31, 2024

[Sign-up]: 29-32

@ooooo-create
Contributor

[Sign-up]: 5-8
