Add the DistributedFusedLamb optimizer #39148
Conversation
Thanks for your contribution!
Force-pushed: 27d44d9 → 112c98e
Force-pushed: 112c98e → f422178
Force-pushed: f422178 → 3cc786a
Sorry to inform you that b9a7f57's CIs passed more than 7 days ago. To prevent PR conflicts, please re-run all CIs manually.
Force-pushed: 9e85146 → d5f3392
LGTM.
LGTM for the dtype registrar.
LGTM
self._optimizer._set_scale(self._loss_scaling)
optimize_ops = self._optimizer.apply_gradients(params_grads)
found_inf = self._optimizer._found_inf
self._add_dynamic_loss_scaling(params_grads, found_inf)
Not important, but it seems params_grads is not used in _add_dynamic_loss_scaling in this case.
Yes, but it is not that important.
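For context, here is a minimal sketch of how a dynamic loss scale is typically updated from a `found_inf` flag. The function and parameter names below are illustrative assumptions, not Paddle's actual `_add_dynamic_loss_scaling` implementation (which also wires the update into the program):

```python
# Illustrative sketch only (assumed names and defaults, not Paddle's API):
# how a dynamic loss scale is usually adjusted from a found_inf flag.
def update_loss_scaling(loss_scaling, good_steps, found_inf,
                        incr_every_n_steps=1000,
                        incr_ratio=2.0, decr_ratio=0.5):
    if found_inf:
        # Overflow detected: shrink the scale and reset the good-step counter.
        loss_scaling *= decr_ratio
        good_steps = 0
    else:
        # No overflow: grow the scale after enough consecutive good steps.
        good_steps += 1
        if good_steps >= incr_every_n_steps:
            loss_scaling *= incr_ratio
            good_steps = 0
    return loss_scaling, good_steps
```

Since the decision depends only on `found_inf`, which the fused optimizer already computes, `params_grads` is indeed not strictly needed in this path.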
LGTM
LGTM
LGTM for:
set_tests_properties(test_distributed_fused_lamb_op_with_clip PROPERTIES TIMEOUT 120)
set_tests_properties(test_distributed_fused_lamb_op_without_clip PROPERTIES TIMEOUT 120)
PR types
New features
PR changes
OPs
Describe
Add the hybrid-parallel DistributedFusedLamb optimizer. It uses:
- `ncclAllReduce` to get the global gradient square L2-norm.
- `ncclAllReduce` to get the global square L2-norm value.
- `ncclAllGather` to get the whole updated parameter.
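For illustration, here is a minimal sketch of the communication pattern described above, written with `paddle.distributed` collectives as stand-ins for the raw `ncclAllReduce`/`ncclAllGather` calls. The function name, the plain placeholder update, and the reading of the second all-reduce as the parameter-side norm used by the LAMB trust ratio are assumptions; the PR itself implements this inside fused CUDA kernels.

```python
# Sketch only: assumes an initialized distributed environment
# (paddle.distributed.init_parallel_env()) and that each rank holds one shard
# of the flattened parameters/gradients. Not the fused kernels from this PR.
import paddle
import paddle.distributed as dist

def sharded_lamb_step_sketch(param_shard, grad_shard, lr=1e-3):
    # 1. All-reduce the local squared gradient norm so every rank sees the
    #    global gradient square L2-norm.
    grad_sq = paddle.sum(paddle.square(grad_shard))
    dist.all_reduce(grad_sq)                  # SUM is the default reduce op
    global_grad_norm = paddle.sqrt(grad_sq)

    # 2. Same pattern for the (assumed) parameter-side square L2-norm used by
    #    the LAMB trust ratio.
    param_sq = paddle.sum(paddle.square(param_shard))
    dist.all_reduce(param_sq)
    global_param_norm = paddle.sqrt(param_sq)

    # 3. Each rank updates only its own shard. A plain scaled step stands in
    #    here for the real LAMB/Adam update.
    trust_ratio = global_param_norm / (global_grad_norm + 1e-6)
    new_shard = param_shard - lr * trust_ratio * grad_shard

    # 4. All-gather the updated shards to reassemble the whole parameter.
    shards = []
    dist.all_gather(shards, new_shard)
    return paddle.concat(shards)
```

A real run would launch this under paddle.distributed.launch with init_parallel_env() called first; the new unit tests above (with and without gradient clipping) exercise the fused implementation rather than this sketch.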