【Hackathon 5th No.40】Add ASGD API to Paddle RFC #747
WintersMontagne10335 commented Nov 11, 2023
- 【PaddlePaddle Hackathon 5th】Open Source Contribution Individual Challenge Paddle#57262
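## 1. Background

Stochastic average gradient descent (hereafter `ASGD`) is a variant of `SGD` that trades space for time; it is a stochastic optimization method based on trajectory averaging. On top of `SGD`, `ASGD` additionally tracks the average of historical parameters, so that the variance of the noise in the descent direction shrinks over time and the algorithm eventually converges to the optimum at a linear rate.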
Please link the paper here and summarize the core steps of ASGD.
Done
mu.copy_(new_mu)
```
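This differs somewhat from the `ASGD` implementation in the original paper but is very close to the `SGD` implementation. The biggest difference from `SGD` is that `eta` (the effective learning rate) is updated according to `step`.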
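To make the `step`-driven `eta` update concrete, here is a minimal scalar sketch of one update in the style of the PyTorch code quoted above. It is an illustration under stated assumptions, not the implementation proposed for Paddle: the function name `asgd_step` is hypothetical, and the hyperparameter names and defaults (`lr`, `lambd`, `alpha`, `t0`) are assumed to mirror `torch.optim.ASGD`.

```python
def asgd_step(param, grad, ax, eta, mu, step,
              lr=0.01, lambd=1e-4, alpha=0.75, t0=1e6):
    """One scalar ASGD-style update (hypothetical helper, PyTorch-like schedule)."""
    step += 1
    # Decay term, then a plain SGD move using the current effective rate eta.
    param = param * (1 - lambd * eta)
    param = param - eta * grad
    # Running average of the parameter; while mu == 1 it just tracks param.
    if mu != 1:
        ax = ax + (param - ax) * mu
    else:
        ax = param
    # Unlike plain SGD, eta and mu are recomputed from the step counter.
    eta = lr / (1 + lambd * lr * step) ** alpha
    mu = 1 / max(1, step - t0)
    return param, ax, eta, mu, step


# Minimise f(x) = x**2 (gradient 2 * x) for a few steps.
lr = 0.01
param, ax, eta, mu, step = 1.0, 0.0, lr, 1.0, 0
etas = []
for _ in range(5):
    param, ax, eta, mu, step = asgd_step(param, 2 * param, ax, eta, mu, step, lr=lr)
    etas.append(eta)
```

With these assumed defaults `t0` is far larger than the step count, so `mu` stays at 1 and the average `ax` simply follows `param`; the only visible departure from `SGD` is the slowly shrinking `eta`.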
Please add here what ASGD should concretely look like when implemented and what is currently missing.
Done
Add the Python high-level interface:

- `paddle.optimizer.ASGD`
The important functions (such as `step`) should also be designed up front.
OK, I'll add that once the basic implementation is done. It will be finished before next Monday (11.27).
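The basic version is fairly simple and is already done.
I ran into a problem here: compared with the optimized version, the basic version's parameter list is incomplete. I haven't fully understood the optimized version yet, so I can't complete the RFC right now.
It will be completed by 12.15 at the latest.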
Done
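- The `step` function here is the one from the parent class `Optimizer`, so it does not need to be redesigned
- The optimized version in the paper is implemented for a special case and is not general. Only one of its methods can be used in this API. The workload is fairly small and can be finished within three days
- Quite a few parts of the RFC are still incomplete; I'll finish the code first and fill them in afterwards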
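I can't finish today. The static-graph unit test had me stuck for a long time and I only just resolved it. About 80% of the unit tests are written, and I still need to check whether anything else needs to be added.
I can probably finish tomorrow.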
TODO

# 6. Testing and Acceptance Considerations
Boundary scenarios also need to be constructed.
Done
parameters=None,
weight_decay=None,
grad_clip=None,
name=None
Shall we add a `multi_precision` parameter, to be consistent with the code in #58834?
Done