【Hackathon 5th No.22】Add CosineAnnealingWarmRestarts API to Paddle #6286

Merged 2 commits on Nov 7, 2023
57 changes: 57 additions & 0 deletions docs/api/paddle/optimizer/lr/CosineAnnealingWarmRestarts_cn.rst
@@ -0,0 +1,57 @@
.. _cn_api_paddle_optimizer_lr_CosineAnnealingWarmRestarts:

CosineAnnealingWarmRestarts
-----------------------------------

.. py:class:: paddle.optimizer.lr.CosineAnnealingWarmRestarts(learning_rate, T_0, T_mult=1, eta_min=0, last_epoch=-1, verbose=False)

This scheduler dynamically adjusts the learning rate with a ``cosine annealing`` schedule that is periodically restarted (warm restarts).

.. math::

    \eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 +
    \cos\left(\frac{T_{cur}}{T_{i}}\pi\right)\right)

The initial value of :math:`\eta_{max}` is ``learning_rate``, :math:`T_{cur}` is the number of epochs trained since the last restart in SGDR (SGD with warm restarts), and :math:`T_{i}` is the number of epochs between two restarts in SGDR.

When :math:`T_{cur}=T_{i}`, set :math:`\eta_t = \eta_{min}`. When :math:`T_{cur}=0` after a restart, set :math:`\eta_t=\eta_{max}`.

The SGDR training strategy is described in the related paper: `SGDR: Stochastic Gradient Descent with Warm Restarts <https://arxiv.org/abs/1608.03983>`_.
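
The schedule above can be traced by hand. Below is a minimal, self-contained sketch (not part of Paddle; the helper name and the values ``T_0=2``, ``T_mult=2`` are purely illustrative) that evaluates the formula for a few integer epochs to show where the restarts land:

.. code-block:: python

    import math

    def cosine_warm_restart_lr(epoch, base_lr=0.1, T_0=2, T_mult=2, eta_min=0.0):
        # hypothetical helper: evaluate the schedule above for an integer epoch
        T_i, T_cur = T_0, epoch
        while T_cur >= T_i:      # skip past each completed restart period
            T_cur -= T_i
            T_i *= T_mult
        return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * T_cur / T_i))

    # With T_0=2 and T_mult=2 the restart periods are 2, 4, 8, ... epochs,
    # so the learning rate resets to base_lr at epochs 0, 2, 6, 14, ...
    for epoch in range(8):
        print(epoch, round(cosine_warm_restart_lr(epoch), 4))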

Parameters
::::::::::::

- **learning_rate** (float) - The initial learning rate.
- **T_0** (int) - Number of iterations until the first restart.
- **T_mult** (int, optional) - Multiplicative factor by which :math:`T_{i}` grows after each restart. Default: 1.
- **eta_min** (float, optional) - Minimum learning rate. Default: 0.
- **last_epoch** (int, optional) - The index of the last epoch; when resuming training, set it to the epoch of the previous run. Default: -1, meaning the schedule starts from the initial learning rate.
- **verbose** (bool, optional) - If ``True``, a message is printed to standard output (`stdout`) at each update. Default: ``False``.

Returns
::::::::::::
A ``CosineAnnealingWarmRestarts`` instance used to adjust the learning rate.

Code Examples
::::::::::::::

COPY-FROM: paddle.optimizer.lr.CosineAnnealingWarmRestarts:code-example1
COPY-FROM: paddle.optimizer.lr.CosineAnnealingWarmRestarts:code-example2
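
The authoritative examples are pulled in by the COPY-FROM directives above. As an additional minimal sketch of how the scheduler can be wired into a training loop (the toy model, data, and hyperparameter values below are illustrative only):

.. code-block:: python

    import paddle

    # toy model and scheduler; the T_0/T_mult values are illustrative
    linear = paddle.nn.Linear(10, 10)
    scheduler = paddle.optimizer.lr.CosineAnnealingWarmRestarts(
        learning_rate=0.5, T_0=2, T_mult=2)
    sgd = paddle.optimizer.SGD(learning_rate=scheduler,
                               parameters=linear.parameters())

    for epoch in range(10):
        for batch_id in range(5):
            x = paddle.uniform([4, 10])
            loss = paddle.mean(linear(x))
            loss.backward()
            sgd.step()
            sgd.clear_grad()
        # call after optimizer.step(); updates the lr used in the next epoch
        scheduler.step()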

Methods
::::::::::::
step(epoch=None)
'''''''''

``step()`` should be called after ``optimizer.step()``. It updates the learning rate according to the epoch count, and the updated learning rate is used by the optimizer in its next parameter update.

**Parameters**

- **epoch** (int, optional) - Specify the epoch count explicitly. Default: None; in this case the epoch count is accumulated automatically, starting from -1.

**Returns**

None.

**Code Example**

Refer to the code examples above.
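
For the ``epoch`` argument specifically, a brief sketch (reusing the ``scheduler`` object from the training-loop example above; the checkpoint scenario is illustrative):

.. code-block:: python

    for epoch in range(5):
        # ... train one epoch ...
        scheduler.step()        # epoch count accumulates automatically from -1

    # or pass the epoch explicitly, e.g. when resuming from a checkpoint
    scheduler.step(epoch=10)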
2 changes: 2 additions & 0 deletions docs/api/paddle/optimizer/lr/LRScheduler_cn.rst
@@ -41,6 +41,8 @@ LRScheduler

* :code:`LinearLR`: The learning rate increases linearly with the step count up to the specified learning rate. See :ref:`cn_api_paddle_optimizer_lr_LinearLR`.

* :code:`CosineAnnealingWarmRestarts`: Cosine annealing with warm restarts, i.e. the learning rate varies periodically with the step count following a cosine function and is reset at each restart. See :ref:`cn_api_paddle_optimizer_lr_CosineAnnealingWarmRestarts`.

You can inherit this base class to implement any custom learning rate policy; import it with ``from paddle.optimizer.lr import LRScheduler``.
A subclass must override the base class's ``get_lr()`` method, otherwise a ``NotImplementedError`` exception is raised, as sketched below.
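
A minimal sketch of that contract follows; the class name and halving rule are made up for illustration and are not part of Paddle:

.. code-block:: python

    from paddle.optimizer.lr import LRScheduler

    class HalveEveryN(LRScheduler):
        # hypothetical scheduler: halve the learning rate every ``n`` epochs
        def __init__(self, learning_rate, n=10, last_epoch=-1, verbose=False):
            self.n = n  # set before super().__init__(), which may call get_lr()
            super().__init__(learning_rate, last_epoch, verbose)

        def get_lr(self):
            # self.base_lr and self.last_epoch are maintained by LRScheduler
            return self.base_lr * (0.5 ** (self.last_epoch // self.n))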

3 changes: 3 additions & 0 deletions docs/api_guides/low_level/layers/learning_rate_scheduler.rst
@@ -64,3 +64,6 @@

* :code:`LinearLR`: The learning rate increases linearly with the step count up to the specified learning rate.
For the related API Reference please refer to :ref:`_cn_api_paddle_optimizer_lr_LinearLR`

* :code:`CosineAnnealingWarmRestarts`: Cosine annealing with warm restarts, i.e. the learning rate varies periodically with the step count following a cosine function and is reset at each restart.
For the related API Reference please refer to :ref:`cn_api_paddle_optimizer_lr_CosineAnnealingWarmRestarts`
@@ -46,3 +46,5 @@ The following content describes the APIs related to the learning rate scheduler:
* :code:`CyclicLR`: Cyclic decay. That is, the learning rate cycles between the minimum and maximum learning rate with a constant frequency using a specified scale method. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_CyclicLR`

* :code:`LinearLR`: Linear decay. That is, the learning rate will first be multiplied by start_factor and then increase linearly to the end learning rate. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_LinearLR`

* :code:`CosineAnnealingWarmRestarts`: Cosine annealing with warm restarts. That is, the learning rate varies periodically with the number of steps following a cosine function and is reset at each restart. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_CosineAnnealingWarmRestarts`