From f6f95981f71ad81b6549adf742b1632c0abe973f Mon Sep 17 00:00:00 2001
From: Robin
Date: Tue, 19 Jan 2021 13:34:33 +0100
Subject: [PATCH] Added chapter about Nesterov momentum optimization

---
 .vscode/settings.json |  3 ---
 docs/optimizers.rst   | 38 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 4 deletions(-)
 delete mode 100644 .vscode/settings.json

diff --git a/.vscode/settings.json b/.vscode/settings.json
deleted file mode 100644
index 12ff2fd..0000000
--- a/.vscode/settings.json
+++ /dev/null
@@ -1,3 +0,0 @@
-{
-    "restructuredtext.confPath": "${workspaceFolder}/docs"
-}
\ No newline at end of file
diff --git a/docs/optimizers.rst b/docs/optimizers.rst
index 07e9919..06d92bd 100644
--- a/docs/optimizers.rst
+++ b/docs/optimizers.rst
@@ -139,7 +139,43 @@ of the gradient on previous steps. This results in minimizing oscillations and f
 Nesterov Momentum
 -----------------
 
-Be the first to `contribute! `__
+Nesterov momentum optimization is a minor but effective variation of regular momentum optimization,
+proposed by Yurii Nesterov. The key concept is that the gradient of the cost function is measured
+slightly ahead of the local position, in the direction the momentum is carrying the weights: at the
+point :math:`W - \beta v_{dW}`, since the update below moves :math:`W` along :math:`-v_{dW}`.
+This works because the momentum vector will generally be pointing in the right direction, so the
+gradient measured there is slightly more accurate, and Nesterov momentum is generally faster than
+regular momentum optimization.
+
+.. math::
+
+      v_{dW} = \beta v_{dW} + (1 - \beta) \frac{\partial \mathcal{J}}{\partial (W - \beta v_{dW})} \\
+      W = W - \alpha v_{dW}
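+
+A minimal NumPy sketch of one such update step is shown below; the gradient function ``grad`` and
+the default values for :math:`\alpha` and :math:`\beta` are illustrative assumptions rather than
+part of any particular library:
+
+.. code-block:: python
+
+    import numpy as np
+
+    # ``grad`` is an assumed placeholder: any callable that returns
+    # dJ/dW evaluated at the point it is given.
+    def nesterov_momentum_update(W, v_dW, grad, lr=0.01, beta=0.9):
+        # Measure the gradient ahead of the current position, in the
+        # direction the momentum is already carrying the weights.
+        lookahead = W - beta * v_dW
+        # Exponentially weighted average of the lookahead gradients.
+        v_dW = beta * v_dW + (1 - beta) * grad(lookahead)
+        # Descend using the smoothed gradient.
+        W = W - lr * v_dW
+        return W, v_dW
+
+    # Example: minimize J(W) = ||W||^2, whose gradient is 2W.
+    W, v_dW = np.ones(3), np.zeros(3)
+    for _ in range(200):
+        W, v_dW = nesterov_momentum_update(W, v_dW, grad=lambda w: 2 * w)
 
 Newton's Method
 ---------------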