From f6f95981f71ad81b6549adf742b1632c0abe973f Mon Sep 17 00:00:00 2001
From: Robin
Date: Tue, 19 Jan 2021 13:34:33 +0100
Subject: [PATCH] Added chapter about Nesterov momentum optimization

---
 .vscode/settings.json |  3 ---
 docs/optimizers.rst   | 38 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 4 deletions(-)
 delete mode 100644 .vscode/settings.json

diff --git a/.vscode/settings.json b/.vscode/settings.json
deleted file mode 100644
index 12ff2fd..0000000
--- a/.vscode/settings.json
+++ /dev/null
@@ -1,3 +0,0 @@
-{
-    "restructuredtext.confPath": "${workspaceFolder}/docs"
-}
\ No newline at end of file
diff --git a/docs/optimizers.rst b/docs/optimizers.rst
index 07e9919..06d92bd 100644
--- a/docs/optimizers.rst
+++ b/docs/optimizers.rst
@@ -139,7 +139,43 @@ of the gradient on previous steps. This results in minimizing oscillations and f
 Nesterov Momentum
 -----------------
 
-Be the first to `contribute! `__
+Nesterov momentum optimization is a minor but effective variation of regular momentum optimization,
+proposed by Yurii Nesterov. The key concept is that the gradient of the cost function is measured
+slightly ahead of the local position, in the direction the momentum is carrying the weights: at the
+point :math:`W - \beta v_{dW}`, since the update below moves :math:`W` along :math:`-v_{dW}`.
+This works because the momentum vector will generally be pointing in the right direction, so the
+gradient measured there is slightly more accurate, and Nesterov momentum is generally faster than
+regular momentum optimization.
+
+.. math::
+
+      v_{dW} = \beta v_{dW} + (1 - \beta) \frac{\partial \mathcal{J}}{\partial (W - \beta v_{dW})} \\
+      W = W - \alpha v_{dW}
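+
+A minimal NumPy sketch of one such update step is shown below; the gradient function ``grad`` and
+the default values for :math:`\alpha` and :math:`\beta` are illustrative assumptions rather than
+part of any particular library:
+
+.. code-block:: python
+
+    import numpy as np
+
+    # ``grad`` is an assumed placeholder: any callable that returns
+    # dJ/dW evaluated at the point it is given.
+    def nesterov_momentum_update(W, v_dW, grad, lr=0.01, beta=0.9):
+        # Measure the gradient ahead of the current position, in the
+        # direction the momentum is already carrying the weights.
+        lookahead = W - beta * v_dW
+        # Exponentially weighted average of the lookahead gradients.
+        v_dW = beta * v_dW + (1 - beta) * grad(lookahead)
+        # Descend using the smoothed gradient.
+        W = W - lr * v_dW
+        return W, v_dW
+
+    # Example: minimize J(W) = ||W||^2, whose gradient is 2W.
+    W, v_dW = np.ones(3), np.zeros(3)
+    for _ in range(200):
+        W, v_dW = nesterov_momentum_update(W, v_dW, grad=lambda w: 2 * w)
 
 Newton's Method
 ---------------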