mnn.layer.SoftmaxLayer.md


class SoftmaxLayer

method __init__

__init__(*shape, axis=1)

method backward

backward(gradients)

When $i = k$,

$$ \begin{aligned} \partial y_i / \partial x_k =& \frac{\exp(x_i)[\sum_j \exp(x_j)] - \exp(x_i)\exp(x_i)}{[\sum_j \exp(x_j)]^2} \\ =& \frac{\exp(x_i) \left([\sum_j \exp(x_j)] -\exp(x_i)\right)}{[\sum_j \exp(x_j)]^2} \\ =& \frac{\exp(x_i)}{\sum_j \exp(x_j)} \cdot \frac{[\sum_j \exp(x_j)] - \exp(x_i)}{\sum_j \exp(x_j)} \\ =& y_i \cdot (1 - y_i) \\ =& y_i - y_i y_k \end{aligned} $$

When $i \not= k$,

$$ \begin{aligned} \partial y_i / \partial x_k =& \frac{0 - \exp(x_i)\exp(x_k)}{[\sum_j \exp(x_j)]^2} \\ =& - y_i y_k \end{aligned} $$

As a result, the Jacobian matrix is

$$ J_x y = \begin{bmatrix} y_1 - y_1 y_1 & - y_1 y_2 & ... & - y_1 y_n \\ - y_2 y_1 & y_2 - y_2 y_2 & ... & - y_2 y_n \\ \vdots & & \ddots & \vdots \\ - y_n y_1 & - y_n y_2 & ... & y_n - y_n y_n \end{bmatrix} = \operatorname{diag}(y) - y y^T $$
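In practice the full Jacobian need not be materialized: the vector–Jacobian product with $\operatorname{diag}(y) - y y^T$ collapses to $y \odot (g - \langle g, y \rangle)$, where $g$ is the upstream gradient. A minimal NumPy sketch of this backward computation (an illustration of the formula above, not necessarily the library's actual implementation; the function name and `axis` handling are assumptions):

```python
import numpy as np

def softmax_backward(y, grad_y, axis=1):
    """Vector-Jacobian product for softmax outputs y along `axis`.

    Implements dL/dx_k = y_k * (dL/dy_k - sum_i dL/dy_i * y_i),
    which equals (diag(y) - y y^T) applied to the upstream gradient.
    """
    inner = np.sum(grad_y * y, axis=axis, keepdims=True)
    return y * (grad_y - inner)
```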


method forward

forward(inputs, feedbacks=None)

method stable_softmax

stable_softmax(inputs, axis)

Numerically stable softmax

Since some of the exponentials can become very large and overflow, we can equivalently compute

$$ y_i(x) = \frac{\exp(x_i)}{\sum_j \exp(x_j)} = \frac{\exp(x_i - m)}{\sum_j \exp(x_j - m)} $$

where $m = \max(x_1, x_2, ...)$.
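A minimal NumPy sketch of this max-shift trick (illustrative only; the library's own `stable_softmax` may differ in details):

```python
import numpy as np

def stable_softmax(inputs, axis=1):
    # Subtract the maximum along `axis` so the largest exponent is exp(0) = 1,
    # avoiding overflow while leaving the softmax value unchanged.
    m = np.max(inputs, axis=axis, keepdims=True)
    e = np.exp(inputs - m)
    return e / np.sum(e, axis=axis, keepdims=True)
```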


method step

step(lr=0.01)