From Andrej Karpathy's course CS231n: Convolutional Neural Networks for Visual Recognition.
All the plots were generated with a single full forward pass through a 10-layer network, with 500 units per layer and the same activation function applied at every layer. Tanh, ReLU, and sigmoid activations were compared.
The data consists of 1000 randomly generated training examples drawn from a univariate normal (Gaussian) distribution with mean 0 and variance 1.
The weights of each layer were initially drawn from the same distribution as the data, and were later varied (rescaled) to obtain the different plots. A minimal sketch of this experiment is given below.
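The following sketch reproduces the setup described above under a few assumptions not stated in the notes: the input dimensionality is taken to be 500 (matching the layer width), the weights are rescaled by a single `weight_scale` factor (the quantity varied to obtain the different plots), and the plots are per-layer histograms of the activations. Names such as `weight_scale` and `hidden_units` are illustrative, not from the original.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed settings: input dimensionality equal to the layer width, and a
# single scale factor on the weights that can be varied between runs.
num_layers = 10          # depth of the network
hidden_units = 500       # units per layer
num_examples = 1000      # training examples
weight_scale = 1.0       # vary this to reproduce the different plots

activations = {
    "tanh": np.tanh,
    "relu": lambda x: np.maximum(0, x),
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
}

# Data: 1000 examples drawn from a unit Gaussian (mean 0, variance 1).
X = np.random.randn(num_examples, hidden_units)

for name, act in activations.items():
    hidden = {}
    h = X
    for layer in range(num_layers):
        # Weights drawn from the same distribution as the data,
        # optionally rescaled to study different initializations.
        W = np.random.randn(h.shape[1], hidden_units) * weight_scale
        h = act(h.dot(W))
        hidden[layer] = h

    # One histogram of activations per layer for this nonlinearity.
    fig, axes = plt.subplots(1, num_layers, figsize=(20, 2), sharey=True)
    for layer, ax in enumerate(axes):
        ax.hist(hidden[layer].ravel(), bins=30)
        ax.set_title(f"{name} layer {layer + 1}", fontsize=8)
    plt.show()
```

Changing `weight_scale` (for example to 0.01 or 2.0) shifts the activation distributions toward saturation or collapse in the deeper layers, which is the effect the different plots are meant to illustrate.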