The list of techniques to apply in Deep Learning keeps growing every day. Rediscovering older papers or discovering new ones provides us with a constant flow of modern techniques. To keep it simple, I'm keeping a list of things to check out when seeking the best performance. Basically, this is a list of terms you should be familiar with. The items at the beginning are well known, but they get more obscure further down.
- Dropout: Switch off random neurons during training.
- Batch Normalization: Normalize the inputs of a layer over each mini-batch.
- ReLU: Rectified Linear Unit, an activation function that mitigates vanishing gradients.
- ELU/PReLU/lReLU: Variants of ReLU: exponential linear unit, parametrized ReLU and leaky ReLU.
- No more One Hot Encoding: Map categorical data to learned vectors (embeddings) instead of single numbers.
- Transfer Learning: Take pretrained networks or parts of them as a base model for your training.
- Neural Architecture Search: Use ML to search for the best architecture for your Deep Learning task.
- L1 & L2 Regularization: Add a penalty on the size of the weights to your cost function to keep your weights small.
- AdamW: Adam with decoupled weight decay, often generalizes better than the plain Adam optimizer.
- Gradient Clipping: Do not allow the gradient to go beyond a certain threshold.
- Differential Learning rates: Use different learning rates for each set of layers (small for first layers, bigger for last).
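- Cyclical Learning Rates: Search for a good learning rate by applying an exponentially increasing rate and picking a value just before the loss stops decreasing. During training, the rate is then cycled between a lower and an upper bound.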
- Cosine annealing: Gradual decrease of the learning rate following a cosine curve.
- SGD with restarts: Reset the learning rate to its initial value every few epochs.
- Several Sizes: Train first on small, then on bigger images/text/..., as shown in YOLOv3.
- Data Augmentation: Create modified copies of your training data. Self-supervised learning can also be used to train a model to augment data (mostly images).
- TTA: Test Time Augmentation. Run several modified versions of an input (e.g. an image) through the network and average the predictions as the output.
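
To make the ReLU family from the list a bit more concrete, here is a minimal sketch of the scalar functions in plain Python. The default `alpha` values are illustrative assumptions, not recommendations; PReLU has the same form as leaky ReLU, except that `alpha` is learned during training.

```python
import math


def relu(x):
    """ReLU: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: small fixed slope `alpha` for negative inputs.

    PReLU uses the same formula, but treats `alpha` as a trainable parameter.
    """
    return x if x > 0 else alpha * x


def elu(x, alpha=1.0):
    """ELU: smooth exponential curve for negative inputs, saturating at -alpha."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)
```

All three behave like the identity for positive inputs; they only differ in how much signal (and gradient) they let through for negative ones.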
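
Dropout, as described in the list, can be sketched in a few lines. This is the common "inverted" variant, which rescales the surviving activations so nothing has to change at inference time; the function name, the list-based representation and the `rng` parameter are assumptions made for this illustration.

```python
import random


def dropout(activations, p_drop=0.5, training=True, rng=random):
    """Inverted dropout on a flat list of activations (illustrative sketch)."""
    if not training or p_drop == 0.0:
        # At inference time dropout is a no-op
        return list(activations)
    keep = 1.0 - p_drop
    # Keep each unit with probability `keep` and scale it by 1/keep so the
    # expected activation is unchanged; otherwise switch the unit off.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # seeded generator, only to make the demo repeatable
dropped = dropout([1.0] * 8, p_drop=0.5, rng=rng)
```

With `p_drop=0.5`, every entry of `dropped` is either `0.0` (switched off) or `2.0` (kept and rescaled).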
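
Gradient clipping can be done per value or by norm. The sketch below shows clipping by global norm on a flat list of gradient values; the function name and the flat-list representation are assumptions for the example, and deep learning frameworks ship their own utilities for this.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale `grads` so their L2 norm does not exceed `max_norm`."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        # Already within the threshold: leave the gradient untouched
        return list(grads)
    # Shrink every component by the same factor, preserving the direction
    scale = max_norm / total_norm
    return [g * scale for g in grads]


clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Here the input has norm 5, so each component is scaled by 1/5 and the direction of the update is kept.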
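
Cosine annealing and SGD with restarts are often combined. A minimal sketch of such a schedule, assuming a fixed cycle length and illustrative values for `lr_max` and `lr_min` (none of these names or numbers come from a specific library):

```python
import math


def cosine_annealing_lr(step, cycle_len, lr_max=0.1, lr_min=0.001):
    """Learning rate at `step` for cosine annealing with restarts.

    Inside each cycle of `cycle_len` steps the rate decays from `lr_max`
    towards `lr_min` along half a cosine wave. At the start of the next
    cycle it jumps back to `lr_max` (the "restart").
    """
    pos = step % cycle_len  # position inside the current cycle
    cos_term = 0.5 * (1.0 + math.cos(math.pi * pos / cycle_len))
    return lr_min + (lr_max - lr_min) * cos_term


# Three cycles of ten steps each
schedule = [cosine_annealing_lr(s, cycle_len=10) for s in range(30)]
```

Plotting `schedule` gives the familiar sawtooth of smooth decays. Variants of this idea also lengthen each successive cycle, which this sketch leaves out.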
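
Test time augmentation boils down to "predict on several views, then average". The sketch below uses a horizontal flip as the only augmentation and a deliberately biased `toy_model` as a hypothetical stand-in for a trained network; all names here are made up for illustration.

```python
def identity(image):
    return image


def hflip(image):
    """Horizontally flip an image stored as a list of rows."""
    return [row[::-1] for row in image]


def tta_predict(model, image, augmentations):
    """Average the model's outputs over several augmented views of one input."""
    preds = [model(aug(image)) for aug in augmentations]
    n_outputs = len(preds[0])
    return [sum(p[k] for p in preds) / len(preds) for k in range(n_outputs)]


def toy_model(image):
    # Hypothetical "bright vs. dark" scorer for 2-column images that
    # over-weights the left column, i.e. it is sensitive to flips.
    score = sum(0.75 * row[0] + 0.25 * row[1] for row in image) / len(image)
    return [score, 1.0 - score]


image = [[1.0, 0.0], [1.0, 0.0]]
single = toy_model(image)
averaged = tta_predict(toy_model, image, [identity, hflip])
```

On its own, the toy model reports a brightness score of 0.75 for this image because of its left-column bias. Averaging over the original and the flipped view cancels that bias and yields 0.5, the true mean brightness.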