The list of techniques to apply in Deep Learning keeps growing every day. Rediscovering older papers or discovering new ones provides us with a constant flow of modern techniques. To keep it simple, I'm saving a list of things to check out when seeking the best performance. Basically, this is a list of terms you should be familiar with. The entries at the beginning are well known, but they get more obscure later on.

On Layers

  • Dropout: Switch off random neurons during training.
  • Batch Normalization: Normalize the inputs of a layer across each mini-batch.
  • ReLU: Rectified Linear Units, an activation function that mitigates vanishing gradients.
  • ELU/PReLU/LReLU: Variants of ReLU: Exponential Linear Unit, Parametric ReLU and Leaky ReLU.
  • No more One-Hot Encoding: link Represent categorical data as learned embedding vectors instead of single numbers or sparse one-hot vectors (these layer techniques are combined in the sketch below).
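
A minimal PyTorch sketch of how these pieces might fit together (the toy task, layer sizes, and names are my own illustrative assumptions, not from any particular paper): an embedding layer replaces one-hot encoding, followed by batch normalization, a leaky ReLU, and dropout.

```python
# Illustrative sketch only: the toy task and all sizes are assumptions.
import torch
import torch.nn as nn

class ToyNet(nn.Module):
    def __init__(self, n_categories=10, emb_dim=4, n_numeric=8, hidden=32):
        super().__init__()
        # Embedding: learned dense vectors instead of one-hot encoding
        self.emb = nn.Embedding(n_categories, emb_dim)
        self.fc1 = nn.Linear(emb_dim + n_numeric, hidden)
        self.bn = nn.BatchNorm1d(hidden)  # normalize the layer's inputs per mini-batch
        self.act = nn.LeakyReLU(0.01)     # Leaky ReLU; nn.ELU()/nn.PReLU() are drop-in variants
        self.drop = nn.Dropout(p=0.5)     # switch off random neurons during training
        self.out = nn.Linear(hidden, 1)

    def forward(self, x_cat, x_num):
        # x_cat: integer category ids (N,); x_num: numeric features (N, n_numeric)
        x = torch.cat([self.emb(x_cat), x_num], dim=1)
        x = self.drop(self.act(self.bn(self.fc1(x))))
        return self.out(x)
```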

On Architecture

  • Transfer Learning: Take pretrained networks, or parts of them, as the base model for your training (see the sketch after this list).
  • Neural Architecture Search: Use ML to search for the best architecture for your Deep Learning task.
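
As a hedged example, here is one common transfer-learning recipe in PyTorch (the choice of ResNet-18 and the 10-class head are assumptions for illustration): load pretrained weights, freeze the backbone, and replace the final layer.

```python
# Illustrative transfer-learning sketch; model choice and class count are assumptions.
import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet (torchvision >= 0.13 weights API)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for param in model.parameters():
    param.requires_grad = False  # freeze the pretrained backbone

# Replace the classification head with a new trainable layer, e.g. for 10 classes
model.fc = nn.Linear(model.fc.in_features, 10)
```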

On Learning Process

  • L1 & L2 Regularization: keep your weights small in your cost function.
  • AdamW: Adam weight Decay link, better than Adam optimizer.
  • Gradient Clipping: Do not allow the gradient to go beyond a certain threshold.
  • Differential Learning rates: Use different learning rates for each set of layers (small for first layers, bigger for last).
  • Cyclical Learning Rates: link Search for the perfect learning rate by applying an exponentially increasing rate and keeping minimum loss.
  • Cosine annealing: Gradual decrease of learning rate using cosine function.
  • SDG with restarts: link restart the learning rate every several epochs.
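
Several of these combine naturally in one training loop. Below is a minimal PyTorch sketch (the model, data, and every hyperparameter value are illustrative assumptions): AdamW provides decoupled weight decay, parameter groups give differential learning rates, CosineAnnealingWarmRestarts implements cosine annealing with SGDR-style restarts, and the gradient norm is clipped each step.

```python
# Illustrative sketch; model, data, and all hyperparameters are assumptions.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = torch.nn.Sequential(
    torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)
)

# Differential learning rates: smaller for early layers, bigger for later ones.
optimizer = AdamW(
    [
        {"params": model[0].parameters(), "lr": 1e-4},  # first layer
        {"params": model[2].parameters(), "lr": 1e-3},  # last layer
    ],
    weight_decay=1e-2,  # decoupled weight decay, the "W" in AdamW
)
# Cosine annealing with warm restarts (SGDR): reset the LR every T_0 epochs.
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10)

for epoch in range(30):
    x, y = torch.randn(64, 16), torch.randn(64, 1)  # stand-in for a real batch
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    # Gradient clipping: cap the gradient norm at a threshold.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
```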

On Training Techniques

  • Several Sizes: Train first on small inputs, then on bigger images/text/..., as shown in YOLOv3.
  • Data Augmentation: Create modified copies of your training data; self-supervised learning can even be used to train a model that augments the data (mostly images) link.
  • TTA: Test-Time Augmentation; run several modified versions of an input (e.g. an image) through the network and average the predictions (a minimal version is sketched below).
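
For instance, a minimal TTA sketch in PyTorch (horizontal flipping is just one common augmentation choice, and the function name is mine):

```python
# Illustrative TTA sketch; the flip is one assumed augmentation choice.
import torch

@torch.no_grad()
def predict_tta(model, x):
    """Average predictions over a batch of images x (N, C, H, W)
    and their horizontally flipped copies."""
    model.eval()
    preds = model(x)                                # original inputs
    preds = preds + model(torch.flip(x, dims=[3]))  # horizontal flip (width axis)
    return preds / 2
```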