Label Smoothing

  • A type of regularization
  • Replace hard one-hot labels with smoother ones, where \(\epsilon\) is small: \(\large [0, 0, 1, 0] \rightarrow [\epsilon / 3, \epsilon / 3, 1 - \epsilon, \epsilon / 3]\)
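
As a rough illustration of the mapping above, here is a minimal NumPy sketch. The function name `smooth_labels` and the default `epsilon=0.1` are assumptions made for this example. It follows the convention in these notes, where the true class gets \(1 - \epsilon\) and the remaining \(K - 1\) classes share \(\epsilon\) equally; some libraries instead spread \(\epsilon\) over all \(K\) classes.

```python
import numpy as np

def smooth_labels(one_hot, epsilon=0.1):
    """Smooth one-hot labels (hypothetical helper, not a library API).

    The true class gets 1 - epsilon; every other class gets
    epsilon / (K - 1), where K is the number of classes.
    """
    one_hot = np.asarray(one_hot, dtype=float)
    num_classes = one_hot.shape[-1]
    off_value = epsilon / (num_classes - 1)
    # Scale the 1 entries down and lift the 0 entries up.
    return one_hot * (1.0 - epsilon) + (1.0 - one_hot) * off_value

smoothed = smooth_labels([0, 0, 1, 0], epsilon=0.1)
print(smoothed)
```

The smoothed vector still sums to 1, so it remains a valid probability distribution over the classes.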

Because softmax outputs lie strictly between 0 and 1, the model can never produce an exactly one-hot output. Training against hard labels instead pushes the final-layer outputs (before activation) toward the form \([-\infty, -\infty, \infty, -\infty]\)
- To get there, the weights become very large, which can cause the model to overfit
- We might need to change the loss function to account for this
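
The effect can be checked numerically with a small sketch. Assumptions for this example: logits of the form \([0, 0, m, 0]\) with a growing margin \(m\), \(\epsilon = 0.1\), and the hypothetical helper names `log_softmax` and `soft_cross_entropy`. With hard labels, the cross-entropy keeps shrinking as \(m\) grows, so the loss rewards ever larger logits. With smoothed labels, the loss is smallest at a finite margin and rises again beyond it.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def soft_cross_entropy(logits, targets):
    """Cross-entropy that accepts soft (non one-hot) target vectors."""
    targets = np.asarray(targets, dtype=float)
    return float(-(targets * log_softmax(logits)).sum(axis=-1))

eps = 0.1  # assumed smoothing strength for this illustration
hard = np.array([0.0, 0.0, 1.0, 0.0])
smooth = np.array([eps / 3, eps / 3, 1.0 - eps, eps / 3])

# Logits [0, 0, m, 0]: m is the margin of the true class over the rest.
margins = [1.0, 5.0, 10.0, 20.0]
hard_losses = [soft_cross_entropy([0, 0, m, 0], hard) for m in margins]
smooth_losses = [soft_cross_entropy([0, 0, m, 0], smooth) for m in margins]

# With smoothed targets, the loss is minimized when softmax(logits)
# equals the targets, which happens at a finite margin.
best_margin = np.log((1.0 - eps) / (eps / 3))
best_loss = soft_cross_entropy([0, 0, best_margin, 0], smooth)

for m, h, s in zip(margins, hard_losses, smooth_losses):
    print(f"margin={m:5.1f}  hard={h:.6f}  smooth={s:.6f}")
print(f"smoothed loss is minimized near margin={best_margin:.3f}")
```

Because the smoothed loss penalizes very large margins, the optimizer has no incentive to drive the logits, and hence the weights, toward infinity.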