[CS5242] Regularization

Regularization#

  • a modification intended to reduce generalization (validation) error but not training error
  • reduces variance significantly without overly increasing bias

Parameter Norm Penalties#

  • add a penalty term $\Omega(w)$ to regularize $J$
    • $J_{reg}(w; x, y) = J(w; x, y) + \alpha \Omega(w)$
  • prevents overfitting
  • L1 norm regularization: $J_{reg} = J + \alpha ||w||_1$
  • L2 norm regularization: $J_{reg} = J + \frac{\alpha}{2} ||w||_2^2$ (see the sketch below)
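
A minimal sketch of adding these penalties to a loss, assuming a PyTorch `model` and an already-computed base `loss` (the helper name is illustrative):

```python
import torch

def penalized_loss(loss, model, alpha=1e-4, norm="l2"):
    """Add an L1 or L2 parameter-norm penalty to a base loss."""
    if norm == "l1":
        penalty = sum(p.abs().sum() for p in model.parameters())   # alpha * ||w||_1
        return loss + alpha * penalty
    # L2 / weight decay: alpha/2 * ||w||_2^2
    penalty = sum((p ** 2).sum() for p in model.parameters())
    return loss + 0.5 * alpha * penalty
```

In practice the L2 case is usually delegated to the optimizer's `weight_decay` argument.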

Parameter sharing#

  • force sets of parameters to be equal
  • only a subset of the parameters needs to be stored in memory
  • e.g. CNN: the same convolution kernel is applied at every spatial position (see the sketch below)
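
A small illustration (an assumed example, not from the notes) of how a convolution layer shares its kernels across all spatial positions, so the parameter count is independent of the input size:

```python
import torch
import torch.nn as nn

# The same 16 kernels are applied at every spatial position,
# so the layer stores only 16*3*3*3 weights regardless of image size.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
x = torch.randn(8, 3, 32, 32)       # a batch of 32x32 RGB images
y = conv(x)                         # shape: (8, 16, 32, 32)
print(conv.weight.shape)            # torch.Size([16, 3, 3, 3])
```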

Dropout#

  • Training: randomly set some neurons to $0$ with probability $p$
    • the dropped set can be different each iteration
  • computationally cheap
  • each hidden unit must perform well regardless of which other hidden units are active (see the sketch below)
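
A minimal sketch of (inverted) dropout at training time; `p` is the drop probability from the note above:

```python
import torch

def dropout(x, p=0.5, training=True):
    """Zero each unit with probability p; rescale survivors so the expected value is unchanged."""
    if not training or p == 0.0:
        return x                               # identity at test time
    mask = (torch.rand_like(x) > p).float()    # keep with probability 1 - p
    return x * mask / (1.0 - p)                # "inverted" rescaling
```

In PyTorch this corresponds to `nn.Dropout(p)`, which handles the train/eval switch automatically.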

Data augmentation#

  • synthesize new data by transforming the original dataset
  • incorporates invariances into the model
  • e.g. random crops, flips, and rotations for images (see the sketch below)
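
A sketch using torchvision transforms (an assumed example; any transformation pipeline that encodes the desired invariances works):

```python
from torchvision import transforms

# Random transformations applied on the fly, so every epoch sees
# slightly different versions of the same underlying images.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])
```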

Injecting noise#

  • add noise to simulate real-world conditions (see the sketch below)
    • Input: train with noise-injected data
    • Hidden units: inject noise into the hidden layers
    • Weights: encourages stability of the learned function under small parameter perturbations, e.g. in RNNs
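
A minimal sketch of the injection points, assuming Gaussian noise and a PyTorch model (helper names are illustrative):

```python
import torch

def add_noise(x, std=0.1):
    """Gaussian noise for inputs or hidden activations during training."""
    return x + std * torch.randn_like(x)

def perturb_weights(model, std=0.01):
    """Small Gaussian perturbation of the weights (in place)."""
    with torch.no_grad():
        for p in model.parameters():
            p.add_(std * torch.randn_like(p))
```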

Batch normalization (BN)#

  • reduces internal covariate shift by normalizing the inputs of each layer
  • normalize → scale and shift
    • $x \rightarrow \hat x = \frac{x - \mu}{\sigma} \rightarrow y = \gamma \hat x + \beta$
      $\mu$: mean, $\sigma$: standard deviation
      $\gamma$: scale, $\beta$: shift
    • $\gamma$ and $\beta$ are parameters to learn
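
A minimal sketch of the training-time forward pass for a `(batch, features)` input, following the normalize → scale-and-shift formula above ($\epsilon$ added for numerical stability):

```python
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then apply learnable scale and shift."""
    mu = x.mean(dim=0)                    # per-feature mean over the batch
    var = x.var(dim=0, unbiased=False)    # per-feature variance over the batch
    x_hat = (x - mu) / torch.sqrt(var + eps)
    return gamma * x_hat + beta
```

`nn.BatchNorm1d` additionally keeps running estimates of $\mu$ and $\sigma$ for use at inference time.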