[CS5242] Shallow Neural Networks

The Perceptron#

(figure: perceptron model)

  • algorithm for binary classification (originally applied to image recognition); see the sketch below
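A minimal sketch of a perceptron in Python (the data layout, learning rate, and function names are illustrative assumptions, not from the lecture): a weighted sum followed by a step activation, trained with the classic perceptron update rule.

```python
import numpy as np

def perceptron_predict(x, w, b):
    """Weighted sum followed by a step (Heaviside) activation."""
    return 1 if np.dot(w, x) + b > 0 else 0

def perceptron_train(X, y, lr=0.1, epochs=10):
    """Nudge weights toward each misclassified point (perceptron learning rule)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - perceptron_predict(xi, w, b)  # 0 if correct, ±1 if wrong
            w += lr * err * xi
            b += lr * err
    return w, b
```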

Multi-layer Perceptron (MLP)#

(figure: MLP architecture)

  • built from fully connected (dense) layers (sketch below)
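A minimal forward-pass sketch of an MLP with two fully connected layers (the layer sizes, ReLU activation, and random weights are illustrative assumptions):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def mlp_forward(x, W1, b1, W2, b2):
    """Two fully connected layers: hidden layer with ReLU, linear output."""
    h = relu(W1 @ x + b1)   # hidden layer
    return W2 @ h + b2      # output layer

# Example: 4 inputs -> 8 hidden units -> 2 outputs (random weights for illustration)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)
print(mlp_forward(rng.normal(size=4), W1, b1, W2, b2))
```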

Gradient Descent#

  1. Predict
    • Forward pass: $\tilde y = w x_i$
    • Compute loss: $L_i = \frac{1}{2}(y_i - \tilde y)^2$ (L2 loss)
  2. Update
    • Backward propagation: $\frac{\partial L_i}{\partial w} = -(y_i - \tilde y) x_i = \nabla w$
    • Gradient update: $w = w - \eta \nabla w$ (see the sketch below)
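A minimal sketch of this single-weight gradient-descent loop (the toy data and learning rate are illustrative assumptions):

```python
def gd_step(w, x_i, y_i, eta=0.05):
    """One gradient-descent step for y_tilde = w * x_i with L2 loss."""
    y_tilde = w * x_i                      # forward pass
    loss = 0.5 * (y_i - y_tilde) ** 2      # L2 loss
    grad = -(y_i - y_tilde) * x_i          # dL/dw
    w = w - eta * grad                     # gradient update
    return w, loss

# Fit w toward the true relationship y = 3x on a few samples.
w = 0.0
for _ in range(100):
    for x_i, y_i in [(1.0, 3.0), (2.0, 6.0), (-1.0, -3.0)]:
        w, loss = gd_step(w, x_i, y_i)
print(round(w, 3))  # converges to roughly 3.0
```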

Under/Overfitting#

(figure: underfitting vs. overfitting)

  • Underfitting:
    • Low model complexity => too simple to fit the data
    • High bias (consistently learns the wrong thing)
  • Overfitting:
    • High model complexity => fits the seen (training) data too closely
    • High variance (learned parameters change a lot across different training sets)
    • Cannot generalize to new, unseen data

Data Splitting#

  • Training set for model training
  • Validation set for model tuning
  • Testing set for testing the final model
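A minimal sketch of a random train/validation/test split (the 70/15/15 ratio and function name are illustrative assumptions):

```python
import numpy as np

def split_data(X, y, train=0.7, val=0.15, seed=0):
    """Shuffle indices, then cut into train / validation / test partitions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train * len(X))
    n_val = int(val * len(X))
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])
```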

K-fold cross validation#

  • useful when training data is scarce
  • Divide the training set into $K$ partitions
  • Use $K-1$ partitions for training, $1$ for validation
  • Repeat $K$ times, each time with a different validation partition (see the sketch below)
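A minimal sketch of K-fold cross-validation over shuffled index partitions (`evaluate` is a hypothetical callback assumed to train on the $K-1$ folds and return a validation score):

```python
import numpy as np

def k_fold_scores(X, y, K, evaluate, seed=0):
    """Split indices into K folds; each fold serves once as the validation set."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), K)
    scores = []
    for k in range(K):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[i] for i in range(K) if i != k])
        # evaluate(...) trains on the K-1 folds and scores on the held-out fold
        scores.append(evaluate(X[train_idx], y[train_idx], X[val_idx], y[val_idx]))
    return np.mean(scores)
```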

Cross entropy loss function#

(figure: cross-entropy loss)

  • for **classification**
  • Entropy: the degree of randomness (uncertainty)
    • $H(x) = -\sum p(x) \log p(x)$
    • the greater the entropy, the more uncertain
  • $L = -\frac{1}{N} \sum_{j=1}^{N} \left[ t_j \log(p_j) + (1-t_j) \log(1-p_j) \right]$, where $t_j \in \{0,1\}$ is the true label and $p_j$ the predicted probability (see the sketch below)
    • if $t_j = 1$ => $L_j = -\log(p_j)$
    • if $t_j = 0$ => $L_j = -\log(1-p_j)$
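A minimal sketch of this binary cross-entropy loss (the clipping epsilon is an illustrative safeguard against $\log(0)$):

```python
import numpy as np

def binary_cross_entropy(t, p, eps=1e-12):
    """L = -(1/N) * sum[ t*log(p) + (1-t)*log(1-p) ]"""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))

# Example: confident, correct predictions give a small loss.
t = np.array([1, 0, 1, 0])
p = np.array([0.9, 0.1, 0.8, 0.2])
print(binary_cross_entropy(t, p))  # roughly 0.164
```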