[CS5242] Linear Regression

Univariate Linear Regression

  • Maps from input $x$ to output $y$
  • $\tilde y = wx + b$
    • $\tilde y$: prediction output
    • $w$: weight
    • $x$: input
    • $b$: bias
  • Loss function
    • measures the discrepancy between the prediction $\tilde y$ and the true label $y$
    • L1: $L(x, y \mid w, b) = |\tilde y - y|$
    • L2: $L(x, y \mid w, b) = \frac{1}{2} |\tilde y - y|^2$
    • Global loss $J(w,b) = \frac{1}{m} \sum_{i=1}^m L(x^i, y^i \mid w, b)$ (data points are independent of each other, so their losses are averaged)
    • $\min J(w,b) = \min_{w,b} \frac{1}{2m} \sum_{i=1}^m (wx^i + b - y^i)^2$
      • the $2$ in $\frac{1}{2m}$ is for calculation convenience: it cancels the factor of $2$ produced when differentiating the square
    • Gradient descent
      • optimization procedure for minimizing $J(w, b)$
      • for each sample, compute $\tilde y = wx + b$
      • compute the average loss $J(w,b)$
      • compute the gradients $\frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=1}^m (wx^i + b - y^i)\, x^i$ and $\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (wx^i + b - y^i)$
      • update $w = w - \alpha \frac{\partial J}{\partial w}$ and $b = b - \alpha \frac{\partial J}{\partial b}$, where $\alpha$ is the learning rate
      • repeat the updates until convergence (see the sketch below)
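
A minimal NumPy sketch of this training loop. The synthetic data, learning rate, and iteration count are hypothetical choices for illustration, not values from the lecture:

```python
import numpy as np

# Synthetic 1-D data: y = 3x + 2 plus noise (assumed for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3 * x + 2 + 0.1 * rng.normal(size=100)

w, b = 0.0, 0.0   # initial parameters
alpha = 0.1       # learning rate (hypothetical choice)

for _ in range(500):                      # fixed iteration budget (hypothetical)
    y_hat = w * x + b                     # predictions for all m samples
    J = np.mean((y_hat - y) ** 2) / 2     # global loss 1/(2m) * sum of squares
    dw = np.mean((y_hat - y) * x)         # dJ/dw
    db = np.mean(y_hat - y)               # dJ/db
    w -= alpha * dw                       # gradient step on w
    b -= alpha * db                       # gradient step on b

print(f"w = {w:.3f}, b = {b:.3f}")        # should approach 3 and 2
```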

Multivariate Linear Regression

  • features as a column vector $x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, $x \in \mathbb{R}^{n \times 1}$
  • $\tilde y = w^T x + b$, with $w \in \mathbb{R}^{n \times 1}$, $b \in \mathbb{R}$
    • $= \sum_{i=1}^n w_i x_i + b = (w_1, w_2, \dots, w_n) \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + b$
  • Gradient with respect to vectors and matrices
    • $J(w) = L(x, y \mid w) = \frac{1}{2} (w^T x - y)^2$
    • let $Z = w^T x - y$, so $J(w) = \frac{1}{2} Z^2$
    • $\frac{\partial J(w)}{\partial w} = \frac{\partial J}{\partial Z} \frac{\partial Z}{\partial w} = Z x = (w^T x - y)\, x$
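
A quick NumPy check of this chain-rule result, comparing the analytic gradient $(w^T x - y)\, x$ against a finite-difference estimate. The sample values here are arbitrary, chosen only for illustration:

```python
import numpy as np

# Arbitrary single sample with n = 3 features (illustrative values)
x = np.array([1.0, -2.0, 0.5])
y = 4.0
w = np.array([0.3, 0.1, -0.7])

def J(w):
    """Per-sample L2 loss J(w) = 1/2 * (w^T x - y)^2."""
    return 0.5 * (w @ x - y) ** 2

analytic = (w @ x - y) * x   # dJ/dw = (w^T x - y) x

# Finite-difference gradient, one coordinate at a time
eps = 1e-6
numeric = np.array([
    (J(w + eps * e) - J(w - eps * e)) / (2 * eps)
    for e in np.eye(len(w))
])

print(analytic)
print(numeric)   # should match the analytic gradient closely
```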