255 words
1 minutes
[CS5242] Convolution Neural Networks (CNN)

Convolution#

  • Linear transformation
  • Feature extraction
  • 1D: Text, 2D: Image, 3D: microscopy …
  • convolution

Why convolution?

  • Sparse connection: each output only connect to inputs in reception field
    • fewer parameters, less overfitting
  • Weight sharing (use the same kernel weight)
    • regularization, less overfitting
  • Location/ Spatial invariant
    • same prediction for same input no matter where

1D Convolution#

  • y=i=0k1wi×xt+iy = \sum_{i=0}^{k-1} w_i \times x_{t+i}
    • ww: kernel/ filter (weights to be trained)
    • xx: input
    • yy: output feature
    • applied area = receptive field: generate one output value
  • 1d_conv

Padding#

What to do with boundary cases?

  • determine edge values
  • we want all elements to be processed the same # of time
  • we also want output size = input size

pad with extra values (0)#

  • without padding: Output=inputkernal+1Output = input - kernal + 1
  • with padding: Output=(input+padding)kernal+1Output = (input + padding) - kernal + 1
  • => same padding: padding=kernal1padding = kernal - 1

Stride#

How to make training faster?

  • steps to skip
  • consecutive: stride = 1
  • s>1s>1: skip some elements
    • faster, efficive
  • Output=input+paddingkernalstride+1Output = \lfloor \frac {input + padding - kernal}{stride}\rfloor + 1
  • same padding:
    • input+paddingkernalstrideOutput1\lfloor \frac {input + padding - kernal}{stride}\rfloor ≥ Output - 1
      paddingstride×(Outputstride1)+kernalinputpadding ≥ stride \times (\lceil \frac {Output} {stride}\rceil - 1) + kernal - input
      => padding=max(stride×(Outputstride1)+kernalinput,0)padding = max(stride \times (\lceil \frac {Output} {stride}\rceil - 1)+ kernal - input, 0)

2D Convolution#

2D convolution

  • Shape:
    • input = nh×nwn_h \times n_w
    • kernal = kh×kwk_h \times k_w
    • output = Oh×OwO_h \times O_w = (nh+phkhsh+1,nw+pwkwsw+1)(\lfloor \frac {n_h+p_h-k_h}{s_h}\rfloor + 1, \lfloor \frac {n_w+p_w-k_w}{s_w}\rfloor + 1)
  • Computational cost: O(kw×kh×Ow×Oh)O(k_w \times k_h \times O_w \times O_h)
  • Multiple kernels: different levels of information (local vs. global)
  • Multiple channels: results for all channels are summed (e.g RGB)

Pooling#

What to do for reducing size?

  • Aggregrate information from each receptive field
    • average, maximum
  • No parameter
  • Applied for each channels respectively
  • Reduces feature/ model size
  • Invariant to rotation of input
  • e.g Average pooling average pooling
[CS5242] Convolution Neural Networks (CNN)
https://itsjeremyhsieh.github.io/posts/cs5242-4-convolution-neural-networks/
Author
Jeremy H
Published at
2024-09-03