[CS5242] Deep CNN Architectures

How do we initialize the weights to prevent vanishing/exploding gradients?

Figure: weight initialization

  • Forward: $Var[y] = \prod\limits_d n_d^{in} Var[w_d] \cdot Var[x]$
  • Backward: $Var[\frac{\partial}{\partial x}] = \prod\limits_d n_d^{out} Var[w_d] \cdot Var[\frac{\partial}{\partial y}]$
  • Initialize so that $n_d^{in} Var[w_d] = 1$ or $n_d^{out} Var[w_d] = 1$
    • ReLU: $\frac{1}{2} n_d^{in} Var[w_d] = 1$ or $\frac{1}{2} n_d^{out} Var[w_d] = 1$ (see the sketch after this list)
  • Figure: comparison of different weight initializations
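A rough illustration of the two conditions above, as a minimal NumPy sketch (the function names `xavier_init` and `he_init` are my own, not from the lecture): Xavier/Glorot keeps $n^{in} Var[w] = 1$, while He/Kaiming keeps $\frac{1}{2} n^{in} Var[w] = 1$ for ReLU layers.

```python
import numpy as np

def xavier_init(n_in, n_out):
    # Xavier/Glorot: n_in * Var[w] = 1, so activation variance is
    # (roughly) preserved on the forward pass
    std = np.sqrt(1.0 / n_in)
    return np.random.randn(n_out, n_in) * std

def he_init(n_in, n_out):
    # He/Kaiming: (1/2) * n_in * Var[w] = 1, compensating for ReLU
    # zeroing out about half of the activations
    std = np.sqrt(2.0 / n_in)
    return np.random.randn(n_out, n_in) * std

W = he_init(512, 256)   # weights for a 512 -> 256 ReLU layer
print(W.std())          # ≈ sqrt(2/512) ≈ 0.0625
```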

Deep Residual Learning

Figure: stacking layers

We cannot simply stack more layers; deeper plain networks run into optimization difficulties.

Figure: residual learning block

  • $F(x) = H(x) - x$
  • $F(x)$ only needs to learn the change (the residual)
  • Figure: ResNet-34 structure

Why use the shortcut?

  • Degradation problem
    • as the model gets deeper, it becomes harder to propagate information from the shallow layers, so information is lost (it degrades rapidly)
    • shortcut (identity) connections let the input pass through unchanged, so each block only has to learn the residual (see the sketch below)
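A minimal PyTorch sketch of a basic residual block (the class name `BasicBlock` and the channel size are my own choices, not the exact course code): the two 3×3 convolutions learn $F(x)$, and the identity shortcut adds $x$ back so the block outputs $H(x) = F(x) + x$.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Residual block: output = ReLU(F(x) + x), where F is two 3x3 convolutions."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))   # F(x), the residual branch
        return self.relu(out + x)         # identity shortcut: add x back

y = BasicBlock(64)(torch.randn(1, 64, 32, 32))  # shape preserved: (1, 64, 32, 32)
```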