[CS5340] Variational Autoencoder & Diffusion Model

Discriminative model

  • posterior $p(C_k \mid X)$
  • maps $x$ to a class $C_k$

Generative model

  • likelihood $p(X \mid C_k)$
  • samples from the distribution
  • can generate data points in input space (see the sketch after this list)
    • $p(x, z \mid \theta) = p(x \mid z, \theta)\, p(z)$
    1. sample from the prior $z \sim p(z)$, e.g. $z = \text{Bedroom}$
    2. generate a new image from the likelihood $p(x \mid z = \text{Bedroom})$
  • define a prior $p(z)$ and a deterministic function $f(z; \theta): z \times \theta \rightarrow x$
  • optimize $\theta$: maximize the likelihood $p(x \mid \theta) = \int p(x \mid z, \theta)\, p(z)\, dz$, where $p(x \mid z, \theta) = \mathcal{N}(x \mid f(z, \theta), \sigma^2 I)$
    • but $z$ is high-dimensional, so the integral is intractable
    • we assume $p(z) = \mathcal{N}(0, I)$
    • learn $f(z, \theta)$ with a deep neural network
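A minimal NumPy sketch of the two sampling steps above, assuming the prior $p(z) = \mathcal{N}(0, I)$ and a toy affine map standing in for the deep decoder $f(z; \theta)$ (the dimensions and values here are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(z, theta):
    """Toy deterministic decoder f(z; theta): an affine map standing in
    for the deep network in the notes (hypothetical, for illustration)."""
    W, b = theta
    return W @ z + b

theta = (rng.normal(size=(4, 2)), rng.normal(size=4))  # made-up parameters
sigma = 0.1

# 1. sample from the prior  z ~ p(z) = N(0, I)
z = rng.standard_normal(2)

# 2. sample from the likelihood  p(x | z, theta) = N(x | f(z, theta), sigma^2 I)
x = f(z, theta) + sigma * rng.standard_normal(4)
print(x)
```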

$f(z; \theta): z \times \theta \rightarrow x$

  • consider $\ln p(x \mid z, \theta) = \ln \mathcal{N}(x \mid f(z; \theta), \sigma^2 I) = -\frac{1}{2\sigma^2} \|x - f(z; \theta)\|^2 + \text{const.}$
  • $\ln p(x) = \sum_z q(z \mid x) \ln p(x) = \sum_z q(z \mid x) \ln \frac{p(x, z)}{q(z \mid x)} + \sum_z q(z \mid x) \ln \frac{q(z \mid x)}{p(z \mid x)}$, where $q(z \mid x)$ is the encoder, the first term is the lower bound $\mathcal{L}(q, \theta)$, and the second term is $KL[q(z \mid x) \,\|\, p(z \mid x)] \geq 0$ (verified numerically below)
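A quick numeric check of this decomposition on a made-up discrete toy model (the probability table is invented for illustration): the lower bound plus the KL term recovers $\ln p(x)$ exactly, for any choice of $q(z \mid x)$.

```python
import numpy as np

# toy discrete joint p(x, z) for one fixed x and latent z in {0, 1, 2}
p_xz = np.array([0.10, 0.05, 0.15])       # p(x, z) for each z (made up)
p_x = p_xz.sum()                          # p(x) by marginalizing out z
p_z_given_x = p_xz / p_x                  # true posterior p(z | x)

q = np.array([0.5, 0.2, 0.3])             # arbitrary encoder q(z | x)

elbo = np.sum(q * np.log(p_xz / q))       # lower bound L(q, theta)
kl = np.sum(q * np.log(q / p_z_given_x))  # KL[q(z|x) || p(z|x)] >= 0

print(np.log(p_x), elbo + kl)             # equal: ln p(x) = ELBO + KL
```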


  • but the gradient is $\nabla \big( \ln p(x \mid z) - KL[q(z \mid x) \,\|\, p(z)] \big)$
    • $z$ is obtained by sampling, which is not differentiable, so $q(z \mid x)$ receives no gradient and is not trained

Reparameterization Trick

  • sample $\epsilon \sim \mathcal{N}(0, I)$ and set $z = \mu + \sigma \odot \epsilon$, so $z$ becomes a deterministic, differentiable function of the encoder outputs $\mu, \sigma$ and gradients can flow back into $q(z \mid x)$
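A minimal PyTorch sketch of the trick, assuming a Gaussian encoder that outputs `mu` and `log_var` (both hypothetical tensors here): because $z = \mu + \sigma \epsilon$ is deterministic in $\mu$ and $\sigma$, both receive gradients.

```python
import torch

mu = torch.tensor([0.5, -1.0], requires_grad=True)      # encoder mean (made up)
log_var = torch.tensor([0.0, 0.2], requires_grad=True)  # encoder log-variance

# reparameterize: z = mu + sigma * eps, with eps ~ N(0, I)
eps = torch.randn(2)
z = mu + torch.exp(0.5 * log_var) * eps

loss = (z ** 2).sum()          # stand-in for the -ln p(x|z) reconstruction term
loss.backward()
print(mu.grad, log_var.grad)   # both non-None: gradients reach the encoder
```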

Diffusion Model

  • Encoder: take a data sample $x$ and map it through a series of intermediate latent variables (adding noise)
    • no learning
    • $x \rightarrow Z_1 \rightarrow Z_2 \rightarrow \dots \rightarrow Z_T$
  • Decoder: starting from $Z_T$, map back to $x$
    • a deep network
    • $Z_T \rightarrow Z_{T-1} \rightarrow \dots \rightarrow Z_1 \rightarrow x$

Encoder (Diffusion)

  • $Z_1 = \sqrt{1 - \beta_1}\, x + \sqrt{\beta_1}\, \epsilon_1$, where the noise $\epsilon_1 \sim \mathcal{N}(0, I)$

  • $Z_t = \sqrt{1 - \beta_t}\, Z_{t-1} + \sqrt{\beta_t}\, \epsilon_t$

  • $\Rightarrow q(Z_1 \mid x) = \mathcal{N}[\sqrt{1 - \beta_1}\, x,\ \beta_1 I]$
    $q(Z_{1 \dots T} \mid x) = q(Z_1 \mid x) \prod_{t=2}^{T} q(Z_t \mid Z_{t-1})$

  • we want to get $Z_t$ directly from $x$ (see the sketch after this list)

  • $Z_t = \sqrt{\alpha_t}\, x + \sqrt{1 - \alpha_t}\, \epsilon$, where $\alpha_t = \prod_{s=1}^{t} (1 - \beta_s)$

  • $q(Z_t \mid x) = \mathcal{N}_{Z_t}[\sqrt{\alpha_t}\, x,\ (1 - \alpha_t) I]$
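A NumPy sketch checking that the stepwise recursion and the closed form describe the same marginal $q(Z_T \mid x)$ (the linear $\beta$ schedule below is an assumption, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
betas = np.linspace(1e-4, 0.02, T)   # assumed noise schedule
alphas = np.cumprod(1.0 - betas)     # alpha_t = prod_{s<=t} (1 - beta_s)

x = np.ones(5)                       # toy "image"

# stepwise encoder: Z_t = sqrt(1 - beta_t) Z_{t-1} + sqrt(beta_t) eps_t
z = x.copy()
for t in range(T):
    z = np.sqrt(1.0 - betas[t]) * z + np.sqrt(betas[t]) * rng.standard_normal(5)

# closed form: Z_T = sqrt(alpha_T) x + sqrt(1 - alpha_T) eps
z_direct = np.sqrt(alphas[-1]) * x + np.sqrt(1.0 - alphas[-1]) * rng.standard_normal(5)

# both are samples from q(Z_T | x) = N(sqrt(alpha_T) x, (1 - alpha_T) I),
# so they differ as samples but share the same mean scale and variance
print(np.sqrt(alphas[-1]), 1.0 - alphas[-1])
print(z, z_direct)
```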

Decoder (learn the reverse process)

  • approximate $Pr(Z_T) = \mathcal{N}_{Z_T}[0, I]$
    $Pr(Z_{t-1} \mid Z_t, \phi_t) = \mathcal{N}_{Z_{t-1}}[f_t[Z_t, \phi_t],\ \sigma_t^2 I]$
    $Pr(x \mid Z_1, \phi_1) = \mathcal{N}_x[f_1[Z_1, \phi_1],\ \sigma_1^2 I]$
  • final loss function: in the standard DDPM formulation this reduces to a sum over timesteps of squared noise-prediction errors, $\sum_t \|\epsilon - \hat{\epsilon}[Z_t, \phi_t]\|^2$, where $\hat{\epsilon}$ denotes the network's noise estimate
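A hedged sketch of ancestral sampling through the learned reverse process; `f_t` below is a dummy placeholder for the trained network $f_t[Z_t, \phi_t]$, and $\sigma_t^2 = \beta_t$ is one common choice rather than the lecture's prescription.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
betas = np.linspace(1e-4, 0.02, T)  # assumed noise schedule
sigmas = np.sqrt(betas)             # one common choice: sigma_t^2 = beta_t

def f_t(z_t, t):
    """Placeholder for the learned mean f_t[Z_t, phi_t]; a real model
    would be a trained deep network predicting the previous step."""
    return np.sqrt(1.0 - betas[t]) * z_t

# start from the approximate prior Pr(Z_T) = N(0, I)
z = rng.standard_normal(5)

# sample Z_{t-1} ~ N(f_t[Z_t, phi_t], sigma_t^2 I) down to x
for t in reversed(range(T)):
    z = f_t(z, t) + sigmas[t] * rng.standard_normal(5)
x = z
print(x)
```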