[CS5340] Monte Carlo Sampling

Motivation#

Sampling is motivated by integration problems that are intractable to compute exactly:

  • Normalization: $p(x|y) = \frac{p(y|x)p(x)}{\int p(y|x')p(x')\,dx'}$
    • $\int p(y|x')p(x')\,dx'$ is intractable to compute in high dimensions
  • Marginalization: $p(x|y) = \int_z p(x, z|y)\,dz$
  • Expectation: $E_{p(x|y)}[f(x)] = \int_x f(x)p(x|y)\,dx$

Weak law of large numbers#

  • $\bar x_n \xrightarrow{p} \mu$ (converges in probability) as $n \rightarrow \infty$
    • $\bar x_n$: sample average $= \frac{1}{n} \sum_n x_n$
    • $\mu$: real average $= \int x\, p(x)\, dx$

Strong law of large numbers#

  • $\bar x_n \xrightarrow{a.s.} \mu$ (converges almost surely) as $n \rightarrow \infty$
    • $\bar x_n$: sample average $= \frac{1}{n} \sum_n x_n$
    • $\mu$: real average $= \int x\, p(x)\, dx$

Non-parametric representation#

  • represent a distribution by drawing samples from it
  • the higher the probability of an interval, the more samples fall within that interval

Rejection Sampling#


  • $\tilde p(z)$: target distribution (unnormalized)
  • $\tilde q(z)$: proposal distribution
  • set $k$ so that $k \tilde q(z) \geq \tilde p(z)$ for all $z$; draw $z \sim q(z)$ and $u \sim \text{Uniform}(0, k\tilde q(z))$, and accept $z$ if $u < \tilde p(z)$ (see the sketch below)
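
A minimal sketch of the accept/reject loop, assuming a hypothetical unnormalized two-mode target `p_tilde` and a normalized Gaussian proposal (so $\tilde q = q$); the constant `k = 12` is hand-picked to dominate this particular target:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    # hypothetical unnormalized target: a two-mode Gaussian mixture
    return np.exp(-0.5 * (z - 2) ** 2) + 0.5 * np.exp(-0.5 * (z + 2) ** 2)

def q(z):
    # normalized Gaussian proposal N(0, 3^2), so q~ = q here
    return np.exp(-0.5 * (z / 3) ** 2) / (3 * np.sqrt(2 * np.pi))

k = 12.0  # hand-picked so that k * q(z) >= p_tilde(z) for all z

def rejection_sample(n):
    samples = []
    while len(samples) < n:
        z = rng.normal(0, 3)          # draw z ~ q(z)
        u = rng.uniform(0, k * q(z))  # draw u ~ Uniform(0, k q(z))
        if u < p_tilde(z):            # accept with prob p_tilde(z) / (k q(z))
            samples.append(z)
    return np.array(samples)

samples = rejection_sample(10_000)
```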

Limitation#

  • not always possible to find a $k$ such that $k \tilde q(z)$ covers all of $\tilde p(z)$
  • if $k$ is too large, the probability that $z$ is accepted, $Pr(u < \frac {\tilde p(z)}{k \tilde q(z)}) \propto \frac{1}{k}$, becomes small, so most samples are rejected

Importance Sampling#

  • approximate the expectation of a function $f(z)$ w.r.t. $p(z)$ (target) using samples from $q(z)$ (proposal)
  • $E[f] = \int f(z)p(z)\,dz = \int f(z) \frac{p(z)}{q(z)} q(z)\,dz \approx \frac{1}{L} \sum_{l=1}^{L} \frac{p(z^l)}{q(z^l)} f(z^l)$, where $z^l \sim q(z)$
  • therefore the importance weight $r_l = \frac{p(z^l)}{q(z^l)}$
  • but $p(z^l)$ is hard to evaluate, while the unnormalized $\tilde p(z^l)$ is easy to get
  • $\Rightarrow E[f] \approx \frac{Z_q}{Z_p} \frac{1}{L} \sum_{l=1}^{L} \tilde r_l f(z^l)$, with $\tilde r_l = \frac{\tilde p(z^l)}{\tilde q(z^l)}$
  • $\Rightarrow E[f] \approx \sum_{l=1}^{L} w_l f(z^l)$, with normalized weights $w_l = \frac{\tilde r_l}{\sum_m \tilde r_m} = \frac{\tilde p(z^l)/\tilde q(z^l)}{\sum_m \tilde p(z^m)/\tilde q(z^m)}$ (see the sketch below)
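
A minimal sketch of the self-normalized estimator above, reusing the hypothetical `p_tilde` target and a Gaussian proposal; here $f(z) = z$, and for this particular mixture the true mean works out to $2/3$:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    # hypothetical unnormalized target (same mixture as above)
    return np.exp(-0.5 * (z - 2) ** 2) + 0.5 * np.exp(-0.5 * (z + 2) ** 2)

def q_pdf(z):
    # Gaussian proposal N(0, 3^2)
    return np.exp(-0.5 * (z / 3) ** 2) / (3 * np.sqrt(2 * np.pi))

L = 100_000
z = rng.normal(0, 3, size=L)  # z^l ~ q(z)
r = p_tilde(z) / q_pdf(z)     # unnormalized weights r~_l
w = r / r.sum()               # self-normalized weights w_l
estimate = np.sum(w * z)      # E[f] ~= sum_l w_l f(z^l), with f(z) = z
print(estimate)               # true mean for this mixture is 2/3
```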

Limitation#

  • the proposal may not cover all high-probability regions of the target distribution
  • the weights may be dominated by a few samples with very large values (high variance)

Ancestral Sampling#

  • all weights $= 1$
  • start with the root nodes (easy to sample), then sample each child given its already-sampled parents (see the sketch below)
  • troublesome if the graph has loops $\rightarrow$ randomly initialize, but it may not converge
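
A minimal sketch on a hypothetical binary chain $A \rightarrow B \rightarrow C$ with made-up CPTs, sampling in topological order:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical CPTs for the chain A -> B -> C (all variables binary)
p_A = 0.6                       # P(A = 1)
p_B_given_A = {0: 0.2, 1: 0.7}  # P(B = 1 | A)
p_C_given_B = {0: 0.1, 1: 0.9}  # P(C = 1 | B)

def ancestral_sample():
    # sample in topological order: root first, then each child given its parent
    a = int(rng.random() < p_A)
    b = int(rng.random() < p_B_given_A[a])
    c = int(rng.random() < p_C_given_B[b])
    return a, b, c

samples = [ancestral_sample() for _ in range(10_000)]
```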

MCMC#

  • spend more time in the most important regions.
  • use adaptive proposals: use $q(x'|x)$ instead of $q(x')$
    • $x$: the previous sample
    • $x'$: the new sample
  • (figure: Metropolis-Hastings algorithm)

Burn-In period#

  • initial samples may be useless (the chain has not yet converged to the stationary distribution)
  • discard the first few samples (e.g. the first 1000)

Markov Chain properties#

  1. Homogeneous: fixed transition matrix
  2. Stationary & limiting distribution: $\pi T = \pi$ (the chain will eventually converge to $\pi$)
  3. Irreducibility: from any state, there is a probability $> 0$ of visiting every other state (the chain will not get stuck in a state)
  4. Aperiodicity: the chain won't get trapped in a cycle
  5. Ergodic: irreducibility + aperiodicity; MCMC algorithms must be ergodic
  6. Detailed balance (reversibility): $\pi_a T_{ab} = \pi_b T_{ba}$
  • Metropolis-Hastings works because it satisfies detailed balance (checked numerically in the sketch below)
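
A quick numerical check of properties 2 and 6 on a hypothetical 2-state transition matrix `T` (for a 2-state chain the stationary distribution always satisfies detailed balance):

```python
import numpy as np

# hypothetical transition matrix, rows sum to 1: T[a, b] = Pr(next = b | current = a)
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# stationary distribution pi: left eigenvector of T with eigenvalue 1
evals, evecs = np.linalg.eig(T.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

print(np.allclose(pi @ T, pi))                       # property 2: pi T = pi
print(np.isclose(pi[0] * T[0, 1], pi[1] * T[1, 0]))  # property 6: detailed balance
```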

Metropolis Algorithm#

  • a special case of Metropolis-Hastings with a symmetric proposal, $q(x'|x) = q(x|x')$
  • random walk
  • acceptance probability $A(x', x) = \min \{1, \frac {\tilde p(x')}{\tilde p(x)} \}$ (see the sketch below)
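
A minimal random-walk Metropolis sketch with a symmetric Gaussian proposal, reusing the hypothetical `p_tilde` target; the acceptance rule is exactly the $\min\{1, \tilde p(x')/\tilde p(x)\}$ ratio above, and a burn-in prefix is discarded as described earlier:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(x):
    # hypothetical unnormalized target (same mixture as above)
    return np.exp(-0.5 * (x - 2) ** 2) + 0.5 * np.exp(-0.5 * (x + 2) ** 2)

def metropolis(n_samples, step=1.0, burn_in=1000):
    x = 0.0  # arbitrary initialization
    chain = []
    for i in range(n_samples + burn_in):
        x_new = x + rng.normal(0, step)            # symmetric proposal q(x'|x)
        A = min(1.0, p_tilde(x_new) / p_tilde(x))  # acceptance probability
        if rng.random() < A:
            x = x_new                              # accept; otherwise keep x
        if i >= burn_in:                           # discard burn-in samples
            chain.append(x)
    return np.array(chain)

chain = metropolis(50_000)
```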

Gibbs Sampling#

  • a special case of Metropolis-Hastings
  • acceptance probability is always $1$
  • (figure: Gibbs sampling)
  • sample each variable in turn, conditioned on all other nodes
  • the conditional only depends on the Markov blanket (parents, children & co-parents), or the $1^{st}$-degree neighbors for an MRF (see the sketch below)
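
A minimal Gibbs sketch for a hypothetical bivariate Gaussian with correlation $\rho$, where each full conditional is a known 1-D Gaussian (so every update is accepted automatically):

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8  # hypothetical correlation of a standard bivariate Gaussian

def gibbs(n_samples, burn_in=1000):
    x1, x2 = 0.0, 0.0  # arbitrary initialization
    chain = []
    for i in range(n_samples + burn_in):
        # full conditionals are 1-D Gaussians: x1 | x2 ~ N(rho * x2, 1 - rho^2)
        x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))
        x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))
        if i >= burn_in:
            chain.append((x1, x2))
    return np.array(chain)

chain = gibbs(50_000)
```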

After we have samples:#

  • $P(B)$: count the # of samples with $B=0$ vs. $B=1$
  • these counts become the statistics (an empirical estimate of the distribution)