[CS5340] Hidden Markov Models

HMM

  • an extension of the mixture model
  • the choice of mixture component for each observation depends on the component of the previous observation
  • latent variables $\{ Z_1, \dots, Z_{n-1}, Z_n, \dots \}$ form a Markov chain, where the current state $Z_n$ depends only on the previous state $Z_{n-1}$
    • $Z_{n+1} \perp Z_{n-1} \mid Z_n$
    • $p(x_1, \dots, x_N, Z_1, \dots, Z_N) = p(Z_1) \prod_{n=2}^N p(Z_n \mid Z_{n-1}) \prod_{n=1}^N p(x_n \mid Z_n)$
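The factorized joint above is exactly a recipe for sampling: draw $Z_1$ from $p(Z_1)$, each later state from the transition row of its predecessor, and each observation from its state's emission distribution. A minimal sketch, assuming a hypothetical 2-state HMM with discrete emissions (the `pi`, `A`, `B` values and function names are illustrative, not from the notes):

```python
import random

pi = [0.6, 0.4]               # hypothetical initial-state distribution p(Z_1)
A = [[0.7, 0.3], [0.4, 0.6]]  # hypothetical transition matrix, rows sum to 1
B = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical emission probs p(x_n = s | Z_n = k)

def sample_categorical(probs, rng):
    """Draw an index according to the given probabilities."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1

def sample_hmm(N, rng=random):
    """Sample (x_1..x_N, z_1..z_N) from the factorized HMM joint."""
    z = [sample_categorical(pi, rng)]
    for _ in range(N - 1):
        z.append(sample_categorical(A[z[-1]], rng))  # z_n depends only on z_{n-1}
    x = [sample_categorical(B[k], rng) for k in z]   # x_n depends only on z_n
    return x, z
```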

Transition probabilities

  • 1-of-K coding
  • describes which component of the mixture model is responsible for generating the corresponding $x_n$
  • $Z_n \in \{0, 1\}^K$; $p(Z_n \mid Z_{n-1})$ is a $K \times K$ matrix $A$
  • $A_{jk} = p(Z_{n,k} = 1 \mid Z_{n-1,j} = 1)$
    • $A_{jk}$: transition from state $j$ to state $k$
    • $Z_{n,k} = 1$: $Z_n$ is in state $k$
    • $Z_{n-1,j} = 1$: $Z_{n-1}$ is in state $j$
    • $0 \leq A_{jk} \leq 1, \ \sum_{k=1}^K A_{jk} = 1$
  • $p(Z_n \mid Z_{n-1}, A) = \prod_{k=1}^K \prod_{j=1}^K A_{jk}^{Z_{n-1,j} Z_{n,k}}$
    • only the factor with $Z_n$ in state $k$ and $Z_{n-1}$ in state $j$ has a nonzero exponent
  • $p(Z_1 \mid \pi) = \prod_{k=1}^K \pi_k^{Z_{1,k}}$ where $\sum_{k=1}^K \pi_k = 1$
    • $\pi_k$: parameter of the categorical distribution over the initial state
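The double-product form looks odd until you note that 1-of-K vectors zero out every exponent except one. A small sketch checking that it just picks out the single entry $A_{jk}$ (the 2-state matrix `A` is a hypothetical example):

```python
A = [[0.7, 0.3], [0.4, 0.6]]  # hypothetical 2-state transition matrix

def trans_prob(z_prev, z_curr, A):
    """prod_j prod_k A[j][k] ** (z_prev[j] * z_curr[k]) with 1-of-K vectors."""
    p = 1.0
    for j, zj in enumerate(z_prev):
        for k, zk in enumerate(z_curr):
            p *= A[j][k] ** (zj * zk)
    return p

# z_{n-1} in state 0, z_n in state 1  ->  selects A[0][1] = 0.3
print(trans_prob([1, 0], [0, 1], A))
```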

Emission probability

  • $p(x_n \mid Z_n, \phi)$
    • $\phi$: parameters governing the distribution
    • $p(x_n \mid Z_n, \phi) = \prod_{k=1}^K p(x_n \mid \phi_k)^{Z_{nk}}$
      • since $Z_n$ is in exactly one state $k$, only one $Z_{nk}$ is 1

Homogeneous Model

  • all conditional distributions governing the latent variables share the same transition matrix $A$
  • all emission distributions share the same parameters $\phi$

Maximum likelihood

  • we want to maximize the likelihood $p(x \mid \theta) = \sum_Z p(x, z \mid \theta)$
  • there is NO closed-form solution
  • $\Rightarrow$ use the EM algorithm
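One reason this sum is awkward: it runs over all $K^N$ latent paths, so direct evaluation is only feasible for tiny chains. A brute-force sketch of $p(x \mid \theta) = \sum_z p(x, z \mid \theta)$, assuming a hypothetical 2-state HMM with discrete emissions (`pi`, `A`, `B` are illustrative):

```python
from itertools import product

pi = [0.6, 0.4]               # hypothetical parameters
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]  # emission probs p(x|z)

def brute_force_likelihood(x):
    """Sum the joint p(x, z) over all K^N latent paths."""
    total = 0.0
    for z in product(range(2), repeat=len(x)):
        p = pi[z[0]] * B[z[0]][x[0]]
        for n in range(1, len(x)):
            p *= A[z[n - 1]][z[n]] * B[z[n]][x[n]]
        total += p
    return total
```

Summing over all length-1 observations gives 1, as a marginal distribution must; the forward recursion below computes the same quantity in $O(NK^2)$ instead of $O(K^N)$.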

EM algorithm on HMM

  1. Initialization of parameters $\theta^{old}$
  2. E-step: find $Q(\theta, \theta^{old}) = \sum_Z p(z \mid x, \theta^{old}) \ln p(x, z \mid \theta)$
    • we define:
      • $r(z_n) = p(z_n \mid x, \theta^{old})$
      • $r(z_{nk})$: conditional probability of $z_{nk} = 1$
        • $= E[z_{nk}] = \sum_{z_n} r(z_n) z_{nk}$
        • only the term with $z_{nk} = 1$ survives the sum
      • $\xi(z_{n-1}, z_n) = p(z_{n-1}, z_n \mid x, \theta^{old})$
      • $\xi(z_{n-1,j}, z_{n,k}) = E[z_{n-1,j} z_{n,k}] = \sum_{z_{n-1}, z_n} \xi(z_{n-1}, z_n) z_{n-1,j} z_{n,k}$
    • our goal is to compute $Q(\theta, \theta^{old}) = \sum_{k=1}^K r(z_{1k}) \ln \pi_k + \sum_{n=2}^N \sum_{j=1}^K \sum_{k=1}^K \xi(z_{n-1,j}, z_{n,k}) \ln A_{jk} + \sum_{n=1}^N \sum_{k=1}^K r(z_{nk}) \ln p(x_n \mid \phi_k)$
      • therefore we need $r(z_n)$ and $\xi(z_{n-1,j}, z_{n,k})$
      • $r(z_n) = \frac{\alpha(z_n) \beta(z_n)}{p(x)}$
      • $\xi(z_{n-1}, z_n) = \frac{\alpha(z_{n-1}) \, p(x_n \mid z_n) \, p(z_n \mid z_{n-1}) \, \beta(z_n)}{p(x)}$
    • use the Forward-Backward algorithm to find $\alpha$ and $\beta$
      • forward: $\alpha(z_n) = p(x_n \mid z_n) \sum_{z_{n-1}} \alpha(z_{n-1}) \, p(z_n \mid z_{n-1})$
      • backward: $\beta(z_n) = \sum_{z_{n+1}} \beta(z_{n+1}) \, p(x_{n+1} \mid z_{n+1}) \, p(z_{n+1} \mid z_n)$
      • initialization: $\alpha(z_{1,k}) = \pi_k \, p(x_1 \mid \phi_k)$ for $k = 1, \dots, K$
      • $\beta(z_N) = 1$
  3. M-step: maximize $Q(\theta, \theta^{old})$ w.r.t. $\theta = \{ \pi, A, \phi \}$
    • $\pi_k = \frac{r(z_{1k})}{\sum_{j=1}^K r(z_{1j})}$
    • $A_{jk} = \frac{\sum_{n=2}^N \xi(z_{n-1,j}, z_{n,k})}{\sum_{l=1}^K \sum_{n=2}^N \xi(z_{n-1,j}, z_{n,l})}$
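The E-step recursions translate almost line by line into code. A minimal (unscaled) forward-backward sketch, assuming the same hypothetical 2-state discrete-emission HMM used throughout these examples (`pi`, `A`, `B`, `forward_backward` are illustrative names, not from the notes):

```python
pi = [0.6, 0.4]               # hypothetical parameters
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]  # emission probs p(x|z)
K = 2

def forward_backward(x):
    """Return alpha, beta, the posteriors r(z_n), and p(x)."""
    N = len(x)
    alpha = [[0.0] * K for _ in range(N)]
    beta = [[1.0] * K for _ in range(N)]            # beta(z_N) = 1
    for k in range(K):                              # alpha(z_1,k) = pi_k p(x_1|phi_k)
        alpha[0][k] = pi[k] * B[k][x[0]]
    for n in range(1, N):                           # forward recursion
        for k in range(K):
            alpha[n][k] = B[k][x[n]] * sum(alpha[n-1][j] * A[j][k] for j in range(K))
    for n in range(N - 2, -1, -1):                  # backward recursion
        for k in range(K):
            beta[n][k] = sum(beta[n+1][l] * B[l][x[n+1]] * A[k][l] for l in range(K))
    px = sum(alpha[N-1])                            # p(x) = sum_k alpha(z_N,k)
    r = [[alpha[n][k] * beta[n][k] / px for k in range(K)] for n in range(N)]
    return alpha, beta, r, px
```

Each row of `r` sums to 1, since $\sum_{z_n} \alpha(z_n)\beta(z_n) = p(x)$ at every $n$.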

Scaling Factor (for normalization)

  • values of $\alpha(z_n)$ can go to $0$ exponentially quickly
  • $\hat \alpha(z_n) = p(z_n \mid x_1, \dots, x_n) = \frac{\alpha(z_n)}{p(x_1, \dots, x_n)} = \frac{\alpha(z_n)}{\prod_{m=1}^n c_m}$
    • $c_n = p(x_n \mid x_1, \dots, x_{n-1})$, so $p(x_1, \dots, x_N) = \prod_{n=1}^N c_n$
  • $\hat \beta(z_n) = \frac{\beta(z_n)}{\prod_{m=n+1}^N c_m}$
  • $\Rightarrow$
    • $r(z_n) = \hat \alpha(z_n) \hat \beta(z_n)$
    • $\xi(z_{n-1}, z_n) = c_n^{-1} \hat \alpha(z_{n-1}) \, p(x_n \mid z_n) \, p(z_n \mid z_{n-1}) \, \hat \beta(z_n)$

Viterbi Algorithm

  • simply the max-sum algorithm performed on the HMM: finds the single most probable sequence of latent states, $\arg\max_z p(x, z \mid \theta)$
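The max-sum recursion mirrors the forward pass with `max` replacing the sum, plus backpointers to recover the path; working in log space avoids underflow. A sketch for the same hypothetical 2-state discrete HMM used in the earlier examples:

```python
import math

pi = [0.6, 0.4]               # hypothetical parameters
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
K = 2

def viterbi(x):
    """Return the most probable latent state sequence for observations x."""
    N = len(x)
    # omega[n][k]: max log-prob over paths ending in state k at step n
    omega = [[math.log(pi[k]) + math.log(B[k][x[0]]) for k in range(K)]]
    back = []
    for n in range(1, N):
        row, ptr = [], []
        for k in range(K):
            scores = [omega[-1][j] + math.log(A[j][k]) for j in range(K)]
            j_best = max(range(K), key=lambda j: scores[j])
            row.append(scores[j_best] + math.log(B[k][x[n]]))
            ptr.append(j_best)
        omega.append(row)
        back.append(ptr)
    path = [max(range(K), key=lambda k: omega[-1][k])]  # best final state
    for ptr in reversed(back):                          # backtrack
        path.append(ptr[path[-1]])
    return path[::-1]
```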
https://itsjeremyhsieh.github.io/posts/cs5340-8-hidden-markov-models/
Author
Jeremy H
Published at
2024-10-14