[CS5446] Human-Guided AI Decision Making

How can humans guide AI to make decisions?

Judgemental Decision Making#

  • complements evidence-based and utility-based approaches
  • accounts for human heuristics and cognitive biases
NOTE

Humans are not always rational

Incorporating emotional factors#

  • $EU(a) = \sum_{s'} p(R(a)=s')\, U(s') - D(a)$
    • $D(a)$: disappointment factor
  • balancing rationality and emotion
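
As a minimal sketch of the adjusted expected utility above, the snippet below scores two actions with and without a disappointment term. The outcome names, probabilities, utilities, and $D(a)$ values are hypothetical numbers chosen for illustration, not values from the course.

```python
def adjusted_expected_utility(outcome_probs, utility, disappointment):
    """EU(a) = sum over s' of p(R(a)=s') * U(s'), minus D(a)."""
    expected = sum(p * utility[s] for s, p in outcome_probs.items())
    return expected - disappointment

# Hypothetical utilities for each outcome.
utility = {"win": 100.0, "lose": 0.0, "sure": 45.0}

# Each action maps to (outcome distribution, disappointment factor D(a)).
actions = {
    "gamble": ({"win": 0.5, "lose": 0.5}, 10.0),
    "safe": ({"sure": 1.0}, 0.0),
}

scores = {
    name: adjusted_expected_utility(probs, utility, d)
    for name, (probs, d) in actions.items()
}
best_action = max(scores, key=scores.get)
print(scores, best_action)
```

Without the disappointment term the gamble has the higher expected utility (50 versus 45); subtracting $D(a)$ makes the safe action preferable, which is one way to read "balancing rationality and emotion".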

Human-AI value alignment#

  • to handle uncertain preferences in a decision model
    • add a new latent variable to represent unknown preference
    • add an appropriate sensor model
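
One possible reading of the two steps above: treat the human's preference as a hidden discrete variable, and let the sensor model give the probability of each observed human choice under each preference. The sketch below applies Bayes' rule once. The preference labels, observation names, and probabilities are all assumptions made up for illustration.

```python
def update_preference_belief(prior, sensor_model, observation):
    """Posterior over the latent preference after one observation."""
    unnormalized = {
        pref: prior[pref] * sensor_model[pref][observation] for pref in prior
    }
    total = sum(unnormalized.values())
    return {pref: weight / total for pref, weight in unnormalized.items()}

# Latent variable: which preference the human actually holds (hypothetical).
prior = {"prefers_fast": 0.5, "prefers_safe": 0.5}

# Sensor model: P(observed choice | preference), hypothetical values.
sensor_model = {
    "prefers_fast": {"chose_highway": 0.8, "chose_side_road": 0.2},
    "prefers_safe": {"chose_highway": 0.3, "chose_side_road": 0.7},
}

belief = update_preference_belief(prior, sensor_model, "chose_highway")
print(belief)
```

The agent can then plan against this belief instead of assuming a single fixed utility function.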

Apprenticeship Learning#

Imitation learning#

  • learn $\pi(s)$ directly from $D$ (expert demonstrations)
  • supervised learning: $\pi^* = \arg\min_\pi \sum_i l(\pi(s_i), a_i)$
    • $l$: loss function between predicted action $\pi(s_i)$ and expert action $a_i$
  • policy: mapping from state to action
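
To make the supervised-learning view concrete, here is a small sketch that assumes a tabular policy and a zero-one loss (1 when the predicted action differs from the expert's, 0 otherwise). Under those assumptions the minimizer is simply the most frequent expert action in each state. The traffic-light demonstrations are invented for illustration.

```python
from collections import Counter, defaultdict

def fit_policy(demonstrations):
    """Return the tabular policy minimizing total zero-one loss on the demos."""
    votes = defaultdict(Counter)
    for state, action in demonstrations:
        votes[state][action] += 1
    # The most common expert action per state minimizes the mismatch count.
    return {state: counts.most_common(1)[0][0] for state, counts in votes.items()}

# Hypothetical expert demonstrations D as (state, action) pairs.
demos = [
    ("red", "stop"), ("red", "stop"),
    ("green", "go"), ("green", "go"), ("green", "stop"),
]
policy = fit_policy(demos)
print(policy)
```

With richer state spaces the lookup table would be replaced by a classifier or regressor, but the objective stays the same.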

Behavior cloning#

  • train a stochastic policy $\pi_\theta(a|s)$ to imitate expert behavior
  • $\theta^* = \arg\max_\theta \prod_{(s_i,a_i) \in D} \pi_\theta(a_i|s_i) = \arg\min_\theta \sum_{(s_i,a_i) \in D} l(\pi_\theta(s_i), a_i)$, where $l$ is the negative log-likelihood $-\log \pi_\theta(a_i|s_i)$
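
For a tabular stochastic policy, the maximum-likelihood solution has a closed form: the empirical action frequencies in each state. The sketch below assumes that tabular setting (a neural policy would instead be trained by gradient descent on the same negative log-likelihood) and uses invented demonstrations.

```python
import math
from collections import Counter, defaultdict

def fit_stochastic_policy(demos):
    """Maximum-likelihood tabular policy: pi(a|s) = count(s, a) / count(s)."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {
        state: {a: n / sum(c.values()) for a, n in c.items()}
        for state, c in counts.items()
    }

def negative_log_likelihood(policy, demos):
    """Sum of -log pi(a_i | s_i) over the demonstrations."""
    return -sum(math.log(policy[s][a]) for s, a in demos)

# Hypothetical expert demonstrations.
demos = [("green", "go"), ("green", "go"), ("green", "go"),
         ("green", "stop"), ("red", "stop")]
cloned = fit_stochastic_policy(demos)

# A uniform policy over the two actions, for comparison.
uniform = {s: {"go": 0.5, "stop": 0.5} for s in ("green", "red")}
print(cloned)
print(negative_log_likelihood(cloned, demos), negative_log_likelihood(uniform, demos))
```

The cloned policy attains a lower negative log-likelihood on the demonstrations than the uniform one, as expected from the argmax/argmin equivalence.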

DAgger (Data Aggregation)#

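  • query the expert for the correct action when the learner encounters an unfamiliar state, then add those labeled states to the training data and retrain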
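
The loop can be sketched on a toy problem. Everything here is an assumption made for illustration: states are integers on a line, the hypothetical expert walks toward `GOAL`, the learner is a lookup table, and an unfamiliar state makes the learner stand still.

```python
GOAL = 3

def expert_action(state):
    """Hypothetical expert: step toward GOAL, stay once there."""
    if state < GOAL:
        return 1
    if state > GOAL:
        return -1
    return 0

def train(dataset):
    """'Training' is a lookup table built from (state, action) pairs."""
    return dict(dataset)

def rollout(policy, start, horizon=6):
    """Run the learner's own policy and record the states it visits."""
    state, visited = start, []
    for _ in range(horizon):
        visited.append(state)
        state += policy.get(state, 0)  # unfamiliar state: stand still
    return visited, state

def dagger(initial_demos, starts, iterations=4):
    dataset = list(initial_demos)
    policy = train(dataset)
    for _ in range(iterations):
        for start in starts:
            visited, _ = rollout(policy, start)
            # Ask the expert to label the states the learner actually reached.
            dataset += [(s, expert_action(s)) for s in visited]
        policy = train(dataset)  # retrain on the aggregated data
    return policy

# Initial demonstrations only cover states 0..GOAL.
initial_demos = [(s, expert_action(s)) for s in range(GOAL + 1)]

cloned_only = train(initial_demos)
_, cloned_final = rollout(cloned_only, start=6)

policy = dagger(initial_demos, starts=[6])
_, final_state = rollout(policy, start=6)
print(cloned_final, final_state)
```

The policy trained only on the initial demonstrations never leaves state 6, because it has no label there. After a few rounds of aggregation the learner has expert labels for the states it actually visits and reaches the goal.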

Inverse Reinforcement Learning#

  • recover a reward function $R^*$ under which the expert's policy achieves a higher expected reward than other policies
  • Bayesian approach
    • suppose data $d$ is observed, and let $h_R$ be the hypothesis that $R$ is the true reward function
    • $P(h_R|d) = \alpha P(d|h_R) P(h_R)$
  • $R_\theta(s,a) = \sum_{i=1}^n \theta_i f_i(s,a)$ (reward as a weighted sum of features $f_i$)
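
The Bayesian update and the linear reward can be combined in a small sketch over a finite set of reward hypotheses. The notes do not specify the likelihood $P(d|h_R)$, so the code assumes a Boltzmann-rational expert that picks action $a$ in state $s$ with probability proportional to $\exp(\beta R(s,a))$. The features, hypotheses, and demonstrations are also hypothetical.

```python
import math

def reward(theta, features):
    """Linear reward: R_theta(s, a) = sum_i theta_i * f_i(s, a)."""
    return sum(t * f for t, f in zip(theta, features))

def likelihood(theta, demos, feature_fn, actions, beta=1.0):
    """P(d | h_R) under the ASSUMED Boltzmann-rational expert model."""
    prob = 1.0
    for state, action in demos:
        weights = {
            b: math.exp(beta * reward(theta, feature_fn(state, b))) for b in actions
        }
        prob *= weights[action] / sum(weights.values())
    return prob

def posterior(hypotheses, prior, demos, feature_fn, actions):
    """P(h_R | d) = alpha * P(d | h_R) * P(h_R) over a finite hypothesis set."""
    unnormalized = [
        p * likelihood(theta, demos, feature_fn, actions)
        for theta, p in zip(hypotheses, prior)
    ]
    alpha = 1.0 / sum(unnormalized)
    return [alpha * u for u in unnormalized]

# Hypothetical features: (speed, safety) indicators of the chosen action.
def feature_fn(state, action):
    return (1.0, 0.0) if action == "fast" else (0.0, 1.0)

actions = ["fast", "safe"]
hypotheses = [(1.0, 0.0), (0.0, 1.0)]  # "values speed" vs "values safety"
prior = [0.5, 0.5]
demos = [("s0", "safe"), ("s0", "safe"), ("s0", "fast")]

post = posterior(hypotheses, prior, demos, feature_fn, actions)
print(post)
```

Two of the three demonstrations favor the safe action, so the posterior shifts toward the hypothesis that weights safety.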

Maximum Margin IRL#

  • iteratively adjust $\theta$ to align the learner's feature expectations with the expert's
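
A rough sketch of one iteration, under several assumptions: feature expectations are estimated as discounted feature sums averaged over sampled trajectories, and $\theta$ is set to the gap between the expert's and the learner's expectations, loosely in the spirit of projection-style apprenticeship learning. The step that re-optimizes the learner's policy under the new reward is omitted, and the features and trajectories are hypothetical.

```python
def feature_expectations(trajectories, feature_fn, gamma=0.9):
    """Average over trajectories of sum_t gamma^t * f(s_t, a_t)."""
    first_state, first_action = trajectories[0][0]
    mu = [0.0] * len(feature_fn(first_state, first_action))
    for trajectory in trajectories:
        for t, (state, action) in enumerate(trajectory):
            for i, value in enumerate(feature_fn(state, action)):
                mu[i] += (gamma ** t) * value / len(trajectories)
    return mu

def weight_update(mu_expert, mu_learner):
    """Point theta toward the expert; the norm of the gap is the margin."""
    theta = [e - l for e, l in zip(mu_expert, mu_learner)]
    margin = sum(x * x for x in theta) ** 0.5
    return theta, margin

# Hypothetical indicator features: (action is safe, action is fast).
def feature_fn(state, action):
    return (1.0, 0.0) if action == "safe" else (0.0, 1.0)

expert_trajs = [[("s0", "safe"), ("s1", "safe")]]
learner_trajs = [[("s0", "fast"), ("s1", "safe")]]

mu_expert = feature_expectations(expert_trajs, feature_fn, gamma=0.5)
mu_learner = feature_expectations(learner_trajs, feature_fn, gamma=0.5)
theta, margin = weight_update(mu_expert, mu_learner)
print(mu_expert, mu_learner, theta, margin)
```

In a full loop, the learner would be re-trained under $R_\theta$, its feature expectations recomputed, and the process repeated until the margin falls below a tolerance.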

Generative Adversarial Imitation Learning#

  • combine Generative Adversarial Networks (GAN) and IRL to learn policies without reward function estimation
[CS5446] Human-Guided AI Decision Making
https://itsjeremyhsieh.github.io/posts/cs5446-10-human-guided-ai-decision-making/
Author: Jeremy H
Published at: 2024-10-30