[CS5446] Human-Guided AI Decision Making
How can humans guide AI systems to make decisions?
Judgemental Decision Making
- complements evidence-based and utility-based approaches
- accounts for human heuristics and cognitive biases
NOTE: Humans are not always rational.
Incorporating emotional factors
- e.g. a disappointment factor, which penalizes outcomes that fall short of expectations (one formalization is sketched after this list)
- balancing rationality and emotion
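A minimal formalization in the spirit of Bell's disappointment model (the lecture's exact form may differ): discount an outcome's utility by how far it falls below the prior expected utility, scaled by a disappointment factor $d \ge 0$:

$$U'(x) = u(x) - d \cdot \max\bigl(0,\ \mathbb{E}[u] - u(x)\bigr)$$

With $d = 0$ this recovers the purely rational utility; larger $d$ makes the agent shy away from gambles whose bad outcomes fall far short of expectations.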
Human-AI Value Alignment
- handle uncertain preferences in a decision model
- add a new latent variable to represent the unknown preference
- add an appropriate sensor model to infer that preference from observed human behavior (see the sketch below)
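A minimal sketch of this idea, assuming a discrete set of candidate preferences and a softmax (Boltzmann) sensor model over observed human choices; all names and numbers here are illustrative, not from the lecture:

```python
import numpy as np

# Two hypothetical candidate preferences (utility over three outcomes).
preferences = {
    "likes_speed":  np.array([1.0, 0.2, 0.0]),
    "likes_safety": np.array([0.1, 0.5, 1.0]),
}
belief = {p: 0.5 for p in preferences}  # uniform prior over the latent preference

def sensor_model(choice, utils, beta=2.0):
    """P(human picks `choice` | preference): Boltzmann-rational choice model."""
    probs = np.exp(beta * utils)
    return probs[choice] / probs.sum()

def update_belief(choice):
    """Bayesian update of the belief over the latent preference variable."""
    for p, utils in preferences.items():
        belief[p] *= sensor_model(choice, utils)
    z = sum(belief.values())
    for p in belief:
        belief[p] /= z

update_belief(choice=2)  # observing the human pick outcome 2 favors "likes_safety"
print(belief)
```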
Apprenticeship Learning
Imitation Learning
- learn a policy directly from expert demonstrations
- supervised learning: minimize a loss $\mathcal{L}(\pi_\theta(s), a^*)$ between the predicted action $\pi_\theta(s)$ and the expert action $a^*$
- policy $\pi_\theta$: a mapping from states to actions
Behavior cloning
- train a stochastic policy to imitate expert behavior
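A minimal behavior-cloning sketch in PyTorch, with made-up dimensions and random stand-in demonstrations (the lecture does not specify an architecture):

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 4, 3  # illustrative dimensions

# Stochastic policy: maps a state to a categorical distribution over actions.
policy = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # loss between predicted and expert actions

# Random stand-in expert demonstrations: (state, action) pairs.
states = torch.randn(256, STATE_DIM)
expert_actions = torch.randint(0, N_ACTIONS, (256,))

for epoch in range(10):
    logits = policy(states)                 # predicted action logits
    loss = loss_fn(logits, expert_actions)  # supervised imitation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```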
DAgger (Dataset Aggregation)
- query the expert when the learner encounters unfamiliar states, then aggregate the expert's labels into the training set and retrain (see the sketch below)
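A sketch of the DAgger loop; `env`, `expert_action`, and `train` are hypothetical stand-ins for an environment, the expert's labeling function, and a supervised learner:

```python
def dagger(env, expert_action, train, n_iterations=10, horizon=100):
    dataset = []   # aggregated (state, expert action) pairs
    policy = None  # start from scratch (or from behavior cloning)
    for _ in range(n_iterations):
        state = env.reset()
        for _ in range(horizon):
            # Label every state the *learner* visits with the expert's action,
            # including unfamiliar states behavior cloning would never see.
            dataset.append((state, expert_action(state)))
            action = expert_action(state) if policy is None else policy(state)
            state, done = env.step(action)
            if done:
                break
        policy = train(dataset)  # supervised learning on the aggregate
    return policy
```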
Inverse Reinforcement Learning
- recover a reward function under which the expert’s policy achieves higher expected reward than any other policy
- Bayesian approach
- suppose data $D$ is observed; let $H_r$ be the hypothesis that $r$ is the true reward function; then $P(H_r \mid D) \propto P(D \mid H_r)\, P(H_r)$ (toy sketch below)
- assume the reward is linear in features: $R(s) = w^\top \phi(s)$, where $\phi(s)$ is a feature vector
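A toy sketch of the Bayesian posterior over a discrete hypothesis space of reward functions, using an assumed Boltzmann-style likelihood (demonstrations with higher return under the true reward are more probable); all data here is illustrative:

```python
import numpy as np

# Discrete hypothesis space: two candidate reward functions over three states.
candidate_rewards = {
    "r1": np.array([1.0, 0.0, 0.0]),
    "r2": np.array([0.0, 0.0, 1.0]),
}
prior = {"r1": 0.5, "r2": 0.5}
demo = [2, 2, 1, 2]  # states visited in an expert demonstration (toy data)

def likelihood(traj, reward, beta=1.0):
    """P(D | H_r): demonstrations with higher return under r are more probable."""
    return np.exp(beta * sum(reward[s] for s in traj))

# P(H_r | D) ∝ P(D | H_r) P(H_r)
posterior = {h: likelihood(demo, r) * prior[h] for h, r in candidate_rewards.items()}
z = sum(posterior.values())
posterior = {h: p / z for h, p in posterior.items()}
print(posterior)  # mass shifts toward r2, which rewards the states the expert visits
```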
Maximum Margin IRL
- iteratively adjust the reward weights $w$ to align the learner’s feature expectations with those of the expert (sketch after this item)
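A rough sketch of one iteration in the style of Abbeel and Ng's apprenticeship learning; `phi` is an assumed state-feature function and `trajectories` are lists of visited states:

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    """Empirical discounted feature expectations mu(pi) = E[sum_t gamma^t phi(s_t)]."""
    mu = np.zeros_like(phi(trajectories[0][0]), dtype=float)
    for traj in trajectories:
        for t, s in enumerate(traj):
            mu += (gamma ** t) * phi(s)
    return mu / len(trajectories)

def update_weights(mu_expert, mu_learner):
    """Point the reward weights w toward the expert's feature expectations."""
    w = mu_expert - mu_learner
    return w / (np.linalg.norm(w) + 1e-8)

# With R(s) = w . phi(s), the learner solves for an optimal policy under w,
# re-estimates its feature expectations, and repeats until mu_learner ~ mu_expert.
```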
Generative Adversarial Imitation Learning (GAIL)
- combines Generative Adversarial Networks (GANs) and IRL to learn policies without estimating a reward function (sketch below)
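A sketch of the GAIL discriminator step in PyTorch, with assumed dimensions and random stand-in batches; the surrogate reward at the end uses one common convention, not necessarily the lecture's:

```python
import torch
import torch.nn as nn

SA_DIM = 6  # concatenated state-action dimension (illustrative)
discriminator = nn.Sequential(nn.Linear(SA_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

expert_sa = torch.randn(128, SA_DIM)   # stand-in expert (state, action) pairs
learner_sa = torch.randn(128, SA_DIM)  # stand-in learner (state, action) pairs

# Discriminator step: push D toward 1 on expert pairs and 0 on learner pairs.
loss = bce(discriminator(expert_sa), torch.ones(128, 1)) + \
       bce(discriminator(learner_sa), torch.zeros(128, 1))
opt.zero_grad(); loss.backward(); opt.step()

# Policy step (not shown): optimize the policy with RL (e.g. TRPO/PPO) on the
# surrogate reward below, so no explicit reward function is ever estimated.
surrogate_reward = -torch.log(1.0 - torch.sigmoid(discriminator(learner_sa)) + 1e-8)
```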