[CS5446] Rational Decision Making

Decision Making under Uncertainty#

  • Decision Algorithm
    • Input: a problem
    • Output: a solution (policy) that specifies the best action in each state with respect to its values
  • Types of decision theory
    1. Normative decision theory: describes how ideal, rational agents should behave
    2. Descriptive decision theory: describes how actual agents (i.e., humans) really behave
    3. Prescriptive decision theory: guidelines for agents to behave rationally
  • environment: episodic, non-deterministic, partially observable
  • try to maximize gain
  • Decision Model:
    • Actions: $a \in A$
    • States: $s \in S$, with probability of reaching: $P(s)$
    • Transition model: $P(s'|s, a)$, the probability that action $a$ in state $s$ reaches state $s'$
    • Result: $Result(a)$
    • Probability of outcome state $s'$: $P(Result(a) = s') = \sum_{s} P(s)P(s'|s,a)$
    • Utility function: $U(s)$ expresses the desirability of a state $s$
  • rational: choose the action that maximizes expected utility (the maximum expected utility (MEU) principle)
  • EU of action: average of utility value for all outcomes, weighted by probability
    • $EU(a) = \sum_{s'} P(Result(a) = s')U(s') = \sum_{s'} \sum_{s} P(s)P(s'|s, a)U(s')$
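
The double sum above can be sketched in a few lines of Python. This is only an illustration: the toy states, actions, probabilities, and the names `expected_utility` and `best_action` are assumptions made for the example, not part of the course material.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical toy problem: all names and numbers below are illustrative assumptions.
prior: Dict[str, float] = {"dry": 0.7, "wet": 0.3}  # P(s)
transition: Dict[Tuple[str, str], Dict[str, float]] = {  # P(s' | s, a)
    ("dry", "walk"): {"on_time": 0.9, "late": 0.1},
    ("wet", "walk"): {"on_time": 0.5, "late": 0.5},
    ("dry", "bus"): {"on_time": 0.8, "late": 0.2},
    ("wet", "bus"): {"on_time": 0.8, "late": 0.2},
}
utility: Dict[str, float] = {"on_time": 1.0, "late": 0.0}  # U(s')


def expected_utility(action: str) -> float:
    """EU(a) = sum over s' and s of P(s) * P(s'|s,a) * U(s')."""
    total = 0.0
    for s, p_s in prior.items():
        for s_next, p_next in transition[(s, action)].items():
            total += p_s * p_next * utility[s_next]
    return total


def best_action(actions: Iterable[str]) -> str:
    """MEU principle: pick the action with the highest expected utility."""
    return max(actions, key=expected_utility)


print(expected_utility("walk"))  # roughly 0.78
print(expected_utility("bus"))   # roughly 0.80
print(best_action(["walk", "bus"]))
```

With these assumed numbers, walking is better in the dry state but worse overall, so the MEU choice is the bus.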

Axioms of Utility#

  • notation:
    • $A \succ B$: agent prefers A to B
    • $A \sim B$: agent is indifferent between A and B
    • $A \succeq B$: agent prefers A over B or is indifferent between them
  • 6 rules
    • Orderability: exactly one of $A \succ B$, $A \sim B$, or $B \succ A$ must hold
    • Transitivity: If $(A \succ B) \wedge (B \succ C) \Rightarrow (A \succ C)$
    • Continuity: If $A \succ B \succ C$ => there exists some probability $p$ for which the agent will be indifferent between
      1. getting $B$ for sure
      2. getting the lottery that yields $A$ with probability $p$ and $C$ with probability $1-p$
    • Substitutability: If $A \sim B$ => $[p, A; 1-p, C] \sim [p, B; 1-p, C]$
    • Monotonicity: If $A \succ B$ and $p > q$ => $[p, A; 1-p, B] \succ [q, A; 1-q, B]$
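    • Decomposability: compound lotteries can be reduced to simpler ones using the laws of probability
      • $[p, A; 1-p, [q, B; 1-q, C]] \sim [p, A; (1-p)q, B; (1-p)(1-q), C]$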
  • $U(A) > U(B) \iff A \succ B$, $U(A) = U(B) \iff A \sim B$
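
Decomposability is usually stated as: a lottery whose outcomes are themselves lotteries is equivalent to the simple lottery obtained by multiplying probabilities through. A minimal sketch of that reduction follows, assuming a hypothetical representation of a lottery as a list of `(probability, outcome)` pairs, where an outcome is either a label or a nested list. The function name `flatten` is an assumption for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, Union

# Assumed representation: a lottery is a list of (probability, outcome) pairs;
# an outcome is either a plain label (str) or another lottery (nested list).
Lottery = List[Tuple[float, Union[str, list]]]


def flatten(lottery: Lottery) -> Dict[str, float]:
    """Reduce a compound lottery to a simple one: outcome label -> total probability."""
    flat: Dict[str, float] = defaultdict(float)
    for p, outcome in lottery:
        if isinstance(outcome, list):
            # Nested lottery: multiply the branch probability through.
            for label, q in flatten(outcome).items():
                flat[label] += p * q
        else:
            flat[outcome] += p
    return dict(flat)


# [0.5, A; 0.5, [0.4, B; 0.6, C]] should reduce to [0.5, A; 0.2, B; 0.3, C]
compound: Lottery = [(0.5, "A"), (0.5, [(0.4, "B"), (0.6, "C")])]
print(flatten(compound))
```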
NOTE

An agent’s behavior doesn’t change if $U$ is subjected to a positive affine transformation:
$U'(s) = aU(s) + b$ with $a > 0$
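
A quick way to see this invariance is to compare the chosen action before and after transforming the utility. The sketch below assumes a square-root utility over money and two made-up lotteries. The constants `3.0` and `7.0` are arbitrary choices for $a$ and $b$.

```python
import math
from typing import Callable, Dict, List, Tuple

Lottery = List[Tuple[float, float]]  # (probability, monetary outcome) pairs

# Hypothetical choices, used only for illustration.
lotteries: Dict[str, Lottery] = {
    "safe": [(1.0, 50.0)],
    "risky": [(0.5, 0.0), (0.5, 120.0)],
}


def expected_utility(lottery: Lottery, u: Callable[[float], float]) -> float:
    return sum(p * u(x) for p, x in lottery)


def choose(u: Callable[[float], float]) -> str:
    """Return the name of the lottery with the highest expected utility under u."""
    return max(lotteries, key=lambda name: expected_utility(lotteries[name], u))


def u(x: float) -> float:
    return math.sqrt(x)  # assumed utility function


def u_affine(x: float) -> float:
    return 3.0 * u(x) + 7.0  # U'(s) = a U(s) + b with a = 3 > 0, b = 7


print(choose(u), choose(u_affine))  # the same choice under both utilities
```

The expected utilities change in value, but their ordering does not, because $EU'(a) = a \cdot EU(a) + b$ for every action when the probabilities sum to 1.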

Utility function#

  • encode preferences
  • translate “desirability” measures $x$ into utility units $U(x)$

Preference elicitation methods#

How to get the utility?

  1. Probability equivalent
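    • set the utility of the worst outcome to $u_{\perp} = 0$ and of the best outcome to $u_{\tau} = 1$
    • find $p$ s.t. $U(s) \sim [p, u_{\tau}; 1-p, u_{\perp}] \Rightarrow U(s) = p$
    • In words: set the best outcome to 1 and the worst to 0. If you are indifferent between getting $s$ for sure and the lottery [best with probability $p$, worst with probability $1-p$], then the utility of $s$ is $p$, i.e. $U(s) = p$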
  2. Certainty Equivalent (CE)
    • the amount of money that, in your mind, the lottery is equivalent to
    • $U(CE) = EU(Lottery)$
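
Both methods can be illustrated numerically once a utility function is assumed. The sketch below assumes $U(x) = \sqrt{x}$ over monetary outcomes between 0 and 100, normalized so the worst outcome has utility 0 and the best has utility 1. The range, the utility shape, and the function names are assumptions for the example. The certainty equivalent is found by bisection on $U(CE) = EU(Lottery)$.

```python
import math
from typing import List, Tuple

Lottery = List[Tuple[float, float]]  # (probability, monetary outcome) pairs

WORST, BEST = 0.0, 100.0  # assumed range of outcomes


def normalized_utility(x: float) -> float:
    """Assumed sqrt utility, rescaled so U(WORST) = 0 and U(BEST) = 1."""
    return (math.sqrt(x) - math.sqrt(WORST)) / (math.sqrt(BEST) - math.sqrt(WORST))


def probability_equivalent(x: float) -> float:
    """The p making x equivalent to [p, BEST; 1-p, WORST]; equals U(x) after normalization."""
    return normalized_utility(x)


def certainty_equivalent(lottery: Lottery, tol: float = 1e-9) -> float:
    """Solve U(CE) = EU(lottery) by bisection (U is increasing on [WORST, BEST])."""
    target = sum(p * normalized_utility(x) for p, x in lottery)
    lo, hi = WORST, BEST
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if normalized_utility(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(probability_equivalent(25.0))                        # about 0.5
print(certainty_equivalent([(0.5, 100.0), (0.5, 0.0)]))    # about 25
```

Under this assumed utility, a sure 25 is worth as much as a 50/50 gamble between 0 and 100, even though the gamble's expected monetary value is 50.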

Expected Monetary Value (EMV)#

  • use money as decision objective
  • doesn’t take into account risk attitude
  • e.g. You have won 1 million so far. A lottery: 50% chance to lose it all, 50% chance to gain 1.5 million more. Play?
    $\Rightarrow EMV([0.5, 0; 0.5, 2.5]) = 0.5 \times 0 + 0.5 \times 2.5 = 1.25 > 1$, so by EMV alone: play!
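
The arithmetic of this example, written as a small Python sketch (amounts are in millions; the `emv` helper is a name chosen for this illustration):

```python
from typing import List, Tuple

Lottery = List[Tuple[float, float]]  # (probability, amount in millions) pairs


def emv(lottery: Lottery) -> float:
    """Expected monetary value: probability-weighted average of the payoffs."""
    return sum(p * x for p, x in lottery)


keep = [(1.0, 1.0)]                 # walk away with the 1 million
play = [(0.5, 0.0), (0.5, 2.5)]     # lose everything, or end up with 2.5 million

print(emv(keep), emv(play))  # 1.0 1.25
print("play" if emv(play) > emv(keep) else "keep")
```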

Risk Attitude and Risk Premium#

  • $RiskPremium = EMV - CE$
    • the expected money you are willing to give up to avoid the risk; the CE itself is the most you would pay to buy the lottery
  • Risk-averse: $EMV > CE$, $RiskPremium > 0$ (don't want to lose out)
  • Risk-seeking: $EMV < CE$, $RiskPremium < 0$ (just go for the gamble)
  • Risk-neutral: $EMV = CE$, $RiskPremium = 0$
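
To make the three attitudes concrete, the sketch below evaluates the lottery $[0.5, 0; 0.5, 2.5]$ (in millions) under three assumed utility functions: concave ($\sqrt{x}$), linear ($x$), and convex ($x^2$). These utilities and the helper names are illustrative assumptions. Each CE is obtained by applying the inverse utility to the expected utility.

```python
import math
from typing import Callable, List, Tuple

Lottery = List[Tuple[float, float]]  # (probability, amount in millions) pairs

lottery: Lottery = [(0.5, 0.0), (0.5, 2.5)]


def emv(lot: Lottery) -> float:
    return sum(p * x for p, x in lot)


def risk_premium(lot: Lottery,
                 u: Callable[[float], float],
                 u_inverse: Callable[[float], float]) -> float:
    """RiskPremium = EMV - CE, where CE = U^{-1}(EU(lottery))."""
    expected_u = sum(p * u(x) for p, x in lot)
    ce = u_inverse(expected_u)
    return emv(lot) - ce


def attitude(premium: float, tol: float = 1e-9) -> str:
    if premium > tol:
        return "risk-averse"
    if premium < -tol:
        return "risk-seeking"
    return "risk-neutral"


# Assumed utility functions paired with their inverses (valid for x >= 0).
agents = {
    "concave": (math.sqrt, lambda y: y ** 2),
    "linear": (lambda x: x, lambda y: y),
    "convex": (lambda x: x ** 2, math.sqrt),
}

for name, (u, u_inv) in agents.items():
    premium = risk_premium(lottery, u, u_inv)
    print(name, round(premium, 4), attitude(premium))
```

With the concave utility the CE comes out at about 0.625 million, which is below the sure 1 million from the EMV example, so this agent would not play even though the EMV says to.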
https://itsjeremyhsieh.github.io/posts/cs5446-3-rational-decision-making/
Author: Jeremy H
Published at: 2024-08-28