Friday, December 23, 2011

Computational Economics Lecture 14


Application: Finite State Optimal Growth

This lecture:
  • Formulate a simple, finite state optimal growth problem
  • Solve by value iteration
The lecture is based on section 5.1 of EDTC
  • Please refer to that section for detailed explanations
  • You need some familiarity with dynamic programming to read this lecture
    • And a bit more mathematics than the shortest path application

Outline of the Problem

Model the behavior of Colonel Kurtz, who can be found on a small island in the Nung river, living on catfish
Catfish bite at dawn, and the Colonel bags a random quantity W
  • A random variable on {0,...,B} with distribution q
Catfish last only if refrigerated, and the Colonel's freezer holds up to M fish
Let X_t be the state variable: the stock of fish at noon on day t
In the evening, the Colonel consumes quantity C_t and freezes R_t = X_t - C_t
Following the next morning's fishing trip, the state is

\[ X_{t+1} = R_t + W_{t+1} = X_t - C_t + W_{t+1} \]

Assume that Kurtz follows a policy function g: on observing state X, he saves R = g(X)
The state space (the set of possible values for X) is S = {0, 1, ..., M + B}
The policy function g maps S into {0, 1, ..., M}, and

\[ g(x) \leq \min(x, M) \quad \text{for all } x \in S \]

(the Colonel cannot freeze more fish than he currently has, nor more than the freezer holds)
Denote the set of all feasible policies by G. For given g in G, the state evolves according to

\[ X_{t+1} = g(X_t) + W_{t+1} \]

where the catches W_1, W_2, ... are independent draws from q
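As a concrete illustration, here is a minimal simulation sketch of this state process (not part of the lecture's solution code; the policy g, the function simulate, and the uniform catch are illustrative choices, with B and M taken from the exercise below):

import random

B, M = 10, 5   # maximum catch and freezer capacity, as in the exercise

def g(x):
    "An example feasible policy: freeze as many fish as possible."
    return min(x, M)

def simulate(x0, n):
    "Generate X_0, X_1, ..., X_n under the policy g, starting from X_0 = x0."
    X = [x0]
    for t in range(n):
        W = random.randint(0, B)   # the morning's catch, uniform on {0, ..., B}
        X.append(g(X[-1]) + W)     # frozen stock plus the new catch
    return X

print(simulate(0, 10))   # one sample path of the state process
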
Letting U be the utility function and ρ in (0, 1) the discount factor, the optimization problem of Colonel Kurtz is

\[ \max_{g \in G} \; \mathbb{E} \left[ \sum_{t=0}^{\infty} \rho^t \, U(C_t) \right], \qquad C_t = X_t - g(X_t) \]

Given that we are currently at state x, the reward from following policy g is

\[ v_g(x) = \mathbb{E} \left[ \sum_{t=0}^{\infty} \rho^t \, U(X_t - g(X_t)) \right], \qquad X_0 = x \]

The value function is defined as

\[ v^*(x) := \max_{g \in G} v_g(x) \qquad (x \in S) \]

The precise definition of optimality is as follows: a policy g* in G is called optimal if

\[ v_{g^*}(x) = v^*(x) \quad \text{for all } x \in S \]

The Bellman Equation

The set of all feasible actions (the number of fish that can be frozen) when the current state is x:

\[ \Gamma(x) := \{0, 1, \ldots, \min(x, M)\} \]

The Bellman equation can be written as

\[ v^*(x) = \max_{a \in \Gamma(x)} \left\{ U(x - a) + \rho \sum_{z=0}^{B} v^*(a + z) q(z) \right\} \qquad (x \in S) \]

The idea is as follows
Suppose we know the values of different states in terms of maximum future rewards
Then best action is found by trading off the two effects inherent in choosing an action:
  • Current reward, and
  • Future reward after transitioning to new state next period
    • New state depends on current action
Making this trade off optimally gives maximum value from the current state
Given a real-valued function w on S, we call g in G w-greedy if

\[ g(x) \in \operatorname*{argmax}_{a \in \Gamma(x)} \left\{ U(x - a) + \rho \sum_{z=0}^{B} w(a + z) q(z) \right\} \quad \text{for all } x \in S \]

It is known that a policy is optimal if and only if it is v*-greedy
Hence computing an optimal policy is trivial if we know the value function
How to solve for the value function?
Let bS denote the real-valued functions on S
Define the Bellman operator T, sending bS into bS, by

\[ Tv(x) = \max_{a \in \Gamma(x)} \left\{ U(x - a) + \rho \sum_{z=0}^{B} v(a + z) q(z) \right\} \qquad (x \in S) \]

Facts (see the text):
  • T is a uniform contraction on bS with respect to the distance

\[ d_\infty(v, w) := \max_{x \in S} |v(x) - w(x)| \]

  • v* is the unique fixed point of T in bS
In consequence, iteration of T on any v in bS converges to v*
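In fact, because T is a uniform contraction (with modulus ρ), the convergence is at a geometric rate; the standard Banach fixed point bound is

\[ d_\infty(T^n v, v^*) \leq \rho^n \, d_\infty(v, v^*) \qquad (n = 0, 1, 2, \ldots) \]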

Value iteration algorithm

Iterate with T on some initial v in bS, producing the sequence (T^n v)
Continue the iteration until

\[ d_\infty(T^{n+1} v, T^n v) < \text{tolerance} \]

Compute a w-greedy policy g, where w is the final iterate
If the tolerance is small, then g is almost optimal (see the text)
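The sort of bound behind this claim (see the text for the exact statement) says that if g is w-greedy and d_∞(w, v*) ≤ ε, then

\[ d_\infty(v_g, v^*) \leq \frac{2 \rho \, \varepsilon}{1 - \rho} \]

so the value lost by following g instead of an optimal policy vanishes as the tolerance shrinks.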

Exercises

Solve for the optimal policy
  • Implement the Bellman operator T as a Python function
    • Elements of bS can be represented as lists or tuples
      • v[0] corresponds to v(0), etc.
    • Takes such a sequence v as an argument and
    • returns another sequence representing Tv
  • Write a function greedy() to compute greedy policies
  • Implement the loop as a function main()
    • Use 0.001 for the tolerance
For the utility function, set

\[ U(c) = c^{\beta} \]

For the parameters, use beta, rho, B, M = 0.5, 0.9, 10, 5
Let q be uniform on {0, ..., B}
Plot the sequence of functions created in the value iteration algorithm


Print the greedy policy (as a list) calculated from the last iterate
The solution is
>>> g
[0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 5]

Solution

import pylab

beta, rho, B, M = 0.5, 0.9, 10, 5
S = range(B + M + 1)  # State space
Z = range(B + 1)      # Shock space

def d_infty(v, w):
    return max(abs(v[x] - w[x]) for x in S)

def U(c):      
    "Utility function."
    return c**beta

def q(z):    
    "Probability mass function, uniform distribution."
    return 1.0 / len(Z) if 0 <= z <= B else 0

def Gamma(x):  
    "The correspondence of feasible actions."
    return range(min(x, M) + 1)

def T(v):      
    """An implementation of the Bellman operator.
    Parameters: v is a sequence representing a function defined on S.
    Returns: Tv, a list."""
    Tv = []        
    for x in S:
        vals = []  # Stores rewards for a in Gamma(x)
        for a in Gamma(x):
            y = U(x - a) + rho * sum(v[a + z] * q(z) for z in Z)
            vals.append(y)
        Tv.append(max(vals))
    return Tv 

def greedy(v):
    "Computes a v-greedy policy, returned as a list."
    g = []
    for x in S:
        runningmax = -1   # safe lower bound, since U and v are nonnegative here
        for a in Gamma(x):
            y = U(x - a) + rho * sum(v[a + z] * q(z) for z in Z)
            if y > runningmax:
                runningmax = y
                maximizer = a
        g.append(maximizer)
    return g
 
def main():
    "Value iteration: iterate with T until the tolerance is met, then print a greedy policy."
    current = [U(x) for x in S]
    tolerance = 0.001
    while 1:
        pylab.plot(current, 'bo-')
        new = T(current)
        if d_infty(new, current) < tolerance:
            break
        current = new
    g = greedy(new)
    print 'The optimal policy is:'
    print g
    pylab.show()
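
A minimal usage sketch (hypothetical, assuming the definitions above have been loaded into a Python session): call main() to reproduce the plot and the policy printed earlier, and, as a quick sanity check on the contraction property, compare distances before and after applying T:

v = [U(x) for x in S]       # the same initial guess used in main()
w = [0.0 for x in S]        # any other function on S
print(d_infty(T(v), T(w)) / d_infty(v, w))   # should be <= rho = 0.9

main()   # runs value iteration, prints the greedy policy, shows the plot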
