Application: Finite State Optimal Growth
This lecture:
- Formulate a simple, finite state optimal growth problem
- Solve by value iteration
The lecture is based on section 5.1 of EDTC
- Please refer to that section for detailed explanations
- You need some familiarity with dynamic programming to read this lecture
- And a bit more mathematics than the shortest path application
Outline of the Problem
Model the behavior of Colonel Kurtz, who can be found on a small island in the Nung River, living on catfish
Catfish bite at dawn, and the Colonel bags a random quantity W
- A random variable on {0,...,B} with distribution q
Catfish last only if refrigerated, and the Colonel's freezer holds up to M fish
Let Xt be the state variable: stock of fish at noon on day t
In the evening, the Colonel consumes quantity Ct, and freezes Rt = Xt - Ct
Following the next morning's fishing trip, the state is
X_{t+1} = R_t + W_{t+1}
Assume that Kurtz follows policy function g: on observing state X, saves R = g(X)
State space (possible values for X) is S = {0,...,M+B}
Policy function g maps S into {0,...,M}, and satisfies g(x) <= x for all x in S (the Colonel cannot freeze more fish than he currently has)
Denote the set of all feasible policies by G. For given g in G, the state evolves according to
X_{t+1} = g(X_t) + W_{t+1}
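To make the dynamics concrete, here is a minimal illustrative sketch (not part of EDTC or the exercises below) that simulates a state path; the policy g used here, which freezes as many fish as possible, is just a placeholder

import random

B, M = 10, 5                          # maximum catch and freezer capacity

def g(x):
    "Placeholder policy: freeze as many fish as possible."
    return min(x, M)

def simulate(x0, T=10):
    "Simulate X_0,...,X_T from X_0 = x0, with W uniform on {0,...,B}."
    path = [x0]
    for t in range(T):
        W = random.randint(0, B)      # the next morning's catch
        path.append(g(path[-1]) + W)  # X_{t+1} = g(X_t) + W_{t+1}
    return path

print(simulate(0))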
Letting U be the utility function and rho in (0,1) the discount factor, the optimization problem of Colonel Kurtz is to choose a feasible policy that maximizes
E[ sum_{t>=0} rho^t U(C_t) ]
Given that we are currently at state x, the reward from following policy g is
v_g(x) = E[ sum_{t>=0} rho^t U(X_t - g(X_t)) ],   where X_0 = x and X_{t+1} = g(X_t) + W_{t+1}
The value function is defined as
v*(x) = max_{g in G} v_g(x)   (x in S)
The precise definition of optimality is as follows: a policy g* in G is called optimal if v_{g*}(x) = v*(x) for all x in S
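As an aside, v_g(x) can be approximated by Monte Carlo simulation. The sketch below is illustrative only; the helper name value_of_policy, the truncation length and the sample size are arbitrary choices, and the policy passed in is again a placeholder

import random

beta, rho, B, M = 0.5, 0.9, 10, 5     # same parameter values as the exercises below

def U(c):
    "Utility function."
    return c**beta

def value_of_policy(g, x, T=60, n=1000):
    "Illustrative Monte Carlo estimate of v_g(x), truncating the sum at T periods."
    total = 0.0
    for i in range(n):
        X = x
        for t in range(T):
            total += rho**t * U(X - g(X))       # consumption is C_t = X_t - g(X_t)
            X = g(X) + random.randint(0, B)     # X_{t+1} = g(X_t) + W_{t+1}
    return total / n

print(value_of_policy(lambda x: min(x, M), 0))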
The Bellman Equation
The set of all feasible actions (the number of fish that can be frozen) when the current state is x is
Gamma(x) = {0, 1, ..., min(x, M)}
The Bellman equation can be written as
v*(x) = max_{a in Gamma(x)} { U(x - a) + rho * sum_{z=0}^{B} v*(a + z) q(z) }   (x in S)
The idea is as follows
Suppose we know the values of different states in terms of maximum future rewards
Then the best action is found by trading off the two effects inherent in choosing an action:
- Current reward, and
- Future reward after transitioning to new state next period
- New state depends on current action
Making this trade-off optimally gives the maximum value from the current state
Given a real-valued function w on S, we call g in G w-greedy if, for every x in S, g(x) is a maximizer of
U(x - a) + rho * sum_{z=0}^{B} w(a + z) q(z)   over a in Gamma(x)
It is known that a policy is optimal if and only if it is v*-greedy
Hence computing an optimal policy is trivial if we know the value function
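For illustration, here is a sketch of how a w-greedy action at a single state can be computed by enumerating Gamma(x); the function name greedy_action and the initial guess w are arbitrary choices, not part of the exercises below

beta, rho, B, M = 0.5, 0.9, 10, 5
S = range(B + M + 1)

def U(c):
    "Utility function."
    return c**beta

def greedy_action(w, x):
    "Illustrative helper: an action in Gamma(x) maximizing current reward plus discounted expected future value."
    q = 1.0 / (B + 1)                  # uniform shock distribution on {0,...,B}
    def rhs(a):
        return U(x - a) + rho * sum(w[a + z] * q for z in range(B + 1))
    return max(range(min(x, M) + 1), key=rhs)

w = [U(x) for x in S]                  # an arbitrary guess of the value function
print(greedy_action(w, 8))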
How to solve for the value function?
Let bS be the set of real-valued functions on S
Define the Bellman operator T sending bS into bS by
Tv(x) = max_{a in Gamma(x)} { U(x - a) + rho * sum_{z=0}^{B} v(a + z) q(z) }   (x in S)
Facts (see the text):
- T is a uniform contraction on bS with respect to the distance d_infty(v, w) = max_{x in S} |v(x) - w(x)|
- v* is the unique fixed point of T in bS
In consequence, iteration of T on any v in bS converges to v*
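These facts can be checked numerically. The sketch below restates the Bellman operator in compact form (so it runs on its own) and verifies the contraction inequality d_infty(Tv, Tw) <= rho * d_infty(v, w) for two randomly chosen functions; it is a rough check, not a proof

import random

beta, rho, B, M = 0.5, 0.9, 10, 5
S = range(B + M + 1)
prob = 1.0 / (B + 1)                   # uniform shock probabilities

def U(c):
    return c**beta

def T(v):
    "Compact restatement of the Bellman operator, acting on a sequence v indexed by the states in S."
    return [max(U(x - a) + rho * sum(v[a + z] * prob for z in range(B + 1))
                for a in range(min(x, M) + 1))
            for x in S]

def d_infty(v, w):
    return max(abs(v[x] - w[x]) for x in S)

v = [random.random() for x in S]
w = [random.random() for x in S]
print(d_infty(T(v), T(w)) <= rho * d_infty(v, w))   # should print True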
Value iteration algorithm
Iterate with T on some initial v in bS, producing the sequence (T^n v)
Continue iterating until d_infty(T^{n+1} v, T^n v) falls below some small tolerance
Compute a w-greedy policy g, where w is the final iterate
If the tolerance is small, then g is almost optimal (see the text)
Exercises
Solve for the optimal policy
- Implement the Bellman operator T as a Python function that takes a sequence v as an argument and returns another sequence representing Tv
- Elements of bS can be represented as lists or tuples, where v[0] corresponds to v(0), etc.
- Write a function greedy() to compute greedy policies
- Implement the loop as a function main()
- Use 0.001 for the tolerance
For the utility function set U(c) = c^beta
For the parameters, use
beta, rho, B, M = 0.5, 0.9, 10, 5
Let q be uniform on {0,...,B}
Plot the sequence of functions created in the value iteration algorithm
Print the greedy policy (as a list) calculated from the last iterate
The solution is
>>> g
[0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 5]
Solution
import pylab

beta, rho, B, M = 0.5, 0.9, 10, 5
S = range(B + M + 1)  # State space
Z = range(B + 1)      # Shock space

def d_infty(v, w):
    "Sup distance between two functions on S."
    return max(abs(v[x] - w[x]) for x in S)

def U(c):
    "Utility function."
    return c**beta

def q(z):
    "Probability mass function, uniform distribution."
    return 1.0 / len(Z) if 0 <= z <= B else 0

def Gamma(x):
    "The correspondence of feasible actions."
    return range(min(x, M) + 1)

def T(v):
    """An implementation of the Bellman operator.
    Parameters: v is a sequence representing a function defined on S.
    Returns: Tv, a list."""
    Tv = []
    for x in S:
        vals = []  # Stores rewards for a in Gamma(x)
        for a in Gamma(x):
            y = U(x - a) + rho * sum(v[a + z] * q(z) for z in Z)
            vals.append(y)
        Tv.append(max(vals))
    return Tv

def greedy(v):
    "Computes a v-greedy policy, returned as a list."
    g = []
    for x in S:
        runningmax = -1
        for a in Gamma(x):
            y = U(x - a) + rho * sum(v[a + z] * q(z) for z in Z)
            if y > runningmax:
                runningmax = y
                maximizer = a
        g.append(maximizer)
    return g

def main():
    current = [U(x) for x in S]  # Initial condition for value iteration
    tolerance = 0.001
    while True:
        pylab.plot(current, 'bo-')
        new = T(current)
        if d_infty(new, current) < tolerance:
            break
        current = new
    g = greedy(new)
    print('The optimal policy is:')
    print(g)
    pylab.show()

main()