HISTORY

v0.10 - the RelativeValueIteration class has been completed and fulfils the requirements to bump up the version number.
	>>> import mdp
	>>> P, R = mdp.exampleForest()
	>>> rvi = mdp.RelativeValueIteration(P, R) # this algorithm does not use discounting
	>>> rvi.iterate() # runs the algorithm
	>>> rvi.policy # to get the optimal policy
	(0, 0, 0)

v0.9 - the value iteration Gauss-Seidel algorithm is now in working order. The class ValueIterationGS should be stable and usable. Use like this:
	>>> import mdp
	>>> P, R = mdp.exampleRand(10, 3) # to create random transition and reward matrices with 10 states and 3 actions
	>>> vigs = mdp.ValueIterationGS(P, R, 0.9) # assuming a discount rate of 0.9
	>>> vigs.iterate() # runs the algorithm
	>>> vigs.policy # shows the optimal policy once iteration has finished

v0.8 - The policy iteration algorithm port is now usable. The PolicyIteration class should now be stable. To use:
	>>> import mdp
	>>> P, R = mdp.exampleForest() # to use the forest example as the transition and reward matrices
	>>> pi = mdp.PolicyIteration(P, R, 0.9)
	>>> pi.iterate() # runs the algorithm
	>>> pi.policy # shows the optimal policy