2013-01-25
v0.10 - the RelativeValueIteration class has been completed, which is enough to warrant bumping the version number.
>>> import mdp
>>> P, R = mdp.exampleForest()
>>> rvi = mdp.RelativeValueIteration(P, R) # this algorithm does not use discounting
>>> rvi.iterate() # runs the algorithm
>>> rvi.policy # to get the optimal policy
(0, 0, 0)
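As a rough illustration of what RelativeValueIteration computes, the average-reward (undiscounted) update can be sketched in plain NumPy. The matrices below are a hand-built stand-in for the forest-management example, assuming the usual defaults (3 states, actions wait/cut, fire probability 0.1, rewards 4 and 2); they are an assumption, not a dump of exampleForest, and the function is a sketch of the technique, not the class's actual code.

```python
import numpy as np

# Hand-built stand-in for mdp.exampleForest() (assumed defaults):
# 3 states, action 0 = wait, action 1 = cut; fire resets to state 0 w.p. 0.1.
P = np.array([[[0.1, 0.9, 0.0],     # wait
               [0.1, 0.0, 0.9],
               [0.1, 0.0, 0.9]],
              [[1.0, 0.0, 0.0],     # cut
               [1.0, 0.0, 0.0],
               [1.0, 0.0, 0.0]]])
R = np.array([[0.0, 0.0],           # R[s, a]
              [0.0, 1.0],
              [4.0, 2.0]])

def relative_value_iteration(P, R, epsilon=1e-8, max_iter=1000):
    """Undiscounted (average-reward) value iteration with span stopping."""
    A, S, _ = P.shape
    h = np.zeros(S)                       # relative value function
    for _ in range(max_iter):
        Q = R + np.stack([P[a] @ h for a in range(A)], axis=1)  # (S, A)
        h_new = Q.max(axis=1)
        delta = h_new - h
        if delta.max() - delta.min() < epsilon:   # span stopping criterion
            break
        h = h_new - h_new[0]              # pin a reference state to zero
    gain = float(delta.max())             # approximate average reward per step
    return Q.argmax(axis=1), gain

policy, gain = relative_value_iteration(P, R)
```

With these matrices the greedy policy comes out as all-wait, matching the (0, 0, 0) shown above, and the gain is the long-run average reward of that policy.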
Pre 2013-01-25
v0.9 - the value iteration Gauss-Seidel algorithm is now in working order. The class ValueIterationGS should be stable and usable. Use like this:
>>> import mdp
>>> P, R = mdp.exampleRand(10, 3) # creates random transition and reward matrices with 10 states and 3 actions
>>> vigs = mdp.ValueIterationGS(P, R, 0.9) # assuming a discount rate of 0.9
>>> vigs.iterate() # to run the algorithm, then type vigs.policy after it has finished to see the optimal policy
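The Gauss-Seidel variant can be sketched as follows: unlike plain value iteration, V is overwritten in place during a sweep, so later states immediately see the fresh values. The random MDP below is a rough stand-in for exampleRand(10, 3) (the library's own generator may differ), and the function is an illustrative sketch, not the ValueIterationGS code.

```python
import numpy as np

rng = np.random.default_rng(0)
# Rough stand-in for mdp.exampleRand(10, 3): random stochastic transitions
# and random rewards (an assumption about what the helper produces).
P = rng.random((3, 10, 10))
P /= P.sum(axis=2, keepdims=True)        # normalise each row to a distribution
R = rng.random((10, 3))

def value_iteration_gs(P, R, discount, epsilon=1e-6, max_iter=10000):
    """Value iteration, Gauss-Seidel style: V[s] is updated in place,
    so later states in the same sweep already use the new values."""
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        worst = 0.0
        for s in range(S):
            v_old = V[s]
            V[s] = max(R[s, a] + discount * (P[a, s] @ V) for a in range(A))
            worst = max(worst, abs(V[s] - v_old))
        if worst < epsilon:              # largest in-sweep change is tiny
            break
    policy = np.array([max(range(A), key=lambda a: R[s, a] + discount * (P[a, s] @ V))
                       for s in range(S)])
    return policy, V

policy, V = value_iteration_gs(P, R, 0.9)
```

At convergence V is (up to the stopping tolerance) a fixed point of the Bellman optimality operator, and the returned policy is greedy with respect to it.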
v0.8 - The policy iteration algorithm port is now usable. The PolicyIteration class should now be stable. To use:
>>> import mdp
>>> P, R = mdp.exampleForest() # to use the forest example as the transition and reward matrices
>>> pi = mdp.PolicyIteration(P, R, 0.9)
>>> pi.iterate() # now typing pi.policy will show the optimal policy
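For reference, policy iteration itself can be sketched in a few lines of NumPy: evaluate the current policy exactly with a linear solve, then improve it greedily, and stop when the policy no longer changes. The forest matrices are again a hand-built assumption (3 states, wait/cut, fire probability 0.1, rewards 4 and 2), and this is a sketch of the technique, not the PolicyIteration class's code.

```python
import numpy as np

# Hand-built stand-in for mdp.exampleForest() (assumed defaults as described).
P = np.array([[[0.1, 0.9, 0.0],     # wait
               [0.1, 0.0, 0.9],
               [0.1, 0.0, 0.9]],
              [[1.0, 0.0, 0.0],     # cut
               [1.0, 0.0, 0.0],
               [1.0, 0.0, 0.0]]])
R = np.array([[0.0, 0.0],           # R[s, a]
              [0.0, 1.0],
              [4.0, 2.0]])

def policy_iteration(P, R, discount, max_iter=100):
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        # Evaluation: solve (I - discount * P_pi) V = R_pi exactly.
        P_pi = P[policy, np.arange(S)]          # row s is P[policy[s], s, :]
        R_pi = R[np.arange(S), policy]
        V = np.linalg.solve(np.eye(S) - discount * P_pi, R_pi)
        # Improvement: act greedily with respect to V.
        Q = R + discount * np.einsum('ast,t->sa', P, V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):  # stable policy => optimal
            break
        policy = new_policy
    return policy, V

policy, V = policy_iteration(P, R, 0.9)
```

With a discount rate of 0.9 the all-wait policy is optimal for these matrices, so the loop terminates as soon as the improvement step leaves the policy unchanged.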