2014-03-14
v4.0b2 - Second beta release, new tic tac toe example which needs testing for
correctness. Documentation is mostly complete.
2014-03-11
v4.0b1 - first beta release created on the testing branch
2013-09-11
v4.0a4 - Package is usable with Python 3.3; the 2to3 script was used to convert
it. Still needs a bit of cleaning up.
2013-08-24
v4.0a3 - Converted the module into a package format, and renamed it so now it
must be imported with:
>>> import mdptoolbox
2013-02-06
v4.0a2 - Fixes a bug that didn't allow sparse matrices to be used in the
ValueIteration class.
2013-02-05
v4.0alpha1 - the documentation has started to take shape, and the module's
docstring is now in a stable condition. The module *should* be useful
for everyday use, but testing is incomplete, and some documentation is
still missing.
2013-01-26
v0.14 - the linear programming code has been finalised; however, there is currently no way to turn off the progress status of the
lp solver from the cvxopt module. One unit test has been written. To use the linear programming method:
>>> import mdp
>>> P, R = mdp.exampleForest()
>>> lp = mdp.LP(P, R, 0.9)
>>> lp.iterate() # and after some output from cvxopt.solvers.lp the optimal policy can be shown with lp.policy
2013-01-25
v0.13 - the FiniteHorizon class has been fixed. No unit tests yet.
>>> import mdp
>>> P, R = mdp.exampleForest()
>>> pim = mdp.FiniteHorizon(P, R, 10) # 10 time-step horizon
>>> pim.iterate() # get the optimal policy with pim.policy
v0.12 - PolicyIterationModified is now feature complete. Unit tests have not yet been written, and the docs need fixing.
>>> import mdp
>>> P, R = mdp.exampleForest()
>>> pim = mdp.PolicyIterationModified(P, R, 0.9)
>>> pim.iterate()
>>> pim.policy
(0, 0, 0)
v0.11 - QLearning is now ready for use in the module, with a couple of unit tests already written. The class is used as follows:
>>> import mdp
>>> import numpy as np
>>> P = np.array([[[0.5, 0.5],[0.8, 0.2]],[[0, 1],[0.1, 0.9]]])
>>> R = np.array([[5, 10], [-1, 2]])
>>> ql = mdp.QLearning(P, R, 0.9)
>>> ql.iterate()
>>> ql.policy
(1, 0)
v0.10 - the RelativeValueIteration class has been completed and fulfils the requirements to bump up the version number.
>>> import mdp
>>> P, R = mdp.exampleForest()
>>> rvi = mdp.RelativeValueIteration(P, R) # this algorithm does not use discounting
>>> rvi.iterate() # runs the algorithm
>>> rvi.policy # to get the optimal policy
(0, 0, 0)
Pre 2013-01-25
v0.9 - the value iteration Gauss-Seidel algorithm is now in working order. The class ValueIterationGS should be stable and usable. Use like this:
>>> import mdp
>>> P, R = mdp.exampleRand(10, 3) # to create random transition and reward matrices with 10 states and 3 actions
>>> vigs = mdp.ValueIterationGS(P, R, 0.9) # assuming a discount rate of 0.9
>>> vigs.iterate() # to run the algorithm, then type vigs.policy after it has finished to see the optimal policy
v0.8 - The policy iteration algorithm port is now usable. The PolicyIteration class should now be stable. To use:
>>> import mdp
>>> P, R = mdp.exampleForest() # to use the forest example as the transition and reward matrices
>>> pi = mdp.PolicyIteration(P, R, 0.9)
>>> pi.iterate() # now typing pi.policy will show the optimal policy