2014-03-14
v4.0b2 - Second beta release, with a new tic-tac-toe example that needs
         testing for correctness. Documentation is mostly complete.

2014-03-11
v4.0b1 - first beta release, created on the testing branch

2013-09-11
v4.0a4 - Package is usable with Python 3.3; the 2to3 script was used to
         convert it. Still needs a bit of cleaning up.

2013-08-24
v4.0a3 - Converted the module into a package and renamed it, so it must now
         be imported with:
         >>> import mdptoolbox

2013-02-06
v4.0a2 - Fixes a bug that prevented sparse matrices from being used in the
         ValueIteration class.
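         As a rough sketch of what this should now allow (assuming the
         ValueIteration class follows the same iterate()/policy interface as
         the other classes in this file, and that P is accepted as one scipy
         sparse matrix per action):
         >>> import mdp
         >>> import numpy as np
         >>> from scipy.sparse import csr_matrix
         >>> # one sparse transition matrix per action (values are illustrative)
         >>> P = [csr_matrix([[0.5, 0.5], [0.8, 0.2]]),
         ...      csr_matrix([[0.0, 1.0], [0.1, 0.9]])]
         >>> R = np.array([[5, 10], [-1, 2]])
         >>> vi = mdp.ValueIteration(P, R, 0.9)
         >>> vi.iterate() # the optimal policy is then available as vi.policy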

2013-02-05
v4.0alpha1 - the documentation has started to take shape, and the module's
	docstring is now in a stable condition. The module *should* be useful
	for everyday use, but testing is incomplete, and some documentation is
	still missing.

2013-01-26
v0.14 - the linear programming code has been finalised; however, there is
	currently no way to turn off the progress output of the LP solver in
	the cvxopt module. One unit test has been written. To use the linear
	programming method:
	>>> import mdp
	>>> P, R = mdp.exampleForest()
	>>> lp = mdp.LP(P, R, 0.9)
	>>> lp.iterate() # after some output from cvxopt.solvers.lp, the optimal policy can be shown with lp.policy
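	A possible workaround, assuming the LP class calls cvxopt.solvers.lp
	with cvxopt's default global options (this is not something the module
	itself provides), is to silence cvxopt before calling iterate():
	>>> import cvxopt
	>>> # hide the solver's progress messages via cvxopt's global options
	>>> cvxopt.solvers.options['show_progress'] = False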

2013-01-25
v0.13 - the FiniteHorizon class has been fixed. No unit tests yet.
	>>> import mdp
	>>> P, R = mdp.exampleForest()
	>>> pim = mdp.FiniteHorizon(P, R, 10) # 10 time-step horizon
	>>> pim.iterate() # get the optimal policy with pim.policy

v0.12 - PolicyIterationModified is now feature complete. Unit tests have not
	yet been written, and the docs need fixing.
	>>> import mdp
	>>> P, R = mdp.exampleForest()
	>>> pim = mdp.PolicyIterationModified(P, R, 0.9)
	>>> pim.iterate()
	>>> pim.policy
	(0, 0, 0)

v0.11 - QLearning is now ready for use in the module, with a couple of unit
	tests already written. The class is used as follows:
	>>> import mdp
	>>> import numpy as np
	>>> P = np.array([[[0.5, 0.5],[0.8, 0.2]],[[0, 1],[0.1, 0.9]]])
	>>> R = np.array([[5, 10], [-1, 2]])
	>>> ql = mdp.QLearning(P, R, 0.9)
	>>> ql.iterate()
	>>> ql.policy
	(1, 0)

v0.10 - the RelativeValueIteration class has been completed and fulfils the
	requirements to bump up the version number.
	>>> import mdp
	>>> P, R = mdp.exampleForest()
	>>> rvi = mdp.RelativeValueIteration(P, R) # this algorithm does not use discounting
	>>> rvi.iterate() # runs the algorithm
	>>> rvi.policy # to get the optimal policy
	(0, 0, 0)

Pre 2013-01-25
v0.9 - the value iteration Gauss-Seidel algorithm is now in working order. The
	ValueIterationGS class should be stable and usable. Use it like this:
	>>> import mdp
	>>> P, R = mdp.exampleRand(10, 3) # create random transition and reward matrices with 10 states and 3 actions
	>>> vigs = mdp.ValueIterationGS(P, R, 0.9) # assuming a discount rate of 0.9
	>>> vigs.iterate() # run the algorithm; the optimal policy is then available as vigs.policy

v0.8 - The policy iteration algorithm port is now usable. The PolicyIteration
	class should now be stable. To use:
	>>> import mdp
	>>> P, R = mdp.exampleForest() # to use the forest example as the transition and reward matrices
	>>> pi = mdp.PolicyIteration(P, R, 0.9)
	>>> pi.iterate() # now typing pi.policy will show the optimal policy