Commit 1d86ce7e authored by Steven Cordwell's avatar Steven Cordwell

[metadata] Remove history and todo files

The history can be viewed by looking at the git log, and the todo text file
wasn't being used; the GitHub issue tracker is used instead.
parent a624d0dd
v4.0b2 - Second beta release; new tic-tac-toe example, which still needs testing
for correctness. Documentation is mostly complete.
v4.0b1 - first beta release, created on the testing branch
v4.0a4 - Package is usable with Python 3.3; used the 2to3 script to convert.
Still needs a bit of cleaning up.
v4.0a3 - Converted the module into a package format and renamed it, so it must
now be imported with:
>>> import mdptoolbox
v4.0a2 - Fixes a bug that prevented sparse matrices from being used in the
ValueIteration class.
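As a rough illustration of what that fix enables (a minimal sketch of value
iteration with sparse transition matrices, not the toolbox's actual internals;
the MDP below is a small hypothetical 2-state, 2-action example):

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical MDP: P[a] is the sparse transition matrix for action a,
# R[s, a] is the reward for taking action a in state s.
P = [sp.csr_matrix([[0.5, 0.5], [0.8, 0.2]]),
     sp.csr_matrix([[0.0, 1.0], [0.1, 0.9]])]
R = np.array([[5.0, 10.0], [-1.0, 2.0]])
discount = 0.9

V = np.zeros(2)
for _ in range(1000):
    # Bellman backup: Q[s, a] = R[s, a] + discount * sum_s' P[a][s, s'] * V[s'];
    # the sparse dot product works just like the dense one here.
    Q = np.column_stack([R[:, a] + discount * P[a].dot(V) for a in range(2)])
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-8:
        V = V_new
        break
    V = V_new
policy = tuple(Q.argmax(axis=1))
```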
v4.0alpha1 - the documentation has started to take shape, and the module's
docstring is now in a stable condition. The module *should* be useful
for everyday use, but testing is incomplete and some documentation is
still missing.
v0.14 - the linear programming code has been finalised; however, there is
currently no way to turn off the progress output of the lp solver from the
cvxopt module. One unit test has been written. To use the linear programming
method:
>>> import mdp
>>> P, R = mdp.exampleForest()
>>> lp = mdp.LP(P, R, 0.9)
>>> lp.iterate() # and after some output from cvxopt.solvers.lp the optimal policy can be shown with lp.policy
v0.13 - the FiniteHorizon class has been fixed. No unit tests yet.
>>> import mdp
>>> P, R = mdp.exampleForest()
>>> pim = mdp.FiniteHorizon(P, R, 10) # 10 time-step horizon
>>> pim.iterate() # get the optimal policy with pim.policy
v0.12 - PolicyIterationModified is now feature complete. Unit tests have not
yet been written, and the docs need fixing.
>>> import mdp
>>> P, R = mdp.exampleForest()
>>> pim = mdp.PolicyIterationModified(P, R, 0.9)
>>> pim.iterate()
>>> pim.policy
(0, 0, 0)
v0.11 - QLearning is now ready for use in the module, with a couple of unit tests already written. The class is used as follows:
>>> import mdp
>>> import numpy as np
>>> P = np.array([[[0.5, 0.5],[0.8, 0.2]],[[0, 1],[0.1, 0.9]]])
>>> R = np.array([[5, 10], [-1, 2]])
>>> ql = mdp.QLearning(P, R, 0.9)
>>> ql.iterate()
>>> ql.policy
(1, 0)
v0.10 - the RelativeValueIteration class has been completed, which is enough to
bump up the version number.
>>> import mdp
>>> P, R = mdp.exampleForest()
>>> rvi = mdp.RelativeValueIteration(P, R) # this algorithm does not use discounting
>>> rvi.iterate() # runs the algorithm
>>> rvi.policy # to get the optimal policy
(0, 0, 0)
Pre 2013-01-25
v0.9 - the value iteration Gauss-Seidel algorithm is now in working order. The class ValueIterationGS should be stable and usable. Use like this:
>>> import mdp
>>> P, R = mdp.exampleRand(10, 3) # create random transition and reward matrices with 10 states and 3 actions
>>> vigs = mdp.ValueIterationGS(P, R, 0.9) # assuming a discount rate of 0.9
>>> vigs.iterate() # to run the algorithm, then type vigs.policy after it has finished to see the optimal policy
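For reference, Gauss-Seidel value iteration differs from the standard sweep by
updating V in place, so states later in a sweep already see the refreshed
values. A minimal numpy sketch of the idea (not the toolbox's code; the
2-state, 2-action P and R are hypothetical, matching the QLearning example
below):

```python
import numpy as np

# Hypothetical MDP: P[a, s, s'] transition probabilities, R[s, a] rewards.
P = np.array([[[0.5, 0.5], [0.8, 0.2]],
              [[0.0, 1.0], [0.1, 0.9]]])
R = np.array([[5.0, 10.0], [-1.0, 2.0]])
discount = 0.9

V = np.zeros(2)
for _ in range(1000):
    delta = 0.0
    for s in range(2):
        # In-place (Gauss-Seidel) update: V[s] uses values already refreshed
        # earlier in this same sweep, which typically speeds convergence.
        v_new = max(R[s, a] + discount * P[a, s].dot(V) for a in range(2))
        delta = max(delta, abs(v_new - V[s]))
        V[s] = v_new
    if delta < 1e-8:
        break
policy = tuple(int(np.argmax([R[s, a] + discount * P[a, s].dot(V)
                              for a in range(2)])) for s in range(2))
```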
v0.8 - The policy iteration algorithm port is now usable. The PolicyIteration class should now be stable. To use:
>>> import mdp
>>> P, R = mdp.exampleForest() # to use the forest example as the transition and reward matrices
>>> pi = mdp.PolicyIteration(P, R, 0.9)
>>> pi.iterate() # now typing pi.policy will show the optimal policy
1. Improve the documentation and rewrite it as necessary.
2. Implement a nicer linear programming interface to cvxopt, or write our own
linear programming code.
3. Write unit tests for all the classes.
4. Implement our own exception class.
5. Move evalPolicy* functions to be functions of the util module, as these are
useful for checking policies in cases other than policy iteration algorithms.
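Item 5 concerns policy evaluation. As a rough sketch of such a utility (a
hypothetical standalone helper for illustration, not the toolbox's actual
evalPolicy* code), the value of a fixed policy solves the linear system
V = R_pi + discount * P_pi * V:

```python
import numpy as np

def eval_policy(P, R, policy, discount):
    """Exact policy evaluation: solve (I - discount * P_pi) V = R_pi.

    P: (A, S, S) transition matrices, R: (S, A) rewards,
    policy: length-S sequence of action indices.
    (Hypothetical helper, for illustration only.)
    """
    S = R.shape[0]
    P_pi = np.array([P[policy[s], s] for s in range(S)])  # (S, S) rows under policy
    R_pi = np.array([R[s, policy[s]] for s in range(S)])  # (S,) rewards under policy
    return np.linalg.solve(np.eye(S) - discount * P_pi, R_pi)

# Example on a hypothetical 2-state, 2-action MDP.
P = np.array([[[0.5, 0.5], [0.8, 0.2]],
              [[0.0, 1.0], [0.1, 0.9]]])
R = np.array([[5.0, 10.0], [-1.0, 2.0]])
V = eval_policy(P, R, policy=(1, 0), discount=0.9)
```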