Commit ad93b411 authored by Steven Cordwell

[example] Add small example

Add a small MDP example that has only two states and two actions. This
is mostly useful to help with testing the algorithms, and as an example
of how to use them.
parent 033b7755
@@ -340,3 +340,29 @@ def rand(S, A, is_sparse=False, mask=None):
_np.ones(S, dtype=int)))
# we want to return the generated transition and reward matrices
return(P, R)

def small():
    """A very small Markov decision process.

    The probability transition matrices are::

            | | 0.5 0.5 | |
            | | 0.8 0.2 | |
        P = |             |
            | | 0.0 1.0 | |
            | | 0.1 0.9 | |

    The reward matrix is::

        R = |  5 10 |
            | -1  2 |
    Returns
    =======
    out : tuple
        ``out[0]`` is a numpy array of the probability transition matrices.
        ``out[1]`` is a numpy array of the reward matrix.

    """
    P = _np.array([[[0.5, 0.5], [0.8, 0.2]], [[0, 1], [0.1, 0.9]]])
    R = _np.array([[5, 10], [-1, 2]])
    return(P, R)