Zahra Rajabi / pymdptoolbox · Commits

Commit df7f9e55
authored Mar 11, 2014 by Steven Cordwell

fix some typos in docstrings

parent 118a2fa8
Changes: 1 file

src/mdptoolbox/mdp.py
...
@@ -70,46 +70,50 @@ class MDP(object):
"""A Markov Decision Problem.
"""A Markov Decision Problem.
Let
S
= the number of states, and
A
= the number of acions.
Let
``S``
= the number of states, and
``A``
= the number of acions.
Parameters
Parameters
----------
----------
transitions : array
transitions : array
Transition probability matrices. These can be defined in a variety of
Transition probability matrices. These can be defined in a variety of
ways. The simplest is a numpy array that has the shape (A, S, S),
ways. The simplest is a numpy array that has the shape
``
(A, S, S)
``
,
though there are other possibilities. It can be a tuple or list or
though there are other possibilities. It can be a tuple or list or
numpy object array of length A, where each element contains a numpy
numpy object array of length ``A``, where each element contains a numpy
array or matrix that has the shape (S, S). This "list of matrices" form
array or matrix that has the shape ``(S, S)``. This "list of matrices"
is useful when the transition matrices are sparse as
form is useful when the transition matrices are sparse as
scipy.sparse.csr_matrix matrices can be used. In summary, each action's
``scipy.sparse.csr_matrix`` matrices can be used. In summary, each
transition matrix must be indexable like ``P[a]`` where
action's transition matrix must be indexable like ``transitions[a]``
``a`` ∈ {0, 1...A-1}.
where ``a`` ∈ {0, 1...A-1}, and ``transitions[a]`` returns an ``S`` ×
``S`` array-like object.
reward : array
reward : array
Reward matrices or vectors. Like the transition matrices, these can
Reward matrices or vectors. Like the transition matrices, these can
also be defined in a variety of ways. Again the simplest is a numpy
also be defined in a variety of ways. Again the simplest is a numpy
array that has the shape (S, A), (S,) or (A, S, S). A list of lists can
array that has the shape ``(S, A)``, ``(S,)`` or ``(A, S, S)``. A list
be used, where each inner list has length S. A list of numpy arrays is
of lists can be used, where each inner list has length ``S`` and the
possible where each inner array can be of the shape (S,), (S, 1),
outer list has length ``A``. A list of numpy arrays is possible where
(1, S) or (S, S). Also scipy.sparse.csr_matrix can be used instead of
each inner array can be of the shape ``(S,)``, ``(S, 1)``, ``(1, S)``
numpy arrays. In addition, the outer list can be replaced with a tuple
or ``(S, S)``. Also ``scipy.sparse.csr_matrix`` can be used instead of
or numpy object array can be used.
numpy arrays. In addition, the outer list can be replaced by any object
that can be indexed like ``reward[a]`` such as a tuple or numpy object
array of length ``A``.
discount : float
discount : float
Discount factor. The per time-step discount factor on future rewards.
Discount factor. The per time-step discount factor on future rewards.
Valid values are greater than 0 upto and including 1. If the discount
Valid values are greater than 0 upto and including 1. If the discount
factor is 1, then convergence is cannot be assumed and a warning will
factor is 1, then convergence is cannot be assumed and a warning will
be displayed. Subclasses of ``MDP`` may pass None in the case where
the
be displayed. Subclasses of ``MDP`` may pass
``
None
``
in the case where
algorithm does not use a discount factor.
the
algorithm does not use a discount factor.
epsilon : float
epsilon : float
Stopping criterion. The maximum change in the value function at each
Stopping criterion. The maximum change in the value function at each
iteration is compared against ``epsilon``. Once the change falls below
iteration is compared against ``epsilon``. Once the change falls below
this value, then the value function is considered to have converged to
this value, then the value function is considered to have converged to
the optimal value function. Subclasses of ``MDP`` may pass None in the
the optimal value function. Subclasses of ``MDP`` may pass ``None`` in
case where the algorithm does not use a stopping criterion.
the case where the algorithm does not use an epsilon-optimal stopping
criterion.
max_iter : int
max_iter : int
Maximum number of iterations. The algorithm will be terminated once
Maximum number of iterations. The algorithm will be terminated once
this many iterations have elapsed. This must be greater than 0 if
this many iterations have elapsed. This must be greater than 0 if
specified. Subclasses of ``MDP`` may pass None in the case where
the
specified. Subclasses of ``MDP`` may pass
``
None
``
in the case where
algorithm does not use a maximum number of iterations.
the
algorithm does not use a maximum number of iterations.
Attributes
Attributes
----------
----------
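As a rough sketch of the shapes this docstring describes (not part of the commit; the variable names are purely illustrative), both the dense ``(A, S, S)`` form and the sparse "list of matrices" form of ``transitions`` can be built with numpy and scipy, together with an ``(S, A)`` reward array:

```python
import numpy as np
import scipy.sparse as sp

A, S = 2, 3  # two actions, three states

# Dense form: transitions has shape (A, S, S); each transitions[a] is an
# S x S stochastic matrix whose rows sum to 1.
transitions = np.empty((A, S, S))
transitions[0] = np.array([[0.5, 0.5, 0.0],
                           [0.0, 0.5, 0.5],
                           [0.0, 0.0, 1.0]])
transitions[1] = np.eye(S)

# Sparse "list of matrices" form: a length-A list of S x S csr matrices,
# still indexable as transitions_sparse[a].
transitions_sparse = [sp.csr_matrix(transitions[a]) for a in range(A)]

# Reward in the (S, A) form: reward[s, a] is the reward for taking
# action a in state s.
reward = np.array([[ 1.0, 0.0],
                   [ 0.5, 0.0],
                   [-1.0, 0.0]])
```

Either ``transitions`` or ``transitions_sparse`` satisfies the requirement above: ``transitions[a]`` returns an ``S`` × ``S`` array-like object for ``a`` ∈ {0, 1...A-1}.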
...
@@ -130,12 +134,12 @@ class MDP(object):
time : float
    The time used to converge to the optimal policy.
verbose : boolean
    Whether verbose output should be displayed or not.

Methods
-------
run
    Implemented in child classes as the main algorithm loop. Raises an
    exception if it has not been overridden.
setSilent
    Turn the verbosity off
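As a hedged usage sketch (not part of the commit), ``run`` and the verbosity toggles are normally exercised through a concrete subclass such as ``ValueIteration``; the forest example generator is assumed to be available as in the toolbox documentation:

```python
import mdptoolbox.example
import mdptoolbox.mdp

# Small built-in forest-management problem (assumed helper from the toolbox).
P, R = mdptoolbox.example.forest()

vi = mdptoolbox.mdp.ValueIteration(P, R, 0.96)
vi.setVerbose()   # print the change in the value function at each iteration
vi.run()          # the main algorithm loop implemented by the subclass
vi.setSilent()    # turn verbose output back off

print(vi.policy)  # optimal action for each state
print(vi.time)    # CPU time used by run()
```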
...
@@ -314,11 +318,11 @@ class FiniteHorizon(MDP):
---------------
V : array
    Optimal value function. Shape = (S, N+1). ``V[:, n]`` = optimal value
    function at stage ``n`` with stage in {0, 1...N-1}. ``V[:, N]`` value
    function for terminal stage.
policy : array
    Optimal policy. ``policy[:, n]`` = optimal policy at stage ``n`` with
    stage in {0, 1...N}. ``policy[:, N]`` = policy for stage ``N``.
time : float
    used CPU time
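For the ``FiniteHorizon`` shapes above, a minimal sketch (again not part of the commit, and again assuming the forest example helper) of indexing ``V`` and ``policy`` by stage:

```python
import mdptoolbox.example
import mdptoolbox.mdp

P, R = mdptoolbox.example.forest(S=3)
N = 4  # number of decision stages

fh = mdptoolbox.mdp.FiniteHorizon(P, R, 0.9, N)
fh.run()

# V has shape (S, N+1): columns 0..N-1 hold the stage value functions and
# column N holds the terminal-stage value function.
print(fh.V.shape)       # (3, 5)
print(fh.V[:, 0])       # optimal values at the first stage
print(fh.policy[:, 0])  # optimal actions at the first stage
```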
...