Commit a55e65e8 authored by Steven Cordwell

clean up ValueIteration

parent d8733c36
@@ -1146,13 +1146,13 @@ class ValueIteration(MDP):
     discounted MDP. The algorithm consists of solving Bellman's equation
     iteratively.
     Iteration is stopped when an epsilon-optimal policy is found or after a
-    specified number (max_iter) of iterations.
+    specified number (``max_iter``) of iterations.
     This function uses verbose and silent modes. In verbose mode, the function
-    displays the variation of V (the value function) for each iteration and the
-    condition which stopped the iteration: epsilon-policy found or maximum
+    displays the variation of ``V`` (the value function) for each iteration and
+    the condition which stopped the iteration: epsilon-policy found or maximum
     number of iterations reached.
-    Let S = number of states, A = number of actions.
+    Let ``S`` = number of states, ``A`` = number of actions.
     Parameters
     ----------
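For context, here is a minimal usage sketch of the ValueIteration class whose docstring is edited above. It assumes the released pymdptoolbox API (mdptoolbox.example.forest and a run() method); the tree at this commit may differ slightly, e.g. older revisions ran the iteration through a different entry point.

```python
import mdptoolbox.example
from mdptoolbox.mdp import ValueIteration

# Small built-in test problem: the 3-state forest management MDP.
P, R = mdptoolbox.example.forest()

# Iterate until an epsilon-optimal policy is found or max_iter is reached.
vi = ValueIteration(P, R, discount=0.96, epsilon=0.01, max_iter=1000)
vi.run()  # some releases expose this step under a different method name

print(vi.V)       # value function, one entry per state
print(vi.policy)  # epsilon-optimal policy, one action index per state
print(vi.iter)    # number of iterations actually performed
```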
@@ -1294,12 +1294,9 @@ class ValueIteration(MDP):
         if initial_value == 0:
             self.V = zeros(self.S)
         else:
-            assert len(initial_value) == self.S, "PyMDPtoolbox: The " \
-                "initial value must be a vector of length S."
-            try:
-                self.V = initial_value.reshape(self.S)
-            except AttributeError:
-                self.V = array(initial_value).reshape(self.S)
+            assert len(initial_value) == self.S, "The initial value must be " \
+                "a vector of length S."
+            self.V = array(initial_value).reshape(self.S)
         if self.discount < 1:
             # compute a bound for the number of iterations and update the
             # stored value of self.max_iter
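The hunk above drops the try/except around initial_value.reshape() in favour of a single array(initial_value).reshape(self.S). A small standalone sketch (the names S and initial_value here are illustrative, not taken from the diff) of why one code path covers both list and ndarray inputs:

```python
import numpy as np

S = 4  # number of states (illustrative)

# numpy.array() accepts both a plain Python list and an existing ndarray,
# so one array(...).reshape(S) call handles every supported input type and
# the AttributeError fallback in the old code becomes unnecessary.
for initial_value in ([0.0, 1.0, 2.0, 3.0], np.arange(4.0)):
    V = np.array(initial_value).reshape(S)
    assert V.shape == (S,)
```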
@@ -1346,7 +1343,6 @@ class ValueIteration(MDP):
                     PP[aa] = self.P[aa][:, ss]
                 except ValueError:
                     PP[aa] = self.P[aa][:, ss].todense().A1
-            # the method "min()" without any arguments finds the
             # minimum of the entire array.
             h[ss] = PP.min()
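The hunk above is part of the routine that bounds the number of iterations: for each destination state it takes the smallest probability, over every (source state, action) pair, of landing in that state. Below is a self-contained sketch of that step; the toy A, S, P and the trailing k = 1 - h.sum() are assumptions based on the usual span-bound construction, not taken verbatim from this diff.

```python
import numpy as np

# Toy transition model: A actions, each an S x S row-stochastic matrix P[a][s, s'].
A, S = 2, 3
rng = np.random.default_rng(0)
P = [rng.dirichlet(np.ones(S), size=S) for _ in range(A)]

h = np.zeros(S)
for ss in range(S):
    # Column ss of every action matrix: probability of reaching state ss
    # from each source state under each action, collected into an (A, S) array.
    PP = np.stack([P[aa][:, ss] for aa in range(A)])
    # min() with no arguments is the minimum of the entire (A, S) array.
    h[ss] = PP.min()

# Coefficient used by the span-based iteration bound (cf. Puterman, ch. 6).
k = 1 - h.sum()
print(h, k)
```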