Zahra Rajabi / pymdptoolbox
Commit a55e65e8, authored Mar 10, 2014 by Steven Cordwell
clean up ValueIteration
parent d8733c36
Changes: 1
src/mdptoolbox/mdp.py @ a55e65e8
@@ -1146,13 +1146,13 @@ class ValueIteration(MDP):
     discounted MDP. The algorithm consists of solving Bellman's equation
     iteratively.
     Iteration is stopped when an epsilon-optimal policy is found or after a
-    specified number (max_iter) of iterations.
+    specified number (``max_iter``) of iterations.
     This function uses verbose and silent modes. In verbose mode, the function
-    displays the variation of V (the value function) for each iteration and the
-    condition which stopped the iteration: epsilon-policy found or maximum
+    displays the variation of ``V`` (the value function) for each iteration and
+    the condition which stopped the iteration: epsilon-policy found or maximum
     number of iterations reached.

-    Let S = number of states, A = number of actions.
+    Let ``S`` = number of states, ``A`` = number of actions.

     Parameters
     ----------
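The docstring in this hunk describes the algorithm only in words, so a small sketch may help when reviewing it. The following is a minimal illustration of value iteration, not the toolbox's implementation: the function name `value_iteration`, the array layouts for `P` and `R`, and the span-based stopping threshold are all assumptions made for this example.

```python
import numpy as np


def value_iteration(P, R, discount, epsilon=0.01, max_iter=1000):
    """Minimal value iteration sketch (hypothetical helper, not the toolbox API).

    P: array of shape (A, S, S); P[a, s, s2] is a transition probability.
    R: array of shape (S, A); R[s, a] is the expected immediate reward.
    Assumes 0 < discount < 1 and max_iter >= 1.
    Returns (V, policy, iterations).
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    # Assumed stopping threshold on the span of the variation of V.
    threshold = epsilon * (1 - discount) / discount
    for iteration in range(1, max_iter + 1):
        # Bellman backup: Q[a, s] = R[s, a] + discount * sum_s2 P[a, s, s2] * V[s2]
        Q = R.T + discount * (P @ V)
        V_new = Q.max(axis=0)
        variation = V_new - V
        V = V_new
        # Stop once the span (max - min) of the variation is small enough.
        if variation.max() - variation.min() < threshold:
            break
    policy = Q.argmax(axis=0)
    return V, policy, iteration


# Small two-state, two-action problem used only for illustration.
P = np.array([[[0.5, 0.5], [0.8, 0.2]],
              [[0.0, 1.0], [0.1, 0.9]]])
R = np.array([[5.0, 10.0],
              [-1.0, 2.0]])
V, policy, iterations = value_iteration(P, R, discount=0.9)
print(V, policy, iterations)
```

Here `S` and `A` are recovered from the shape of `P`, which mirrors the "number of states" and "number of actions" definitions in the docstring. The loop ends either on the span test or when `max_iter` is reached, the two stopping conditions the docstring names.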
@@ -1294,12 +1294,9 @@ class ValueIteration(MDP):
         if initial_value == 0:
             self.V = zeros(self.S)
         else:
-            assert len(initial_value) == self.S, "PyMDPtoolbox: The " \
-                "initial value must be a vector of length S."
-            try:
-                self.V = initial_value.reshape(self.S)
-            except AttributeError:
-                self.V = array(initial_value).reshape(self.S)
+            assert len(initial_value) == self.S, "The initial value must be " \
+                "a vector of length S."
+            self.V = array(initial_value).reshape(self.S)
         if self.discount < 1:
             # compute a bound for the number of iterations and update the
             # stored value of self.max_iter
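This hunk replaces the `try`/`except AttributeError` fallback with a single `array(initial_value).reshape(self.S)` call. The sketch below suggests why one call can cover lists, tuples and arrays alike, and points out one behavioural difference worth noting in review: `numpy.array` copies its input by default, whereas the removed `initial_value.reshape(...)` path could return a view of the caller's array. The helper `as_value_vector` and the value of `S` are hypothetical, introduced only for this illustration.

```python
import numpy as np

S = 3  # hypothetical number of states


def as_value_vector(initial_value, S):
    """Hypothetical helper mirroring the simplified branch of the patch."""
    assert len(initial_value) == S, \
        "The initial value must be a vector of length S."
    # np.array accepts lists, tuples and ndarrays, so no AttributeError fallback
    # is needed before calling reshape.
    return np.array(initial_value).reshape(S)


from_list = as_value_vector([1.0, 2.0, 3.0], S)
from_tuple = as_value_vector((1.0, 2.0, 3.0), S)

original = np.array([1.0, 2.0, 3.0])
from_array = as_value_vector(original, S)

# np.array() copies by default, so writing to the result leaves the
# caller's array untouched.
from_array[0] = 99.0
print(original[0])  # 1.0

# Calling reshape directly on the ndarray returns a view that shares memory
# with the original.
view = original.reshape(S)
print(np.shares_memory(view, original))  # True
```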
@@ -1346,7 +1343,6 @@ class ValueIteration(MDP):
                     PP[aa] = self.P[aa][:, ss]
                 except ValueError:
                     PP[aa] = self.P[aa][:, ss].todense().A1
             # the method "min()" without any arguments finds the
             # minimum of the entire array.
             h[ss] = PP.min()
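The context lines in this hunk rely on two NumPy behaviours: `min()` called without arguments reduces over every entry, and `.A1` flattens a dense matrix to a 1-D array so it can be stored in one row of `PP`. The sketch below illustrates both with made-up numbers. The shapes are assumptions for this example, and `np.asmatrix` stands in for the `numpy.matrix` that `todense()` returns for SciPy sparse matrices.

```python
import numpy as np

# Hypothetical (A, S) array: one row per action, one column per state.
PP = np.array([[0.5, 0.2, 0.3],
               [0.1, 0.6, 0.3]])

overall = PP.min()           # scalar minimum over every entry
per_column = PP.min(axis=0)  # one minimum per column
per_row = PP.min(axis=1)     # one minimum per row
print(overall, per_column, per_row)

# A dense (S, 1) column matrix, standing in for a densified sparse column.
column = np.asmatrix([[0.5], [0.2], [0.3]])
flat = column.A1             # 1-D ndarray of shape (S,)

# The flattened column fits into a single row of an (A, S) array.
target = np.zeros((2, 3))
target[0] = flat
print(target)
```

Passing `axis` would produce one minimum per column or row, so the argument-free call is what yields the single scalar assigned to `h[ss]`.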