Environments
To use other environments, please have a look at the API.
Markov Decision Processes (MDPs)
TabularReinforcementLearning.MDP — Type.

    mutable struct MDP
        ns::Int64
        na::Int64
        state::Int64
        trans_probs::Array{AbstractArray, 2}
        reward::Array{Float64, 2}
        initialstates::Array{Int64, 1}
        isterminal::Array{Int64, 1}
    end
A Markov Decision Process with ns states, na actions, current state state, na×ns array trans_probs of transition probabilities, which contains for every (action, state) pair a (potentially sparse) array that sums to 1 (see getprobvecrandom, getprobvecuniform, getprobvecdeterministic for helpers to construct the transition probabilities), na×ns array reward of rewards, array initialstates of initial states, and ns-element 0/1 array isterminal indicating whether a state is terminal.
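For illustration, a tiny two-state, two-action MDP can be assembled field by field. This is a sketch that assumes the default positional constructor in the field order shown above; the keyword constructor in the next entry is usually more convenient:

    using TabularReinforcementLearning

    ns, na = 2, 2
    # trans_probs[a, s]: distribution over successor states for action a in state s
    trans_probs = Array{AbstractArray}(undef, na, ns)
    for a in 1:na, s in 1:ns
        trans_probs[a, s] = getprobvecuniform(ns)   # uniform transitions, for simplicity
    end
    reward = zeros(na, ns)
    reward[:, 2] .= 1.0                             # reward 1 for any action in state 2
    # assumed field order: ns, na, state, trans_probs, reward, initialstates, isterminal
    mdp = MDP(ns, na, 1, trans_probs, reward, [1], [0, 0])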
TabularReinforcementLearning.MDP — Method.

    MDP(ns, na; init = "random")
    MDP(; ns = 10, na = 4, init = "random")

Return an MDP with init in ("random", "uniform", "deterministic"), where the keyword init determines how the transition probabilities are constructed (see also getprobvecrandom, getprobvecuniform, getprobvecdeterministic).
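For example, using either the positional or the keyword form:

    mdp  = MDP(20, 4, init = "uniform")              # 20 states, 4 actions, uniform transitions
    mdp2 = MDP(ns = 10, na = 4, init = "deterministic")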
TabularReinforcementLearning.run! — Method.

    run!(mdp::MDP, policy::Array{Int64, 1}) = run!(mdp, policy[mdp.state])

Transition to a new state by taking the action the policy prescribes in the current state.
TabularReinforcementLearning.run! — Method.

    run!(mdp::MDP, action::Int64)

Transition to a new state given action. Returns the new state.
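A short interaction sketch combining both methods (the random policy here is purely illustrative):

    mdp = MDP(10, 4)
    s = run!(mdp, 2)          # take action 2; returns the new state
    policy = rand(1:4, 10)    # a random deterministic policy, one action per state
    s = run!(mdp, policy)     # equivalent to run!(mdp, policy[mdp.state])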
TabularReinforcementLearning.setterminalstates! — Method.

    setterminalstates!(mdp, range)

Sets mdp.isterminal[range] .= 1, empties the table of transition probabilities for the terminal states, and sets the reward for all actions in a terminal state to the same value.
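For example:

    mdp = MDP(10, 4)
    setterminalstates!(mdp, 9:10)   # mark states 9 and 10 as terminal
    mdp.isterminal                  # 0/1 vector with ones at positions 9 and 10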
TabularReinforcementLearning.treeMDP — Method.

    treeMDP(na, depth; init = "random", branchingfactor = 3)

Returns a tree-structured MDP with na actions and tree depth depth. If init = "random", branchingfactor determines how many possible successor states an (action, state) pair has. If init = "deterministic", the branching factor equals na.
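For example, a deterministic binary tree:

    mdp = treeMDP(2, 3, init = "deterministic")   # na = 2 actions, depth 3
    # with init = "deterministic" each (action, state) pair leads to
    # exactly one successor state, so the branching factor is na = 2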
TabularReinforcementLearning.getprobvecdeterministic — Method.

    getprobvecdeterministic(n, min = 1, max = n)

Returns a SparseVector of length n where one element in min:max has value 1.
TabularReinforcementLearning.getprobvecrandom — Method.

    getprobvecrandom(n, min, max)

Returns an array of length n that sums to 1, where all elements outside of min:max are zero.
TabularReinforcementLearning.getprobvecrandom — Method.

    getprobvecrandom(n)

Returns an array of length n that sums to 1. More precisely, the array is a sample of a Dirichlet distribution with n categories and $\alpha_1 = \cdots = \alpha_n = 1$.
TabularReinforcementLearning.getprobvecuniform — Method.

    getprobvecuniform(n) = fill(1/n, n)
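Taken together, a quick tour of the helpers (the return values in the comments are illustrative):

    getprobvecuniform(4)               # [0.25, 0.25, 0.25, 0.25]
    getprobvecdeterministic(4, 2, 3)   # SparseVector; the single 1 lands at index 2 or 3
    p = getprobvecrandom(4)            # random distribution over 4 categories
    sum(p) ≈ 1                         # true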
Solving MDPs
TabularReinforcementLearning.MDPLearner — Type.

    struct MDPLearner
        gamma::Float64
        policy::Array{Int64, 1}
        values::Array{Float64, 1}
        mdp::MDP
    end

Used to solve mdp with discount factor gamma.
TabularReinforcementLearning.policy_iteration! — Method.

    policy_iteration!(mdplearner::MDPLearner)

Solve the MDP with policy iteration using the MDPLearner.
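A minimal end-to-end sketch. The MDPLearner constructor is not documented above; the positional call below assumes the default constructor in the field order of the struct definition:

    mdp = MDP(10, 4, init = "random")
    # assumed field order: gamma, initial policy, initial values, mdp
    learner = MDPLearner(0.9, ones(Int64, 10), zeros(10), mdp)
    policy_iteration!(learner)
    learner.policy   # greedy policy with respect to learner.values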