Environments
To use other environments, please have a look at the API.
Markov Decision Processes (MDPs)
TabularReinforcementLearning.MDP — Type.

mutable struct MDP
ns::Int64
na::Int64
state::Int64
trans_probs::Array{AbstractArray, 2}
reward::Array{Float64, 2}
initialstates::Array{Int64, 1}
isterminal::Array{Int64, 1}
end

A Markov Decision Process with ns states, na actions, the current state, an na x ns array of transition probabilities trans_probs, which contains for every (action, state) pair a (potentially sparse) array that sums to 1 (see getprobvecrandom, getprobvecuniform, getprobvecdeterministic for helpers to construct the transition probabilities), an na x ns array of rewards reward, an array of initial states initialstates, and an ns-element array isterminal with 0/1 entries indicating whether a state is terminal.
TabularReinforcementLearning.MDP — Method.

MDP(ns, na; init = "random")
MDP(; ns = 10, na = 4, init = "random")

Return an MDP with init in ("random", "uniform", "deterministic"), where the keyword init determines how to construct the transition probabilities (see also getprobvecrandom, getprobvecuniform, getprobvecdeterministic).
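For illustration, a minimal sketch that builds a small MDP with the keyword constructor above and inspects a few fields (the concrete values depend on the random initialization; indexing trans_probs as [action, state] follows the na x ns layout described above):

using TabularReinforcementLearning

mdp = MDP(ns = 10, na = 4, init = "random")   # 10 states, 4 actions, random transition probabilities
mdp.state                                     # current state, an integer in 1:mdp.ns
sum(mdp.trans_probs[1, mdp.state])            # the probability vector of (action 1, current state) sums to 1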
TabularReinforcementLearning.run! — Method.

run!(mdp::MDP, policy::Array{Int64, 1}) = run!(mdp, policy[mdp.state])

TabularReinforcementLearning.run! — Method.

run!(mdp::MDP, action::Int64)

Transition to a new state given action. Returns the new state.
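A brief usage sketch of both run! methods (here a policy is simply a vector that assigns an action to every state):

using TabularReinforcementLearning

mdp = MDP(ns = 5, na = 2)
s = run!(mdp, 1)                  # take action 1 in the current state and return the new state
policy = rand(1:mdp.na, mdp.ns)   # one action per state
s = run!(mdp, policy)             # take the action the policy assigns to the current state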
setterminalstates!(mdp, range)

Sets mdp.isterminal[range] .= 1, empties the table of transition probabilities for terminal states, and sets the reward for all actions in a terminal state to the same value.
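For example (a sketch; which states are sensible to mark as terminal depends on the MDP at hand):

using TabularReinforcementLearning

mdp = MDP(ns = 10, na = 4)
setterminalstates!(mdp, 9:10)     # mark states 9 and 10 as terminal
mdp.isterminal                    # entries 9 and 10 are now 1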
TabularReinforcementLearning.treeMDP — Method.

treeMDP(na, depth; init = "random", branchingfactor = 3)

Returns a tree-structured MDP with na actions and tree depth depth. If init = "random", the branchingfactor determines how many possible successor states an (action, state) pair has. If init = "deterministic", the branching factor equals na.
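For example (a sketch; the exact number of states follows from the branching structure):

using TabularReinforcementLearning

tree = treeMDP(2, 3, init = "deterministic")   # 2 actions, depth 3; deterministic transitions, branching factor = na
tree2 = treeMDP(4, 2, branchingfactor = 2)     # 4 actions, depth 2; 2 possible successor states per (action, state) pair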
getprobvecdeterministic(n, min = 1, max = n)

Returns a SparseVector of length n where one element in min:max has value 1.
getprobvecrandom(n, min, max)

Returns an array of length n that sums to 1, where all elements outside of min:max are zero.
getprobvecrandom(n)

Returns an array of length n that sums to 1. More precisely, the array is a sample of a Dirichlet distribution with n categories and $\alpha_1 = \cdots = \alpha_n = 1$.
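These helpers are also convenient when assembling trans_probs by hand; a small sketch (the random values and the position of the nonzero entry vary between calls):

using TabularReinforcementLearning

p = getprobvecrandom(5)           # dense probability vector of length 5 that sums to 1
q = getprobvecdeterministic(5)    # SparseVector with a single 1 at some position in 1:5
findall(!iszero, q)               # index of that nonzero entry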
getprobvecuniform(n) = fill(1/n, n)

Solving MDPs
struct MDPLearner
gamma::Float64
policy::Array{Int64, 1}
values::Array{Float64, 1}
mdp::MDP
end

Used to solve an MDP with discount factor gamma.
policy_iteration!(mdplearner::MDPLearner)

Solve the MDP with policy iteration using an MDPLearner.
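A sketch of solving a small MDP, assuming an MDPLearner can be constructed from an mdp and a discount factor gamma (the exact constructor signature is not shown above; check the API):

using TabularReinforcementLearning

mdp = MDP(ns = 10, na = 4)
learner = MDPLearner(mdp, 0.9)    # assumed constructor: the mdp and the discount factor gamma
policy_iteration!(learner)        # run policy iteration in place
learner.policy                    # greedy action for every state
learner.values                    # corresponding state values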