Environments

To use other environments, please have a look at the API.

Markov Decision Processes (MDPs)

mutable struct MDP
    ns::Int64
    na::Int64
    state::Int64
    trans_probs::Array{AbstractArray, 2}
    reward::Array{Float64, 2}
    initialstates::Array{Int64, 1}
    isterminal::Array{Int64, 1}
end

A Markov Decision Process with ns states, na actions, the current state, an na x ns array of transition probabilities trans_probs that holds, for every (action, state) pair, a (potentially sparse) probability vector summing to 1 (see getprobvecrandom, getprobvecuniform, getprobvecdeterministic for helpers to construct the transition probabilities), an na x ns array of rewards reward, an array of initial states initialstates, and an ns-element 0/1 array isterminal indicating whether a state is terminal.
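
A minimal sketch of constructing a small MDP by hand with the default field-order constructor (the exact constructor call is an assumption; the keyword constructors below are usually more convenient):

ns, na = 2, 2
trans_probs = Array{AbstractArray, 2}(undef, na, ns)
for s in 1:ns, a in 1:na
    trans_probs[a, s] = getprobvecrandom(ns)   # random distribution over successor states
end
reward = zeros(na, ns); reward[:, 2] .= 1.     # reward 1 for any action taken in state 2
mdp = MDP(ns, na, 1, trans_probs, reward, [1], zeros(Int64, ns))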

MDP(ns, na; init = "random")
MDP(; ns = 10, na = 4, init = "random")

Return an MDP with init in ("random", "uniform", "deterministic"), where the keyword init determines how the transition probabilities are constructed (see also getprobvecrandom, getprobvecuniform, getprobvecdeterministic).
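
For example (assuming the package is loaded), a 10-state, 4-action MDP with uniform transition probabilities:

mdp = MDP(ns = 10, na = 4, init = "uniform")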

run!(mdp::MDP, policy::Array{Int64, 1}) = run!(mdp, policy[mdp.state])

run!(mdp::MDP, action::Int64)

Transition to a new state given action. Returns the new state.
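
A sketch of interacting with an MDP, assuming run! returns the new state as described above; actions are picked uniformly at random for illustration.

mdp = MDP(ns = 10, na = 4)
for _ in 1:5
    a = rand(1:mdp.na)    # random action
    s = run!(mdp, a)      # transition to and return the new state
end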

setterminalstates!(mdp, range)

Sets mdp.isterminal[range] .= 1, empties the transition probabilities of the terminal states, and sets the reward for all actions in a terminal state to the same value.
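
For example (a sketch), marking the last two states of a 10-state MDP as terminal:

mdp = MDP(ns = 10, na = 4)
setterminalstates!(mdp, 9:10)   # mdp.isterminal is now [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]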

treeMDP(na, depth; init = "random", branchingfactor = 3)

Returns a tree-structured MDP with na actions and the given depth. If init = "random", branchingfactor determines how many possible successor states an (action, state) pair has. If init = "deterministic", the branching factor equals na.
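
For example (a sketch), a random tree MDP with 2 actions, depth 3, and at most 3 successor states per (action, state) pair:

mdp = treeMDP(2, 3, init = "random", branchingfactor = 3)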

getprobvecdeterministic(n, min = 1, max = n)

Returns a SparseVector of length n where one element in min:max has value 1.
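
For example (a sketch), a deterministic probability vector of length 6 with all mass on a single index in 2:4:

p = getprobvecdeterministic(6, 2, 4)   # e.g. a sparse [0, 0, 1, 0, 0, 0]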

getprobvecrandom(n, min, max)

Returns an array of length n that sums to 1 where all elements outside of min:max are zero.
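
For example (a sketch), a random probability vector of length 6 supported only on indices 2:4:

p = getprobvecrandom(6, 2, 4)   # p[1] == p[5] == p[6] == 0 and sum(p) ≈ 1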

getprobvecrandom(n)

Returns an array of length n that sums to 1. More precisely, the array is a sample from a Dirichlet distribution with n categories and $\alpha_1 = \cdots = \alpha_n = 1$.
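
An equivalent construction (a sketch, not the package's implementation): a Dirichlet sample with all concentration parameters equal to 1 can be obtained by normalizing independent Exponential(1) draws, which yields a uniform draw from the probability simplex.

x = -log.(rand(5))   # 5 independent Exponential(1) samples
p = x ./ sum(x)      # sums to 1 and is distributed as Dirichlet(1, 1, 1, 1, 1)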

getprobvecuniform(n) = fill(1/n, n)

Solving MDPs

struct MDPLearner
    gamma::Float64
    policy::Array{Int64, 1}
    values::Array{Float64, 1}
    mdp::MDP
end

Used to solve mdp with discount factor gamma.

policy_iteration!(mdplearner::MDPLearner)

Solve MDP with policy iteration using MDPLearner.
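
A usage sketch; the MDPLearner constructor call below (mdp plus discount factor) is a hypothetical convenience form inferred from the fields listed above, not a documented signature.

mdp = MDP(ns = 10, na = 4)
learner = MDPLearner(mdp, .9)   # hypothetical constructor: mdp and gamma
policy_iteration!(learner)
learner.policy                  # greedy action for each state
learner.values                  # corresponding state values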
