Environments
To use other environments, please have a look at the API.
Markov Decision Processes (MDPs)
TabularReinforcementLearning.MDP — Type.

    mutable struct MDP
        ns::Int64
        na::Int64
        state::Int64
        trans_probs::Array{AbstractArray, 2}
        reward::Array{Float64, 2}
        initialstates::Array{Int64, 1}
        isterminal::Array{Int64, 1}
    end
A Markov Decision Process with ns states, na actions, current state state, na×ns array trans_probs of transition probabilities, which contains for every (action, state) pair a (potentially sparse) array that sums to 1 (see getprobvecrandom, getprobvecuniform, getprobvecdeterministic for helpers to construct the transition probabilities), na×ns array reward of rewards, array initialstates of initial states, and ns-element 0/1 array isterminal indicating whether a state is terminal.
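For illustration, a tiny two-state, two-action MDP can be assembled field by field. This is a sketch that assumes the default positional constructor in the field order shown above; the keyword constructor in the next entry is usually more convenient:

    using TabularReinforcementLearning

    ns, na = 2, 2
    # trans_probs[a, s]: distribution over successor states for action a in state s
    trans_probs = Array{AbstractArray}(undef, na, ns)
    for a in 1:na, s in 1:ns
        trans_probs[a, s] = getprobvecuniform(ns)   # uniform transitions, for simplicity
    end
    reward = zeros(na, ns)
    reward[:, 2] .= 1.0                             # reward 1 for any action in state 2
    # assumed field order: ns, na, state, trans_probs, reward, initialstates, isterminal
    mdp = MDP(ns, na, 1, trans_probs, reward, [1], [0, 0])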
TabularReinforcementLearning.MDP — Method.

    MDP(ns, na; init = "random")
    MDP(; ns = 10, na = 4, init = "random")

Return an MDP with init in ("random", "uniform", "deterministic"), where the keyword init determines how the transition probabilities are constructed (see also getprobvecrandom, getprobvecuniform, getprobvecdeterministic).
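For example, using either the positional or the keyword form:

    mdp  = MDP(20, 4, init = "uniform")              # 20 states, 4 actions, uniform transitions
    mdp2 = MDP(ns = 10, na = 4, init = "deterministic")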
TabularReinforcementLearning.run! — Method.

    run!(mdp::MDP, policy::Array{Int64, 1}) = run!(mdp, policy[mdp.state])

Transition to a new state by taking the action the policy prescribes in the current state.
TabularReinforcementLearning.run! — Method.

    run!(mdp::MDP, action::Int64)

Transition to a new state given action. Returns the new state.
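A short interaction sketch combining both methods (the random policy here is purely illustrative):

    mdp = MDP(10, 4)
    s = run!(mdp, 2)          # take action 2; returns the new state
    policy = rand(1:4, 10)    # a random deterministic policy, one action per state
    s = run!(mdp, policy)     # equivalent to run!(mdp, policy[mdp.state])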
TabularReinforcementLearning.setterminalstates! — Method.

    setterminalstates!(mdp, range)

Sets mdp.isterminal[range] .= 1, empties the table of transition probabilities for the terminal states, and sets the reward for all actions in a terminal state to the same value.
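For example:

    mdp = MDP(10, 4)
    setterminalstates!(mdp, 9:10)   # mark states 9 and 10 as terminal
    mdp.isterminal                  # 0/1 vector with ones at positions 9 and 10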
TabularReinforcementLearning.treeMDP — Method.

    treeMDP(na, depth; init = "random", branchingfactor = 3)

Returns a tree-structured MDP with na actions and tree depth depth. If init = "random", branchingfactor determines how many possible successor states an (action, state) pair has. If init = "deterministic", the branching factor equals na.
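For example, a deterministic binary tree:

    mdp = treeMDP(2, 3, init = "deterministic")   # na = 2 actions, depth 3
    # with init = "deterministic" each (action, state) pair leads to
    # exactly one successor state, so the branching factor is na = 2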
TabularReinforcementLearning.getprobvecdeterministic — Method.

    getprobvecdeterministic(n, min = 1, max = n)

Returns a SparseVector of length n where one element in min:max has value 1.
TabularReinforcementLearning.getprobvecrandom — Method.

    getprobvecrandom(n, min, max)

Returns an array of length n that sums to 1, where all elements outside of min:max are zero.
TabularReinforcementLearning.getprobvecrandom — Method.

    getprobvecrandom(n)

Returns an array of length n that sums to 1. More precisely, the array is a sample of a Dirichlet distribution with n categories and $\alpha_1 = \cdots = \alpha_n = 1$.
TabularReinforcementLearning.getprobvecuniform — Method.

    getprobvecuniform(n) = fill(1/n, n)
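Taken together, a quick tour of the helpers (the return values in the comments are illustrative):

    getprobvecuniform(4)               # [0.25, 0.25, 0.25, 0.25]
    getprobvecdeterministic(4, 2, 3)   # SparseVector; the single 1 lands at index 2 or 3
    p = getprobvecrandom(4)            # random distribution over 4 categories
    sum(p) ≈ 1                         # true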
Solving MDPs
TabularReinforcementLearning.MDPLearner — Type.

    struct MDPLearner
        gamma::Float64
        policy::Array{Int64, 1}
        values::Array{Float64, 1}
        mdp::MDP
    end

Used to solve mdp with discount factor gamma.
TabularReinforcementLearning.policy_iteration! — Method.

    policy_iteration!(mdplearner::MDPLearner)

Solve the MDP with policy iteration using the MDPLearner.
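A minimal end-to-end sketch. The MDPLearner constructor is not documented above; the positional call below assumes the default constructor in the field order of the struct definition:

    mdp = MDP(10, 4, init = "random")
    # assumed field order: gamma, initial policy, initial values, mdp
    learner = MDPLearner(0.9, ones(Int64, 10), zeros(10), mdp)
    policy_iteration!(learner)
    learner.policy   # greedy policy with respect to learner.values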