API

New learners, policies, callbacks, environments, evaluation metrics or stopping criteria need to implement the following functions.

Learners

Learners that require only a (state, action, reward) triple, and possibly the next state and action, should implement the first definition. If the learner is also to be used with an NstepLearner or an EpisodicLearner, the second definition needs to be implemented as well.

update!(learner::TabularReinforcementLearning.AbstractReinforcementLearner, 
        r, s0, a0, s1, a1, iss0terminal)

Update learner after observing state s0, performing action a0, receiving reward r, observing next state s1 and performing next action a1. The boolean iss0terminal is true if s0 is a terminal state.
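
For example, a SARSA-like learner with a table of Q-values could implement the first definition as sketched below. The MySarsa struct, its fields, and the handling of terminal states are illustrative assumptions, not part of the package.

import TabularReinforcementLearning: update!

# Hypothetical SARSA-like learner with tabular action-value estimates.
struct MySarsa <: TabularReinforcementLearning.AbstractReinforcementLearner
    alpha::Float64          # learning rate
    gamma::Float64          # discount factor
    Q::Array{Float64, 2}    # Q[action, state]
end

function update!(learner::MySarsa, r, s0, a0, s1, a1, iss0terminal)
    # Assumed convention: do not bootstrap from s1 when s0 is terminal.
    target = iss0terminal ? r : r + learner.gamma * learner.Q[a1, s1]
    learner.Q[a0, s0] += learner.alpha * (target - learner.Q[a0, s0])
end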

update!(learner::Union{NstepLearner, EpisodicLearner}, 
        baselearner::TabularReinforcementLearning.AbstractReinforcementLearner, 
        rewards, states, actions, isterminal)

Update baselearner with arrays of at most n+1 states, n+1 actions and n rewards if learner is an NstepLearner. If learner is an EpisodicLearner, the arrays grow until the end of an episode. The boolean isterminal is true if states[end-1] is a terminal state.
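
Continuing the MySarsa sketch from above, a Monte Carlo-style implementation of the second definition for an EpisodicLearner could look as follows, assuming rewards[t] is the reward received after taking actions[t] in states[t]; this indexing and the update itself are illustrative assumptions.

import TabularReinforcementLearning: update!, EpisodicLearner

function update!(learner::EpisodicLearner, baselearner::MySarsa,
                 rewards, states, actions, isterminal)
    isterminal || return    # simplifying assumption: only update once the episode has ended
    G = 0.0                 # Monte Carlo return, accumulated backwards
    for t in length(rewards):-1:1
        G = rewards[t] + baselearner.gamma * G
        baselearner.Q[actions[t], states[t]] +=
            baselearner.alpha * (G - baselearner.Q[actions[t], states[t]])
    end
end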

act(learner::TabularReinforcementLearning.AbstractReinforcementLearner,
    policy::TabularReinforcementLearning.AbstractPolicy,
    state)

Returns an action for learner in state, chosen according to policy.
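
Continuing the MySarsa sketch, a typical implementation looks up the values of all actions in state and delegates the choice to the policy:

import TabularReinforcementLearning: act

# Let the policy choose among the learner's action values in `state`.
act(learner::MySarsa, policy::TabularReinforcementLearning.AbstractPolicy, state) =
    act(policy, learner.Q[:, state])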

Policies

act(policy::TabularReinforcementLearning.AbstractPolicy, values)

Returns an action given an array of values (one value for each possible action) using policy.

getactionprobabilities(policy::TabularReinforcementLearning.AbstractPolicy, values)

Returns an array of action probabilities for a given array of values (one value for each possible action) and policy.
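
A new policy therefore implements both functions above. Below is a sketch of a softmax policy; the MySoftmaxPolicy struct and its inverse temperature beta are illustrative assumptions, not part of the package.

import TabularReinforcementLearning: act, getactionprobabilities

struct MySoftmaxPolicy <: TabularReinforcementLearning.AbstractPolicy
    beta::Float64    # inverse temperature
end

function getactionprobabilities(policy::MySoftmaxPolicy, values)
    expvalues = exp.(policy.beta .* (values .- maximum(values)))    # shift for numerical stability
    expvalues ./ sum(expvalues)
end

function act(policy::MySoftmaxPolicy, values)
    probabilities = getactionprobabilities(policy, values)
    # Sample an action index from the categorical distribution.
    r = rand()
    cumulative = 0.0
    for (action, p) in enumerate(probabilities)
        cumulative += p
        cumulative >= r && return action
    end
    return length(probabilities)
end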

Callbacks

callback!(callback::AbstractCallback, learner, policy, r, a, s, isterminal)

Can be used to manipulate the learner or the policy during learning, e.g. to change the learning rate or the exploration rate.
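
For example, a hypothetical callback that decays the exploration rate after every step could look as follows; that the policy is mutable and has a field epsilon is an assumption about the policy being manipulated.

import TabularReinforcementLearning: callback!, AbstractCallback

# Multiplies the exploration rate of the policy by `decay` at every step.
struct DecayEpsilon <: AbstractCallback
    decay::Float64
end

function callback!(callback::DecayEpsilon, learner, policy, r, a, s, isterminal)
    policy.epsilon *= callback.decay    # assumes a mutable policy with field `epsilon`
end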

Environments

interact!(action, environment)

Updates the environment and returns the triple state, reward, isterminal, where state is the new state of the environment (an integer), reward is the reward obtained for the performed action, and isterminal is true if the state is terminal.

getstate(environment)

Returns the tuple state, isterminal. See also interact!(action, environment).

reset!(environment)

Resets the environment to a possible initial state.
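
A minimal, hypothetical environment implementing all three functions could look as follows; the Chain struct and its dynamics are purely illustrative.

import TabularReinforcementLearning: interact!, getstate, reset!

# A deterministic chain of N states; action 1 moves left, action 2 moves right.
# Reaching state N yields reward 1 and ends the episode.
mutable struct Chain
    N::Int
    state::Int
end
Chain(N) = Chain(N, 1)

function interact!(action, env::Chain)
    env.state = clamp(env.state + (action == 2 ? 1 : -1), 1, env.N)
    isterminal = env.state == env.N
    env.state, isterminal ? 1.0 : 0.0, isterminal
end

getstate(env::Chain) = env.state, env.state == env.N

reset!(env::Chain) = (env.state = 1)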

Evaluation Metrics

evaluate!(metric::TabularReinforcementLearning.AbstractEvaluationMetrics, 
          reward, action, state, isterminal)

Updates the metric based on the experienced (reward, action, state) triplet and the boolean isterminal, which is true if state is terminal.

getvalue(metric)

Returns the value of a metric.
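
For example, a hypothetical metric that tracks the mean reward per step could implement both functions as follows.

import TabularReinforcementLearning: evaluate!, getvalue

# Running mean of all rewards seen so far.
mutable struct MeanReward <: TabularReinforcementLearning.AbstractEvaluationMetrics
    sumofrewards::Float64
    nsteps::Int
end
MeanReward() = MeanReward(0.0, 0)

function evaluate!(metric::MeanReward, reward, action, state, isterminal)
    metric.sumofrewards += reward
    metric.nsteps += 1
end

getvalue(metric::MeanReward) = metric.nsteps == 0 ? 0.0 : metric.sumofrewards / metric.nsteps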

Stopping Criteria

isbreak!(criterion::TabularReinforcementLearning.StoppingCriterion, r, a, s, isterminal)

Returns true if criterion is met. See ConstantNumberSteps and ConstantNumberEpisodes for built-in criteria and for examples of how to define new criteria.
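
As an illustrative sketch, a hypothetical criterion that stops once a given total reward has been collected could be defined as follows.

import TabularReinforcementLearning: isbreak!

# Stops learning once the accumulated reward reaches `threshold`.
mutable struct TotalRewardThreshold <: TabularReinforcementLearning.StoppingCriterion
    threshold::Float64
    total::Float64
end
TotalRewardThreshold(threshold) = TotalRewardThreshold(threshold, 0.0)

function isbreak!(criterion::TotalRewardThreshold, r, a, s, isterminal)
    criterion.total += r
    criterion.total >= criterion.threshold
end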
