API

New learners, policies, callbacks, environments, evaluation metrics or stopping criteria need to implement the following functions.

Learners

Learners that require only a (state, action, reward) triple, and possibly the next state and action, should implement the first definition. If the learner is also to be used with an NstepLearner or an EpisodicLearner, the second definition needs to be implemented as well.

update!(learner::TabularReinforcementLearning.AbstractReinforcementLearner, 
        r, s0, a0, s1, a1, iss0terminal)

Update learner after observing state s0, performing action a0, receiving reward r, observing next state s1 and performing next action a1. The boolean iss0terminal is true if s0 is a terminal state.
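
For example, a SARSA-like learner with a table of Q-values could implement the first definition as sketched below. The MySarsa struct, its fields, and the handling of terminal states are illustrative assumptions, not part of the package.

import TabularReinforcementLearning: update!

# Hypothetical SARSA-like learner with tabular action-value estimates.
struct MySarsa <: TabularReinforcementLearning.AbstractReinforcementLearner
    alpha::Float64          # learning rate
    gamma::Float64          # discount factor
    Q::Array{Float64, 2}    # Q[action, state]
end

function update!(learner::MySarsa, r, s0, a0, s1, a1, iss0terminal)
    # Assumed convention: do not bootstrap from s1 when s0 is terminal.
    target = iss0terminal ? r : r + learner.gamma * learner.Q[a1, s1]
    learner.Q[a0, s0] += learner.alpha * (target - learner.Q[a0, s0])
end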

update!(learner::Union{NstepLearner, EpisodicLearner}, 
        baselearner::TabularReinforcementLearning.AbstractReinforcementLearner, 
        rewards, states, actions, isterminal)

Update baselearner with arrays of at most n+1 states, n+1 actions and n rewards if learner is an NstepLearner. If learner is an EpisodicLearner, the arrays grow until the end of an episode. The boolean isterminal is true if states[end-1] is a terminal state.
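
Continuing the MySarsa sketch from above, a Monte Carlo-style implementation of the second definition for an EpisodicLearner could look as follows, assuming rewards[t] is the reward received after taking actions[t] in states[t]; this indexing and the update itself are illustrative assumptions.

import TabularReinforcementLearning: update!, EpisodicLearner

function update!(learner::EpisodicLearner, baselearner::MySarsa,
                 rewards, states, actions, isterminal)
    isterminal || return    # simplifying assumption: only update once the episode has ended
    G = 0.0                 # Monte Carlo return, accumulated backwards
    for t in length(rewards):-1:1
        G = rewards[t] + baselearner.gamma * G
        baselearner.Q[actions[t], states[t]] +=
            baselearner.alpha * (G - baselearner.Q[actions[t], states[t]])
    end
end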

act(learner::TabularReinforcementLearning.AbstractReinforcementLearner,
    policy::TabularReinforcementLearning.AbstractPolicy,
    state)

Returns an action for learner in state, chosen according to policy.
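
Continuing the MySarsa sketch, a typical implementation looks up the values of all actions in state and delegates the choice to the policy:

import TabularReinforcementLearning: act

# Let the policy choose among the learner's action values in `state`.
act(learner::MySarsa, policy::TabularReinforcementLearning.AbstractPolicy, state) =
    act(policy, learner.Q[:, state])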

Policies

act(policy::TabularReinforcementLearning.AbstractPolicy, values)

Returns an action given an array of values (one value for each possible action) using policy.

getactionprobabilities(policy::TabularReinforcementLearning.AbstractPolicy, values)

Returns an array of action probabilities for a given array of values (one value for each possible action) and policy.
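
A new policy therefore implements both functions above. Below is a sketch of a softmax policy; the MySoftmaxPolicy struct and its inverse temperature beta are illustrative assumptions, not part of the package.

import TabularReinforcementLearning: act, getactionprobabilities

struct MySoftmaxPolicy <: TabularReinforcementLearning.AbstractPolicy
    beta::Float64    # inverse temperature
end

function getactionprobabilities(policy::MySoftmaxPolicy, values)
    expvalues = exp.(policy.beta .* (values .- maximum(values)))    # shift for numerical stability
    expvalues ./ sum(expvalues)
end

function act(policy::MySoftmaxPolicy, values)
    probabilities = getactionprobabilities(policy, values)
    # Sample an action index from the categorical distribution.
    r = rand()
    cumulative = 0.0
    for (action, p) in enumerate(probabilities)
        cumulative += p
        cumulative >= r && return action
    end
    return length(probabilities)
end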

Callbacks

callback!(callback::AbstractCallback, learner, policy, r, a, s, isterminal)

Can be used to manipulate the learner or the policy during learning, e.g. to change the learning rate or the exploration rate.
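
For example, a hypothetical callback that decays the exploration rate after every step could look as follows; that the policy is mutable and has a field epsilon is an assumption about the policy being manipulated.

import TabularReinforcementLearning: callback!, AbstractCallback

# Multiplies the exploration rate of the policy by `decay` at every step.
struct DecayEpsilon <: AbstractCallback
    decay::Float64
end

function callback!(callback::DecayEpsilon, learner, policy, r, a, s, isterminal)
    policy.epsilon *= callback.decay    # assumes a mutable policy with field `epsilon`
end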

Environments

interact!(action, environment)

Updates the environment and returns the triple state, reward, isterminal, where state is the new state of the environment (an integer), reward is the reward obtained for the performed action, and isterminal is true if the state is terminal.

getstate(environment)

Returns the tuple state, isterminal. See also interact!(action, environment).

reset!(environment)

Resets the environment to a possible initial state.
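
A minimal, hypothetical environment implementing all three functions could look as follows; the Chain struct and its dynamics are purely illustrative.

import TabularReinforcementLearning: interact!, getstate, reset!

# A deterministic chain of N states; action 1 moves left, action 2 moves right.
# Reaching state N yields reward 1 and ends the episode.
mutable struct Chain
    N::Int
    state::Int
end
Chain(N) = Chain(N, 1)

function interact!(action, env::Chain)
    env.state = clamp(env.state + (action == 2 ? 1 : -1), 1, env.N)
    isterminal = env.state == env.N
    env.state, isterminal ? 1.0 : 0.0, isterminal
end

getstate(env::Chain) = env.state, env.state == env.N

reset!(env::Chain) = (env.state = 1)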

Evaluation Metrics

evaluate!(metric::TabularReinforcementLearning.AbstractEvaluationMetrics, 
          reward, action, state, isterminal)

Updates the metric based on the experienced (reward, action, state) triplet and the boolean isterminal, which is true if state is terminal.

getvalue(metric)

Returns the value of a metric.
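
For example, a hypothetical metric that tracks the mean reward per step could implement both functions as follows.

import TabularReinforcementLearning: evaluate!, getvalue

# Running mean of all rewards seen so far.
mutable struct MeanReward <: TabularReinforcementLearning.AbstractEvaluationMetrics
    sumofrewards::Float64
    nsteps::Int
end
MeanReward() = MeanReward(0.0, 0)

function evaluate!(metric::MeanReward, reward, action, state, isterminal)
    metric.sumofrewards += reward
    metric.nsteps += 1
end

getvalue(metric::MeanReward) = metric.nsteps == 0 ? 0.0 : metric.sumofrewards / metric.nsteps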

Stopping Criteria

isbreak!(criterion::TabularReinforcementLearning.StoppingCriterion, r, a, s, isterminal)

Returns true if criterion is met. See ConstantNumberSteps and ConstantNumberEpisodes for built-in criteria and for examples of how to define new criteria.
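
As an illustrative sketch, a hypothetical criterion that stops once a given total reward has been collected could be defined as follows.

import TabularReinforcementLearning: isbreak!

# Stops learning once the accumulated reward reaches `threshold`.
mutable struct TotalRewardThreshold <: TabularReinforcementLearning.StoppingCriterion
    threshold::Float64
    total::Float64
end
TotalRewardThreshold(threshold) = TotalRewardThreshold(threshold, 0.0)

function isbreak!(criterion::TotalRewardThreshold, r, a, s, isterminal)
    criterion.total += r
    criterion.total >= criterion.threshold
end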
