API
New learners, policies, callbacks, environments, evaluation metrics or stopping criteria need to implement the following functions.
Learners
Learners that require only a (state, action, reward) triple, and possibly the next state and action, should implement the first definition. If the learner is also to be used with an NstepLearner, the second definition needs to be implemented as well.
TabularReinforcementLearning.update! — Function.

update!(learner::TabularReinforcementLearning.AbstractReinforcementLearner,
        r, s0, a0, s1, a1, iss0terminal)

Update learner after observing state s0, performing action a0, receiving reward r, observing next state s1 and performing next action a1. The boolean iss0terminal is true if s0 is a terminal state.
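For illustration only, a hypothetical tabular learner implementing this definition could look roughly as follows. The type MyQLearner, its fields, the Sarsa-style update rule and the handling of iss0terminal are assumptions made for this sketch, not part of the package.

using TabularReinforcementLearning: AbstractReinforcementLearner
import TabularReinforcementLearning: update!

# Hypothetical tabular learner with action values Q[action, state],
# stepsize α and discount factor γ.
struct MyQLearner <: AbstractReinforcementLearner
    Q::Matrix{Float64}
    α::Float64
    γ::Float64
end
MyQLearner(na, ns; α = .1, γ = .99) = MyQLearner(zeros(na, ns), α, γ)

function update!(learner::MyQLearner, r, s0, a0, s1, a1, iss0terminal)
    # Assumed convention: if s0 is terminal there is no transition to
    # learn from and the update is skipped.
    iss0terminal && return learner
    # Sarsa-style update, bootstrapping from the next state-action pair.
    δ = r + learner.γ * learner.Q[a1, s1] - learner.Q[a0, s0]
    learner.Q[a0, s0] += learner.α * δ
    learner
end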
update!(learner::Union{NstepLearner, EpisodicLearner},
        baselearner::TabularReinforcementLearning.AbstractReinforcementLearner,
        rewards, states, actions, isterminal)

Update baselearner with arrays of at most n+1 states, n+1 actions and n rewards, if learner is an NstepLearner. If learner is an EpisodicLearner, the arrays grow until the end of an episode. The boolean isterminal is true if states[end-1] is a terminal state.
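Continuing the hypothetical MyQLearner from the sketch above, the second definition could, as a naive illustration, simply replay the buffered transitions with the single-transition rule; a real n-step method would instead combine the buffered rewards into an n-step return.

using TabularReinforcementLearning: NstepLearner, EpisodicLearner
import TabularReinforcementLearning: update!

# Naive sketch: replay each buffered transition with the one-step rule
# defined above. states[end-1] is the s0 of the last buffered transition.
function update!(::Union{NstepLearner, EpisodicLearner},
                 baselearner::MyQLearner,
                 rewards, states, actions, isterminal)
    for t in 1:length(rewards)
        update!(baselearner, rewards[t], states[t], actions[t],
                states[t + 1], actions[t + 1],
                t == length(states) - 1 && isterminal)
    end
    baselearner
end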
TabularReinforcementLearning.act — Method.

act(learner::TabularReinforcementLearning.AbstractReinforcementLearner,
    policy::TabularReinforcementLearning.AbstractPolicy,
    state)

Returns an action for a learner, using policy in state.
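Sticking with the hypothetical MyQLearner, act can simply hand the action values of the current state to the policy:

import TabularReinforcementLearning: act

# Delegate the action choice to the policy, given the action values of state.
act(learner::MyQLearner, policy, state) = act(policy, learner.Q[:, state])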
Policies
TabularReinforcementLearning.act — Method.

act(policy::TabularReinforcementLearning.AbstractPolicy, values)

Returns an action given an array of values (one value for each possible action) using policy.
getactionprobabilities(policy::TabularReinforcementLearning.AbstractPolicy, values)

Returns an array of action probabilities for a given array of values (one value for each possible action) and policy.
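As a sketch, a hypothetical softmax policy could implement both functions as follows; the type MySoftmaxPolicy and its field β are made up for illustration. The struct is declared mutable only so that β can be changed during learning, e.g. by the callback sketched in the next section.

using TabularReinforcementLearning: AbstractPolicy
import TabularReinforcementLearning: act, getactionprobabilities

# Hypothetical softmax policy with inverse temperature β.
mutable struct MySoftmaxPolicy <: AbstractPolicy
    β::Float64
end

function getactionprobabilities(policy::MySoftmaxPolicy, values)
    expvalues = exp.(policy.β .* (values .- maximum(values)))  # stabilized softmax
    expvalues ./ sum(expvalues)
end

function act(policy::MySoftmaxPolicy, values)
    p = getactionprobabilities(policy, values)
    # Sample an action index proportionally to p.
    r = rand()
    cumulative = 0.
    for (a, pa) in enumerate(p)
        cumulative += pa
        r <= cumulative && return a
    end
    length(p)  # guard against rounding errors
end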
Callbacks
TabularReinforcementLearning.callback! — Function.

callback!(callback::AbstractCallback, learner, policy, r, a, s, isterminal)

Can be used to manipulate the learner or the policy during learning, e.g. to change the learning rate or the exploration rate.
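For example, a hypothetical callback could anneal the inverse temperature of the MySoftmaxPolicy sketched above at the end of every episode; the type MyExplorationDecay and the annealing scheme are assumptions.

using TabularReinforcementLearning: AbstractCallback
import TabularReinforcementLearning: callback!

# Hypothetical callback: multiply β of a MySoftmaxPolicy by `factor`
# after every completed episode, i.e. gradually reduce exploration.
struct MyExplorationDecay <: AbstractCallback
    factor::Float64
end

function callback!(c::MyExplorationDecay, learner, policy::MySoftmaxPolicy,
                   r, a, s, isterminal)
    isterminal && (policy.β *= c.factor)
    nothing
end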
Environments
TabularReinforcementLearning.interact! — Function.

interact!(action, environment)

Updates the environment and returns the triple state, reward, isterminal, where state is the new state of the environment (an integer), reward is the reward obtained for the performed action and isterminal is true if the state is terminal.
TabularReinforcementLearning.getstate — Function.

getstate(environment)

Returns the tuple state, isterminal. See also interact!(action, environment).
TabularReinforcementLearning.reset! — Function.

reset!(environment)

Resets the environment to a possible initial state.
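Putting the three functions together, a toy environment might look as follows; the MyChain type and its dynamics are made up for illustration.

import TabularReinforcementLearning: interact!, getstate, reset!

# Hypothetical deterministic chain with ns states. Action 1 moves left,
# action 2 moves right; reaching state ns gives reward 1 and ends the episode.
mutable struct MyChain
    ns::Int
    state::Int
end
MyChain(ns) = MyChain(ns, 1)

function interact!(action, env::MyChain)
    env.state = clamp(env.state + (action == 2 ? 1 : -1), 1, env.ns)
    isterminal = env.state == env.ns
    reward = isterminal ? 1. : 0.
    env.state, reward, isterminal
end

getstate(env::MyChain) = (env.state, env.state == env.ns)

reset!(env::MyChain) = (env.state = 1; env)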
Evaluation Metrics
TabularReinforcementLearning.evaluate! — Function.

evaluate!(metric::TabularReinforcementLearning.AbstractEvaluationMetrics,
          reward, action, state, isterminal)

Updates the metric based on the experienced (reward, action, state) triplet and the boolean isterminal that is true if state is terminal.
TabularReinforcementLearning.getvalue — Function.

getvalue(metric)

Returns the value of a metric.
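As an illustration, a hypothetical metric that records the total reward of every episode could implement both functions like this; MyTotalRewardPerEpisode is not part of the package.

using TabularReinforcementLearning: AbstractEvaluationMetrics
import TabularReinforcementLearning: evaluate!, getvalue

# Hypothetical metric: accumulate rewards and store one total per episode.
mutable struct MyTotalRewardPerEpisode <: AbstractEvaluationMetrics
    current::Float64
    totals::Vector{Float64}
end
MyTotalRewardPerEpisode() = MyTotalRewardPerEpisode(0., Float64[])

function evaluate!(metric::MyTotalRewardPerEpisode, reward, action, state, isterminal)
    metric.current += reward
    if isterminal
        push!(metric.totals, metric.current)
        metric.current = 0.
    end
    metric
end

getvalue(metric::MyTotalRewardPerEpisode) = metric.totals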
Stopping Criteria
TabularReinforcementLearning.isbreak! — Function.

isbreak!(criterion::TabularReinforcementLearning.StoppingCriterion, r, a, s, isterminal)

Returns true if criterion is matched. See ConstantNumberSteps and ConstantNumberEpisodes for built-in criteria and examples of how to define new ones.
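As a sketch of a new criterion, assuming StoppingCriterion is an abstract type that can be subtyped, one could stop after a given number of successful episodes; the type MyStopAfterNSuccesses and its notion of success are made up for illustration.

using TabularReinforcementLearning: StoppingCriterion
import TabularReinforcementLearning: isbreak!

# Hypothetical criterion: stop after n episodes that ended with a positive reward.
mutable struct MyStopAfterNSuccesses <: StoppingCriterion
    n::Int
    counter::Int
end
MyStopAfterNSuccesses(n) = MyStopAfterNSuccesses(n, 0)

function isbreak!(criterion::MyStopAfterNSuccesses, r, a, s, isterminal)
    isterminal && r > 0 && (criterion.counter += 1)
    criterion.counter >= criterion.n
end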