Policies
Epsilon Greedy Policies
mutable struct EpsilonGreedyPolicy <: AbstractEpsilonGreedyPolicy
ϵ::Float64
Chooses the action with the highest value with probability 1 - ϵ and selects an action uniformly at random with probability ϵ. For states with actions that were never performed before, the behavior of the VeryOptimisticEpsilonGreedyPolicy is followed.
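The selection rule described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper name `epsilon_greedy_action` and the uniform tie-breaking among equally good actions are assumptions, and novel actions are not handled here.

```julia
using Random

# Hypothetical helper, not part of the package API.
# `values` holds one estimated value per action in the current state.
function epsilon_greedy_action(rng::AbstractRNG, ϵ::Real, values::AbstractVector{<:Real})
    if rand(rng) < ϵ
        # Explore: pick any action uniformly at random.
        return rand(rng, 1:length(values))
    end
    # Exploit: pick uniformly among the actions sharing the highest value
    # (the tie-breaking rule is an assumption of this sketch).
    best = findall(==(maximum(values)), values)
    return rand(rng, best)
end
```

With ϵ = 0 the function is purely greedy, and with ϵ = 1 every action is equally likely.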
mutable struct OptimisticEpsilonGreedyPolicy <: AbstractEpsilonGreedyPolicy
ϵ::Float64
An EpsilonGreedyPolicy that, in each state with actions that were never chosen before, samples uniformly from the actions with the highest Q-value together with the novel actions.
mutable struct PesimisticEpsilonGreedyPolicy <: AbstractEpsilonGreedyPolicy
ϵ::Float64
An EpsilonGreedyPolicy that does not treat novel actions differently from other actions.
mutable struct VeryOptimisticEpsilonGreedyPolicy <: AbstractEpsilonGreedyPolicy
ϵ::Float64
An EpsilonGreedyPolicy that, in each state with actions that were never chosen before, samples uniformly from the novel actions only. See also Initial values, novel actions and unseen values.
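The epsilon-greedy variants differ only in which actions count as candidates for the greedy choice. The sketch below makes that difference explicit under stated assumptions: the helper `greedy_candidates`, the `variant` symbols, and the Boolean mask `novel` marking never-chosen actions are all hypothetical and may not match how the package represents novel actions. In the optimistic case, "highest Q-value" is read as the highest value among the already tried actions.

```julia
# Hypothetical helper, not part of the package API.
# `values[a]` is the estimated value of action a; `novel[a]` is true if
# action a was never chosen in the current state (assumed representation).
function greedy_candidates(variant::Symbol, values::AbstractVector{<:Real},
                           novel::AbstractVector{Bool})
    variant in (:pessimistic, :optimistic, :very_optimistic) ||
        throw(ArgumentError("unknown variant: $variant"))
    novel_actions = findall(novel)
    if variant == :pessimistic || isempty(novel_actions)
        # No special treatment: candidates are the highest-valued actions.
        return findall(==(maximum(values)), values)
    elseif variant == :very_optimistic
        # Only the novel actions are candidates.
        return novel_actions
    else
        # Optimistic: best already-tried actions plus all novel actions.
        known = findall(!, novel)
        isempty(known) && return novel_actions
        vmax = maximum(values[known])
        best_known = [a for a in known if values[a] == vmax]
        return sort(vcat(best_known, novel_actions))
    end
end
```

A greedy step would then draw uniformly from the returned indices, for example with `rand(greedy_candidates(:optimistic, values, novel))`.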
Softmax Policies
mutable struct SoftmaxPolicy <: AbstractSoftmaxPolicy
β::Float64
Chooses action $a$ with probability

$$\frac{e^{\beta x_a}}{\sum_{a'} e^{\beta x_{a'}}}$$

where $x$ is the vector of values for each action. In states with actions that were never chosen before, a novel action chosen uniformly at random is returned.
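As a rough illustration, the action probabilities can be computed as below, assuming the standard softmax form in which the probability of action $a$ is proportional to $e^{\beta x_a}$. The helpers `softmax_probabilities` and `sample_action` are hypothetical names, not part of the package, and the special handling of novel actions is left out.

```julia
using Random

# Hypothetical helper, not part of the package API.
# Returns the softmax probabilities for the value vector `x`.
function softmax_probabilities(β::Real, x::AbstractVector{<:Real})
    # Subtracting the maximum leaves the probabilities unchanged
    # but avoids overflow in `exp` for large β or large values.
    w = exp.(β .* (x .- maximum(x)))
    return w ./ sum(w)
end

# Hypothetical helper: draw an action index according to probabilities `p`.
function sample_action(rng::AbstractRNG, p::AbstractVector{<:Real})
    u = rand(rng)
    c = 0.0
    for (a, pa) in enumerate(p)
        c += pa
        u < c && return a
    end
    return length(p)  # guard against rounding error in the cumulative sum
end
```

With β = 0 all actions are equally likely, and larger β concentrates the probability on the highest-valued actions.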
SoftmaxPolicy(; β = 1.)
Returns a SoftmaxPolicy with default β = 1.