Policies
Epsilon Greedy Policies
mutable struct EpsilonGreedyPolicy <: AbstractEpsilonGreedyPolicy
    ϵ::Float64

Chooses the action with the highest value with probability 1 - ϵ and selects an action uniformly at random with probability ϵ. In states with actions that were never performed before, the behavior of the VeryOptimisticEpsilonGreedyPolicy is followed.
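The selection rule described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name `epsilon_greedy_action` and its signature are hypothetical, novel actions are ignored, and ties between equally valued actions are resolved by taking the first one, which is a simplification made for this example.

```julia
using Random

# Hypothetical helper (not part of the package): pick an action index from a
# vector of action values `q` with an epsilon-greedy rule.
function epsilon_greedy_action(rng::AbstractRNG, q::AbstractVector{<:Real}, ϵ::Real)
    if rand(rng) < ϵ
        # Explore: every action is equally likely.
        return rand(rng, eachindex(q))
    end
    # Exploit: index of the first action with the highest value.
    return argmax(q)
end

rng = MersenneTwister(1)
q = [0.1, 0.7, 0.2]
a = epsilon_greedy_action(rng, q, 0.1)
```

With ϵ = 0 the sketch is purely greedy, and with ϵ = 1 it is a uniform random policy.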
mutable struct OptimisticEpsilonGreedyPolicy <: AbstractEpsilonGreedyPolicy
    ϵ::Float64

EpsilonGreedyPolicy that, in each state where actions are available that were never chosen before, samples uniformly from the actions with the highest Q-value together with the novel actions.
mutable struct PesimisticEpsilonGreedyPolicy <: AbstractEpsilonGreedyPolicy
    ϵ::Float64

EpsilonGreedyPolicy that does not handle novel actions differently.
mutable struct VeryOptimisticEpsilonGreedyPolicy <: AbstractEpsilonGreedyPolicy
    ϵ::Float64

EpsilonGreedyPolicy that samples uniformly from the novel actions in each state where actions are available that were never chosen before. See also Initial values, novel actions and unseen values.
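The difference between the three ways of handling novel actions can be made concrete by listing the set of actions the greedy (non-exploring) step would sample from. The sketch below is an assumption-laden illustration: the function `greedy_candidates`, the `tried` vector that marks actions already chosen in the current state, and the mode symbols are all hypothetical and do not reflect how the package tracks novel actions internally.

```julia
# Hypothetical sketch: indices among which the greedy step samples uniformly.
# `tried[a]` is true once action `a` has been chosen in the current state;
# this bookkeeping is an assumption made for the example.
function greedy_candidates(q::AbstractVector{<:Real},
                           tried::AbstractVector{Bool}, mode::Symbol)
    mode in (:pessimistic, :optimistic, :veryoptimistic) ||
        throw(ArgumentError("unknown mode $mode"))
    novel = findall(!, tried)
    # Pessimistic, or nothing novel left: only the highest-valued actions.
    if mode == :pessimistic || isempty(novel)
        return findall(==(maximum(q)), q)
    end
    # Very optimistic: only the novel actions.
    mode == :veryoptimistic && return novel
    # Optimistic: the best already-tried actions together with the novel ones.
    known = findall(tried)
    isempty(known) && return novel
    best = maximum(q[known])
    return sort(vcat(filter(a -> q[a] == best, known), novel))
end

q     = [0.5, 0.9, 0.0, 0.0]
tried = [true, true, false, false]

greedy_candidates(q, tried, :pessimistic)     # [2]
greedy_candidates(q, tried, :optimistic)      # [2, 3, 4]
greedy_candidates(q, tried, :veryoptimistic)  # [3, 4]
```

Once every action in a state has been tried, all three modes return the same candidate set in this sketch.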
Softmax Policies
mutable struct SoftmaxPolicy <: AbstractSoftmaxPolicy
    β::Float64

Choose action $a$ with probability

$$\frac{e^{\beta x_a}}{\sum_{a'} e^{\beta x_{a'}}}$$

where $x$ is a vector of values for each action. In states with actions that were never chosen before, a novel action is returned, chosen uniformly at random.
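As a rough illustration of these probabilities, the sketch below assumes the standard softmax form, in which action $a$ gets a weight proportional to $e^{\beta x_a}$. The helper names `softmax_probabilities` and `sample_index` are hypothetical and not part of the package, and the handling of novel actions is left out.

```julia
using Random

# Hypothetical helper: softmax probabilities for action values `x` with
# inverse temperature β.
function softmax_probabilities(x::AbstractVector{<:Real}, β::Real = 1.0)
    # Shifting by the maximum does not change the result; for β ≥ 0 it keeps
    # all exponents non-positive and so avoids overflow.
    w = exp.(β .* (x .- maximum(x)))
    return w ./ sum(w)
end

# Hypothetical helper: draw an index from the probability vector `p` by
# walking through its cumulative sums.
function sample_index(rng::AbstractRNG, p::AbstractVector{<:Real})
    u = rand(rng)
    c = 0.0
    for (i, prob) in enumerate(p)
        c += prob
        u < c && return i
    end
    return length(p)  # guard against rounding error in the cumulative sum
end

p = softmax_probabilities([1.0, 2.0, 3.0], 1.0)
a = sample_index(MersenneTwister(1), p)
```

Larger β concentrates the probability on the highest-valued action, while β = 0 gives a uniform distribution over all actions.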
SoftmaxPolicy(; β = 1.)

Returns a SoftmaxPolicy with default β = 1.