Policies

Epsilon Greedy Policies

mutable struct EpsilonGreedyPolicy <: AbstractEpsilonGreedyPolicy
    ϵ::Float64
end

Chooses the action with the highest value with probability 1 - ϵ and selects an action uniformly at random with probability ϵ. For states with actions that were never performed before, the behavior of VeryOptimisticEpsilonGreedyPolicy is followed.

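A minimal sketch of this selection rule, assuming the action values of the current state are given as a plain vector. The helper epsilon_greedy_action is hypothetical, not part of the package's API, and the special treatment of novel actions is omitted here (it is contrasted in the sketch after the last ϵ-greedy variant below):

function epsilon_greedy_action(values::Vector{Float64}, ϵ::Float64)
    if rand() < ϵ
        rand(1:length(values))   # explore: any action, uniformly at random
    else
        argmax(values)           # exploit: the action with the highest value
    end
end

epsilon_greedy_action([.1, .5, .3], .1)   # returns 2 with probability 0.9 + 0.1/3
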
mutable struct OptimisticEpsilonGreedyPolicy <: AbstractEpsilonGreedyPolicy
    ϵ::Float64
end

EpsilonGreedyPolicy that, in each state where actions are available that were never chosen before, samples uniformly from the novel actions and the actions with the highest Q-value.

mutable struct PesimisticEpsilonGreedyPolicy <: AbstractEpsilonGreedyPolicy
    ϵ::Float64
end

EpsilonGreedyPolicy that does not treat novel actions differently.

mutable struct VeryOptimisticEpsilonGreedyPolicy <: AbstractEpsilonGreedyPolicy
    ϵ::Float64
end

EpsilonGreedyPolicy that samples uniformly from the novel actions in each state where actions are available that were never chosen before. See also Initial values, novel actions and unseen values.

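The three treatments of novel actions can be contrasted in one sketch. Here unvisited actions are flagged by a boolean vector novel, a hypothetical representation chosen for illustration (the package tracks visit information internally); with probability ϵ every variant instead explores uniformly, as in the sketch above:

function select_greedy(values, novel, kind::Symbol)
    if kind == :veryoptimistic && any(novel)
        rand(findall(novel))                  # candidates: novel actions only
    elseif kind == :optimistic && any(novel)
        best = maximum(values)
        rand(union(findall(novel), findall(values .== best)))  # novel and greedy actions
    else                                      # :pesimistic, or no novel actions left
        argmax(values)                        # novel actions get no special treatment
    end
end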

Softmax Policies

mutable struct SoftmaxPolicy <: AbstractSoftmaxPolicy
    β::Float64
end

Choose action $a$ with probability

\[\frac{e^{\beta x_a}}{\sum_{a'} e^{\beta x_{a'}}}\]

where $x$ is the vector of values for each action. In states with actions that were never chosen before, a novel action is returned uniformly at random.

SoftmaxPolicy(; β = 1.)

Returns a SoftmaxPolicy with default β = 1.

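A sketch of sampling from this distribution, with the usual max-subtraction for numerical stability (an implementation detail assumed here, not taken from the package source; softmax_action is a hypothetical helper):

function softmax_action(x::Vector{Float64}, β::Float64 = 1.)
    z = β .* x
    p = exp.(z .- maximum(z))   # shifting z leaves the distribution unchanged
    p ./= sum(p)
    r, cum = rand(), 0.0
    for (a, pa) in enumerate(p)
        cum += pa
        r <= cum && return a    # inverse-CDF sampling of the categorical distribution
    end
    length(p)                   # guard against floating-point round-off
end

softmax_action([1., 2., 3.])    # returns 3 with probability ≈ 0.67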