Policies

Epsilon Greedy Policies

mutable struct EpsilonGreedyPolicy <: AbstractEpsilonGreedyPolicy
    ϵ::Float64
end

Chooses the action with the highest value with probability 1 - ϵ and selects an action uniformly at random with probability ϵ. For states with actions that were never performed before, the behavior of VeryOptimisticEpsilonGreedyPolicy is followed.

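A minimal sketch of this selection rule, assuming the action values of the current state are given as a plain vector. The helper epsilon_greedy_action is hypothetical, not part of the package's API, and the special treatment of novel actions is omitted here (it is contrasted in the sketch after the last ϵ-greedy variant below):

function epsilon_greedy_action(values::Vector{Float64}, ϵ::Float64)
    if rand() < ϵ
        rand(1:length(values))   # explore: any action, uniformly at random
    else
        argmax(values)           # exploit: the action with the highest value
    end
end

epsilon_greedy_action([.1, .5, .3], .1)   # returns 2 with probability 0.9 + 0.1/3
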
mutable struct OptimisticEpsilonGreedyPolicy <: AbstractEpsilonGreedyPolicy
    ϵ::Float64
end

EpsilonGreedyPolicy that, in each state where actions are available that were never chosen before, samples uniformly from the novel actions and the actions with the highest Q-value.

mutable struct PesimisticEpsilonGreedyPolicy <: AbstractEpsilonGreedyPolicy
    ϵ::Float64
end

EpsilonGreedyPolicy that does not treat novel actions differently.

mutable struct VeryOptimisticEpsilonGreedyPolicy <: AbstractEpsilonGreedyPolicy
    ϵ::Float64
end

EpsilonGreedyPolicy that samples uniformly from the novel actions in each state where actions are available that were never chosen before. See also Initial values, novel actions and unseen values.

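The three treatments of novel actions can be contrasted in one sketch. Here unvisited actions are flagged by a boolean vector novel, a hypothetical representation chosen for illustration (the package tracks visit information internally); with probability ϵ every variant instead explores uniformly, as in the sketch above:

function select_greedy(values, novel, kind::Symbol)
    if kind == :veryoptimistic && any(novel)
        rand(findall(novel))                  # candidates: novel actions only
    elseif kind == :optimistic && any(novel)
        best = maximum(values)
        rand(union(findall(novel), findall(values .== best)))  # novel and greedy actions
    else                                      # :pesimistic, or no novel actions left
        argmax(values)                        # novel actions get no special treatment
    end
end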

Softmax Policies

mutable struct SoftmaxPolicy <: AbstractSoftmaxPolicy
    β::Float64
end

Choose action $a$ with probability

\[\frac{e^{\beta x_a}}{\sum_{a'} e^{\beta x_{a'}}}\]

where $x$ is the vector of values for each action. In states with actions that were never chosen before, a novel action is returned uniformly at random.

SoftmaxPolicy(; β = 1.)

Returns a SoftmaxPolicy with default β = 1.

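A sketch of sampling from this distribution, with the usual max-subtraction for numerical stability (an implementation detail assumed here, not taken from the package source; softmax_action is a hypothetical helper):

function softmax_action(x::Vector{Float64}, β::Float64 = 1.)
    z = β .* x
    p = exp.(z .- maximum(z))   # shifting z leaves the distribution unchanged
    p ./= sum(p)
    r, cum = rand(), 0.0
    for (a, pa) in enumerate(p)
        cum += pa
        r <= cum && return a    # inverse-CDF sampling of the categorical distribution
    end
    length(p)                   # guard against floating-point round-off
end

softmax_action([1., 2., 3.])    # returns 3 with probability ≈ 0.67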