Usage

Simple Usage

  1. Define an Agent.

  2. Choose an environment.

  3. Choose a metric.

  4. Choose a stopping criterion.

  5. (Optionally) define an RLSetup.

  6. Learn with learn!.

  7. Look at results with getvalue.

Example

agent = Agent(QLearning())              # agent with a tabular Q-learning learner
env = MDP()                             # MDP environment with default parameters
metric = TotalReward()                  # metric: accumulated total reward
stop = ConstantNumberSteps(100)         # stopping criterion: 100 interaction steps
x = RLSetup(agent, env, metric, stop)
learn!(x)                               # run the learning loop
getvalue(metric)                        # read out the result
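
The same workflow can be repeated for several independent runs. The following is a minimal sketch using only the calls shown above; the number of runs and the results vector are arbitrary illustration choices.

results = []                               # collect the result of each run
for _ in 1:10                              # 10 independent runs (arbitrary choice)
    agent = Agent(QLearning())
    env = MDP()
    metric = TotalReward()
    stop = ConstantNumberSteps(100)
    learn!(RLSetup(agent, env, metric, stop))
    push!(results, getvalue(metric))       # total reward of this run
end
results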

Advanced Usage

  1. Define an Agent by choosing one of the learners, one of the policies, and one of the callbacks (e.g. to implement an exploration schedule).

  2. Choose an environment or define the interaction with a custom environment.

  3. Steps 3. to 7. as in the simple usage above.

  4. (Optionally) compare with the optimal solution.

Example

learner = QLearning(na = 5, ns = 500, λ = .8, γ = .95,
                    tracekind = ReplacingTraces, initvalue = 10.)   # Q-learning with replacing eligibility traces
policy = EpsilonGreedyPolicy(.2)               # ε-greedy action selection with ε = 0.2
callback = ReduceEpsilonPerT(10^4)             # reduce ε every 10^4 steps
agent = Agent(learner, policy, callback)
env = MDP(na = 5, ns = 500, init = "deterministic")   # MDP with 500 states and 5 actions
metric = EvaluationPerT(10^4)                  # evaluate every 10^4 steps
stop = ConstantNumberSteps(10^6)               # stop after 10^6 steps
x = RLSetup(agent, env, metric, stop)
@time learn!(x)
res = getvalue(metric)

mdpl = MDPLearner(env, .95)                    # solver for the known MDP (discount factor 0.95)
policy_iteration!(mdpl)                        # compute the optimal policy
reset!(env)                                    # reset the environment before the second run
x2 = RLSetup(Agent(mdpl, EpsilonGreedyPolicy(.2), ReduceEpsilonPerT(10^4)),
             env, EvaluationPerT(10^4), ConstantNumberSteps(10^6))
run!(x2)                                       # run the precomputed solution for comparison
res2 = getvalue(x2.metric)
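
The two evaluation traces can now be compared. A minimal sketch, assuming getvalue returns a vector of per-interval evaluation results for EvaluationPerT:

using Statistics                               # standard-library mean (Julia ≥ 1.0)
println("Q-learning agent:          ", mean(res))
println("policy-iteration baseline: ", mean(res2))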

Comparisons

See section Comparison.

Examples

See examples.