## Simple usage
1. Define an `Agent`.
2. Choose an environment.
3. Choose a metric.
4. Choose a stopping criterion.
5. (Optionally) define an `RLSetup`.
6. Learn with `learn!`.
7. Look at results with `getvalue`.
### Example
```julia
agent = Agent(QLearning())
env = MDP()
metric = TotalReward()
stop = ConstantNumberSteps(100)
x = RLSetup(agent, env, metric, stop)
learn!(x)
getvalue(metric)
```
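For intuition, the update that a tabular learner like `QLearning()` performs at every interaction step can be sketched in plain Julia. This is a hypothetical standalone sketch with assumed names (`qlearning_update!`, learning rate `α`), not the package's implementation:

```julia
# One tabular Q-learning update: move Q[a, s] toward the bootstrapped target
# r + γ * max_a' Q[a', s′]. Q is an na × ns matrix of action values.
function qlearning_update!(Q, s, a, r, s′; γ = 0.95, α = 0.1)
    target = r + γ * maximum(Q[:, s′])   # best next-state action value
    Q[a, s] += α * (target - Q[a, s])
    return Q
end

Q = zeros(5, 500)                  # na = 5 actions, ns = 500 states
qlearning_update!(Q, 1, 2, 1.0, 3) # after taking action 2 in state 1
```

The `metric`, `stop`, and `RLSetup` pieces then only decide how often this update runs and what gets recorded.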
## Advanced Usage
1. Define an `Agent` by choosing one of the learners, one of the policies, and one of the callbacks (e.g. to have an exploration schedule).
2. Choose an environment or define the interaction with a custom environment.
3. Steps 3.-7. as above.
4. (Optionally) compare with the optimal solution.
### Example
```julia
learner = QLearning(na = 5, ns = 500, λ = .8, γ = .95,
                    tracekind = ReplacingTraces, initvalue = 10.)
policy = EpsilonGreedyPolicy(.2)
callback = ReduceEpsilonPerT(10^4)
agent = Agent(learner, policy, callback)
env = MDP(na = 5, ns = 500, init = "deterministic")
metric = EvaluationPerT(10^4)
stop = ConstantNumberSteps(10^6)
x = RLSetup(agent, env, metric, stop)
@time learn!(x)
res = getvalue(metric)

# Compare with the optimal solution obtained by policy iteration.
mdpl = MDPLearner(env, .95)
policy_iteration!(mdpl)
reset!(env)
x2 = RLSetup(Agent(mdpl, EpsilonGreedyPolicy(.2), ReduceEpsilonPerT(10^4)),
             env, EvaluationPerT(10^4), ConstantNumberSteps(10^6))
run!(x2)
res2 = getvalue(x2.metric)
```
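The exploration schedule above combines `EpsilonGreedyPolicy` with a `ReduceEpsilonPerT` callback. What that amounts to can be sketched in plain Julia; the function names and the halving schedule below are assumptions for illustration, not the package's actual code:

```julia
# ϵ-greedy action selection: with probability ϵ pick a uniformly random
# action, otherwise the greedy action for state s. Q is na × ns.
function epsilon_greedy(Q, s, ϵ)
    rand() < ϵ ? rand(1:size(Q, 1)) : argmax(Q[:, s])
end

# Decay ϵ every T steps (halving assumed here; the package's
# ReduceEpsilonPerT schedule may differ).
function run_schedule(ϵ0, T, nsteps)
    ϵ = ϵ0
    for t in 1:nsteps
        t % T == 0 && (ϵ /= 2)
    end
    return ϵ
end

ϵ = run_schedule(0.2, 10^4, 10^5)   # ϵ shrinks as learning proceeds
```

Decaying ϵ trades early exploration against late exploitation, which is why the evaluation metric typically improves over the course of the `10^6` steps.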
## Comparisons

See section Comparison.
## Examples
See examples.