A Julia package for simulating the behavior of new algorithms for solving multi-armed bandit problems, including the stochastic and contextual variants of the problem.
The package is organized around the following type hierarchies, with abstract types listed first and their subtypes nested beneath them:

- `Context`
  - `MinimalContext`
  - `StochasticContext`
- `Bandit`
  - `AdversarialBandit`
  - `DuelingBandits`
  - `ProbabilisticBandit`
  - `ContextualBandit`
  - `NonStationaryBandit`
  - `ChangePointBandits`
  - `MarkovianBandit`
  - `RestlessBandits`
  - `SleepingBandits`
  - `StochasticBandit`
- `Learner`
  - `MLELearner`
  - `BetaLearner`
- `Algorithm`
  - `RandomChoice`
  - `EpsilonGreedy`
  - `AnnealingEpsilonGreedy`
  - `DecreasingEpsilonGreedy`
  - `Softmax`
  - `AnnealingSoftmax`
  - `UCB1`
  - `UCB1Tuned`
  - `UCB2`
  - `UCBV`
  - `Exp3`
  - `ThompsonSampling`
  - `Hedge`
  - `MOSS`
  - `ReinforcementComparison`
  - `Pursuit`
- `Game`
  - `StochasticGame`
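These hierarchies use Julia's standard abstract/concrete subtyping. As a rough sketch of how such a hierarchy is typically declared (the field layouts and the `pull` helper below are illustrative assumptions, not the package's actual definitions):

```julia
using Distributions

# Illustrative sketch only: the abstract/concrete split mirrors the lists
# above, but these field layouts are assumptions, not the package's code.
abstract type Bandit end
abstract type Learner end
abstract type Algorithm end

# A stochastic bandit can be modeled as a vector of reward distributions,
# one per arm (cf. the StochasticBandit([Bernoulli(0.1), ...]) call in the
# example in this README).
struct SketchStochasticBandit{D<:Distribution} <: Bandit
    arms::Vector{D}
end

# Pulling arm `a` draws a reward from that arm's distribution.
pull(b::SketchStochasticBandit, a::Integer) = rand(b.arms[a])

b = SketchStochasticBandit([Bernoulli(0.1), Bernoulli(0.9)])
r = pull(b, 2)  # a Bool for Bernoulli arms (true = reward of 1)
```

Writing each family as a small abstract type keeps dispatch-based code generic: any function written against `Bandit` works for every concrete bandit type.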
TO BE FILLED IN...
Here we simulate several algorithms for `T = 5` trials. To get accurate estimates of the default summary statistics generated by the `simulate` function, we use `S = 50_000` simulation runs per algorithm/bandit pair:
```julia
using Bandits, Distributions

T = 5
S = 50_000

l1 = MLELearner(0.5, 0.25)
l2 = BetaLearner()

algorithms = [
    RandomChoice(l1),
    EpsilonGreedy(l1, 0.1),
    Softmax(l1, 0.1),
    UCB1(l1),
    ThompsonSampling(l2),
    MOSS(l1),
]

bandits = [
    StochasticBandit([Bernoulli(0.1), Bernoulli(0.2), Bernoulli(0.3)]),
]

simulate(algorithms, bandits, T, S)
```

We use the following abbreviations throughout our codebase:
- `s`: Current simulation
- `S`: Total number of simulations
- `t`: Current trial
- `T`: Total number of trials
- `c`: Context
- `a`: Index of an arm
- `r`: Reward
- `g`: Regret
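To make these conventions concrete, here is a minimal sketch of an inner simulation loop written with the same variable names. The arm-selection policy and regret bookkeeping here are illustrative stand-ins (a uniformly random policy), not the package's actual internals:

```julia
using Distributions, Statistics

# Illustrative sketch using the abbreviations above; the random policy and
# regret accounting are assumptions, not the package's implementation.
arms = [Bernoulli(0.1), Bernoulli(0.2), Bernoulli(0.3)]
best_mean = maximum(mean.(arms))  # expected reward of the best arm

S, T = 10, 5            # S: total simulations, T: total trials
total_regret = 0.0
for s in 1:S            # s: current simulation
    for t in 1:T        # t: current trial
        a = rand(1:length(arms))       # a: index of the chosen arm
        r = rand(arms[a])              # r: reward drawn from arm a
        g = best_mean - mean(arms[a])  # g: expected regret of this choice
        global total_regret += g
    end
end
```

A run accumulates `T` rewards per simulation, and averaging the per-trial regret `g` over all `S` runs is what makes large values of `S` (such as the `50_000` used above) worthwhile.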