Skip to content

AndreaTirinzoni/bandit-elimination

Repository files navigation

Elimination Strategies for Bandit Identification

Code for the paper "On Elimination Strategies for Bandit Fixed-Confidence Identification" by Andrea Tirinzoni and Remy Degenne. A preprint is available here.

The code is written in Julia. It is an extension of public code from existing papers:

  • LinGame by Degenne et al., 2020 [1]
  • FWS by Wang et al., 2021 [2]
  • RAGE by Fiez et al., 2019 [3]

Setting up Julia

Please make sure that you have installed a version of Julia later or equal to 1.6.4. The code requires the following packages.

JLD2
StatsPlots
LaTeXStrings
IterTools
Distributions
JuMP
Tulip

To install them, run "julia" in a terminal and type "import Pkg; Pkg.add("PACKAGE NAME")".

Code structure

The code is organized in the following files:

  • peps.jl: it implements the three pure exploration problems we consider (BAI, Top-m, and OSI)
  • stopping_rules.jl: it implements the LLR and elimination stopping rules
  • elimination_rules.jl: it implements selective and full elimination for the different pure exploration problems as described in Appendix B
  • sampling_rules.jl: it implements the sampling rules for all algorithms
  • envelope.jl: utils for FWS (extended from the code of Wang et al.)
  • regret.jl: no-regret learners for LinGame
  • runit.jl: functions to run an experiment
  • thresholds.jl: different thresholds for stopping and/or sampling rules
  • tracking.jl: tracking rules for LinGame, TaS, and FWS
  • experiment_helpers.jl: some functions to plot and visualize results
  • utils.jl: other general utilities

Implemented algorithms

The table below summarizes all the algorithms available in this repository. For each of them, we specify whether it works in unstructured and/or linear bandit instances, what pure exploration problems it can handle, whether it can be combined with our elimination rules at stopping and/or sampling, and whether it is natively elimination based. The oracle algorithm in the last row simply plays the fixed optimal proportions from the lower bound.

Name Unstructured Linear BAI Top-m OSI Elim. stopping Elim. sampling Native elim.
LinGame [1] X X X X X
FWS [2] X X X X X X X
RAGE [3] X X X
Lazy TaS [4] X X X X X X X
XY-Adaptive [5] X X X
m-LinGapE [6] X X X X
MisLid [7] X X X X
LinGIFA [6] X X X
k-Learner [8] X X X X X X
LUCB [9] X X X X X X
UGapE [10] X X X X
Racing [11] X X X X
Oracle X X X X X X X

Reproducing our experiments

There is one script for each experiment in the "experiments" folder which simply needs to be executed. The script will generate some .dat files with the results in the "results" sub-folder. Then, you can run the corresponding plot script in the "visualization" folder to visualize the results as in our paper. The scripts are:

We also include a general visualization script viz_experiment.jl which takes as input a .dat file (produced by the run of some algorithm) and plots the corresponding sample complexities, computation times, and elimination times.

Note: reproducing the experiments can take a long time. Most files run at least 5 algorithms, each with 3 to 5 variants and each for 100 runs, which makes roughly 2000 runs per file. To get faster results, just edit the script file to use less repetitions and/or to run only some desired algorithm. If one only cares about reproducing the plots, our results can be found in the experiments/saved_results folder.

References

[1] Degenne, R., Ménard, P., Shang, X., & Valko, M. (2020, November). Gamification of pure exploration for linear bandits. In International Conference on Machine Learning (pp. 2432-2442). PMLR.
[2] Wang, P. A., Tzeng, R. C., & Proutiere, A. (2021). Fast Pure Exploration via Frank-Wolfe. Advances in Neural Information Processing Systems, 34.
[3] Fiez, T., Jain, L., Jamieson, K. G., & Ratliff, L. (2019). Sequential experimental design for transductive linear bandits. Advances in neural information processing systems, 32.
[4] Jedra, Y., & Proutiere, A. (2020). Optimal best-arm identification in linear bandits. Advances in Neural Information Processing Systems, 33, 10007-10017.
[5] Soare, M., Lazaric, A., & Munos, R. (2014). Best-arm identification in linear bandits. Advances in Neural Information Processing Systems, 27.
[6] Réda, C., Kaufmann, E., & Delahaye-Duriez, A. (2021, March). Top-m identification for linear bandits. In International Conference on Artificial Intelligence and Statistics (pp. 1108-1116). PMLR.
[7] Réda, C., Tirinzoni, A., & Degenne, R. (2021). Dealing With Misspecification In Fixed-Confidence Linear Top-m Identification. Advances in Neural Information Processing Systems, 34.
[8] Degenne, R., Koolen, W. M., & Ménard, P. (2019). Non-asymptotic pure exploration by solving games. Advances in Neural Information Processing Systems, 32.
[9] Kalyanakrishnan, S., Tewari, A., Auer, P., & Stone, P. (2012, June). PAC subset selection in stochastic multi-armed bandits. In ICML (Vol. 12, pp. 655-662).
[10] Gabillon, V., Ghavamzadeh, M., & Lazaric, A. (2012). Best arm identification: A unified approach to fixed budget and fixed confidence. Advances in Neural Information Processing Systems, 25.
[11] Kaufmann, E., & Kalyanakrishnan, S. (2013, June). Information complexity in bandit subset selection. In Conference on Learning Theory (pp. 228-251). PMLR.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages