Supervisor: Wouter M. Koolen
Background: Pure Exploration is an active area of Machine Learning and Statistics. Its central problem of Best Arm Identification has been studied intensively since (Even-Dar, Mannor, and Mansour, 2002), and worst-case optimal methods have been developed for the fixed confidence, fixed budget and simple regret settings (Bubeck, Munos, and Stoltz, 2011). After a long and respectable series of papers establishing worst-case optimality, a revolutionary new approach called Track-and-Stop was pioneered by Garivier and Kaufmann (2016) that delivers instance-optimal methods. Since then, several aspects of Track-and-Stop have been generalised and refined: tighter stopping thresholds were constructed by Kaufmann and Koolen (2021), computational efficiency was improved using saddle-point methods by Degenne, Koolen, and M´enard (2019), and problems with multiple answers were analysed by Degenne and Koolen (2019). A recent extension with subpopulations was proposed by Russac et al. (2021).
Academic Content: The typical pure exploration task is identification of the best arm from samples. Letting μk denote the mean of arm k, the task is to identify arg maxk μk. The sample complexity for this problem is well studied. One may reduce the sample complexity— at the cost of incurring some approximation error ϵ — by asking for identification of any ϵ-best arm k ∈ {k | μk ≥ maxk μk − ϵ}. The starting point of this project is the realisation that for many applications it is enough to identify an ϵ-optimal mixture policy. That is, we are happy with any probability distribution p