Bandits and Game Tree Search in AI

Data are often costly to collect, and so it is important to collect the data that are most useful to the question at hand. This is especially true in problems where data acquisition is under active control of the learning system itself. By being smart about which data to obtain, the learning system can maximize performance and minimize cost. For example, in an adaptive clinical trial we might desire to minimize the number of patients (cost) required to reach a given statistical significance (performance).

Beyond medical testing, adaptive data acquisition problems arise in many on-line and artificial intelligence (AI) domains, including recommendation, advertising and website optimization. They also form a core problem in simulator-based planning, and in time-constrained search for winning moves in two-player board games, so called game-tree search. Current algorithms for adaptive data acquisition problems (sometimes called Bandit problems) are successfully applied in many of these domains, including the AlphaGo program that succesfully defeated the world's best Go player.

Yet existing approaches are optimal only for unstructured problems. The Machine Learning group at CWI studies the design of learning systems for answering structured questions. We investigate how the structure of the question determines the complexity of the learning problem, and how it can inform the design of optimal learning strategies. We are developing theory and algorithms, working towards automating the pathway from question to algorithm. We have already developed new methods that are guaranteed to find the optimal move in stochastic two-player games with high confidence using very few evaluations.

Contact person: Wouter Koolen
Research group: Machine Learning (ML)
Partners: INRIA Lille, IMT Toulouse