Speakers A/B testing workshop

Christina Katsimerou (Booking.com) has a PhD in Computer Science. She is currently a Principal Machine Learning Scientist at Booking.com focusing on adaptive experimentation and causality. 

Building a subpopulation-aware A/B experimentation platform at Booking

Abstract: E-commerce firms use online controlled experiments, or A/B/n testing, for evidence-based decision making, learning, and inspiration. The most straightforward way to conduct an A/B/n experiment is to distribute the traffic uniformly among the default product (control) and the alternative versions (variants); the variants that appear significantly better than the control at the stopping time of the experiment, which is fixed in advance, are considered for deployment. While well-established, this process can be inefficient in terms of resources and prone to human error. At Booking, we are building a new experimentation platform, relying on sequential testing policies that can both adjust the allocation of the samples and be stopped adaptively. The platform needs to be able to learn under data heterogeneities that are present in real-world traffic, such as daily or weekly seasonality. In the talk, I will describe the subpopulation-aware mathematical model and provide results from an A/B/n experiment that show the improvements in sample efficiency compared to a learner that assumes sample homogeneity.
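For readers unfamiliar with the classical procedure the abstract starts from, here is a minimal sketch of a fixed-horizon comparison of one variant against control. It is not Booking's platform or the subpopulation-aware model from the talk; it assumes a binary outcome (such as conversion) and uses a pooled two-proportion z-test, and the function name and the example counts are hypothetical.

```python
import math


def two_proportion_z_test(successes_c, n_c, successes_v, n_v):
    """Pooled two-proportion z-test, evaluated once at a fixed sample size.

    Returns (z, two_sided_p_value). Illustrative only: the p-value is valid
    only if the stopping time was fixed before the experiment started.
    """
    rate_c = successes_c / n_c
    rate_v = successes_v / n_v
    # Pooled success rate under the null hypothesis of no difference.
    pooled = (successes_c + successes_v) / (n_c + n_v)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_v))
    if std_err == 0:
        # All outcomes identical: no evidence of a difference.
        return 0.0, 1.0
    z = (rate_v - rate_c) / std_err
    # Two-sided p-value from the standard normal tail, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 1000 visitors per arm, uniform allocation.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

Checking this p-value repeatedly while data arrives and stopping at the first significant result inflates the Type-I error, which is one reason to move to sequential policies like those the abstract describes.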


Peter Grünwald (CWI & Leiden University) heads the machine learning group at CWI in Amsterdam, the Netherlands. He is also a full professor of Statistical Learning at the mathematical institute of Leiden University. Currently the President of the Association for Computational Learning, the organization running COLT, the world’s prime annual conference on machine learning theory, he was co-program chair of COLT in 2015 and also chaired UAI, another top ML conference, in 2010/2011. He is the author of the book The Minimum Description Length Principle (MIT Press, 2007). In 2010 he was co-awarded the Van Dantzig prize, the highest Dutch award in statistics and operations research. He received NWO VIDI (2005), VICI (2010) and TOP-1 (2016) grants.

Rosanne Turner (CWI & UMC Utrecht) is a PhD student in statistics and machine learning, working with Professor Peter Grünwald and Professor Floortje Scheepers at CWI and University Medical Center Utrecht. She obtained a PhD in Medicine in 2018 and completed a Master's in Statistics at Leiden University in 2019. In her current projects, she works on utilizing new concepts from statistics and machine learning theory to make personalized recommendations for patients.

Anytime-valid testing and confidence intervals in contingency tables and beyond

Abstract: We develop sequential A/B tests with strict Type-I error control under optional stopping. Our tests are based on E-variables, which have recently turned out to be successful tools for anytime-valid inference. We introduce a general method for constructing E-variables that can be used for A/B testing in 2-sample streams. In contrast to earlier methods developed in the sequential testing literature, our approach is valid for both balanced and unbalanced experiments and allows for arbitrary, user-specified notions of effect size. The same method can be used to design anytime-valid confidence sequences to estimate effect sizes in data streams. With two Bernoulli streams as a running example, we illustrate the power of our A/B test and show that decisions can often be made earlier compared to classical methods, such as Fisher's exact test. We also illustrate the confidence sequences with two different notions of effect size: log odds ratio and difference in means.
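To make the idea of an E-variable for two Bernoulli streams concrete, here is a minimal sketch. It is not necessarily the general construction presented in the talk. It assumes the balanced case, where observations arrive in pairs (one from each stream), and a fixed alternative (theta_a, theta_b) chosen by the user in advance. Each pair contributes a likelihood ratio against a common success probability set to the average of the two alternative parameters. All function names are hypothetical.

```python
def bernoulli_lik(y, theta):
    """Probability of outcome y (0 or 1) under success probability theta."""
    return theta if y == 1 else 1.0 - theta


def e_value_pair(y_a, y_b, theta_a, theta_b):
    """E-value for one pair of outcomes, one from each stream.

    Assumption: the null says both streams share one unknown success
    probability. The denominator uses theta_0 = (theta_a + theta_b) / 2.
    Under any null p, the two per-stream expected ratios sum to 2, so by
    the AM-GM inequality their product (the expected E-value) is at most 1.
    """
    theta_0 = (theta_a + theta_b) / 2.0
    numerator = bernoulli_lik(y_a, theta_a) * bernoulli_lik(y_b, theta_b)
    denominator = bernoulli_lik(y_a, theta_0) * bernoulli_lik(y_b, theta_0)
    return numerator / denominator


def sequential_test(pairs, theta_a, theta_b, alpha=0.05):
    """Multiply E-values over pairs; reject once the product reaches 1/alpha.

    Returns (number_of_pairs_at_rejection_or_None, final_e_process_value).
    """
    e_process = 1.0
    for n, (y_a, y_b) in enumerate(pairs, start=1):
        e_process *= e_value_pair(y_a, y_b, theta_a, theta_b)
        if e_process >= 1.0 / alpha:
            return n, e_process
    return None, e_process


def expected_e_under_null(p, theta_a, theta_b):
    """Exact expectation of the pair E-value when both streams have rate p."""
    total = 0.0
    for y_a in (0, 1):
        for y_b in (0, 1):
            prob = bernoulli_lik(y_a, p) * bernoulli_lik(y_b, p)
            total += prob * e_value_pair(y_a, y_b, theta_a, theta_b)
    return total


# Sanity check: the expectation never exceeds 1 under the null.
for p in (0.1, 0.4, 0.9):
    print(p, expected_e_under_null(p, theta_a=0.3, theta_b=0.5))
```

Because the running product is a nonnegative supermartingale under the null, Ville's inequality bounds the probability that it ever reaches 1/alpha by alpha, so the experimenter may look at the data and stop at any time.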

Wouter Koolen (CWI) is a Senior Researcher with the Machine Learning Group at CWI, where he studies interactive learning and decision making using statistics and game theory. Koolen was a postdoc at QUT Brisbane (2013-2015), visiting scholar at UC Berkeley (2013-2015), and postdoc at Royal Holloway (2011-2013) after obtaining his PhD from the University of Amsterdam (2011). Koolen coauthored over 40 publications at top machine learning venues including COLT and NeurIPS. He received an NWO VENI grant (2015-2018).

Instance-optimal algorithms for A/B testing

Abstract: In A/B testing we use controlled experiments to support decision making. But how should the decision at hand influence our testing procedure? And could a more flexible testing methodology possibly support more creative decisions? To understand the interplay, we will precisely quantify the complexity of testing problem instances, and investigate their ideal resolution. To this end we will discuss the notions of characteristic time and oracle allocations. Building on these concepts, we will then describe a path to practical instance-optimal algorithms for A/B testing.


Alan Malek (DeepMind, formerly Optimizely), currently a Research Scientist at DeepMind, works at the intersection of causal inference and sequential decision making (e.g. bandits and online learning). Recent interests include bandit settings with causal structure and deploying online learning algorithms to solve selection problems in causal inference. Before joining DeepMind, Alan was the Statistician at Optimizely, a San Francisco-based startup that provided a sequential hypothesis testing platform: he worked on sequential hypothesis testing methodology and on explaining it to customers. Academically, Alan holds a B.S. in Mathematics from Stanford University, and an M.A. in Statistics and a PhD in Computer Science from UC Berkeley. He was also a Postdoc at MIT for a little while.

Causality in A/B testing