
Workshop on Theory of Control and Reinforcement Learning

As part of our semester programme, we are organising a workshop on “Theory of Control and Reinforcement Learning” on June 19-20, 2025 at CWI, Amsterdam.

When
19 Jun 2025, 9 a.m. to 20 Jun 2025, 6 p.m. CEST (GMT+02:00)
Where
Science Park 125, Turingzaal

Register here for the workshop

Cancellations made before 6 June will receive a full refund; cancellations made after this date are non-refundable.

We invite contributions for talks (until June 10th) and/or posters from researchers in the theory of control and RL, especially contributions bridging the two. The organisers will select talks based on topic and available slots; for contributed talks, we will confirm the speakers by June 12th, 2025. Attendance without presenting is also welcome.

Payment link

Speakers information

Andreas Krause is a Professor of Computer Science at ETH Zurich, where he leads the Learning & Adaptive Systems Group. He also serves as Academic Co-Director of the Swiss Data Science Center and Chair of the ETH AI Center, and co-founded the ETH spin-off LatticeFlow. Before that, he was an Assistant Professor of Computer Science at Caltech. His research focuses on learning and adaptive systems that actively acquire information, reason and reliably make decisions in complex and uncertain domains. His work advances the principles of online, active and reinforcement learning, as well as probabilistic and generative modeling for optimization and control, and applies them in real-world applications. He is a Max Planck Fellow at the Max Planck Institute for Intelligent Systems, ACM Fellow, IEEE Fellow, ELLIS Fellow, a Microsoft Research Faculty Fellow and a Kavli Frontiers Fellow of the US National Academy of Sciences. He received the Rössler Prize, ERC Starting and ERC Consolidator grants, the German Pattern Recognition Award, an NSF CAREER award as well as the ETH Golden Owl teaching award. His research on machine learning and adaptive systems has received awards at several premier conferences and journals, including the ACM SIGKDD Test of Time award 2019 and the ICML Test of Time award 2020. From 2023 to 2024, he served on the United Nations’ High-level Advisory Body on AI.

Talk details

Safe and Efficient Exploration in Model-Based Reinforcement Learning

How can we enable systems to efficiently and safely learn online, from interaction with the real world? I will first discuss safe Bayesian optimization, where we quantify uncertainty in the unknown reward function and constraints, and, under some regularity conditions, can guarantee both safety and convergence to a natural notion of reachable optimum. I will then consider Bayesian model-based deep reinforcement learning, where we use the epistemic uncertainty in the dynamics model to guide exploration while ensuring safety. Lastly, I will discuss how we can meta-learn flexible data-driven priors from related tasks and simulations, and discuss several applications in robotics and science.
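
To make the safe-exploration idea concrete, below is a minimal, illustrative sketch (not the speaker's algorithm) of a SafeOpt-flavoured loop on a 1-D toy problem: Gaussian-process surrogates model the unknown reward and constraint, a candidate counts as safe only if the pessimistic (lower-confidence) estimate of the constraint is non-negative, and the next evaluation is chosen optimistically within that safe set. The toy functions, confidence parameter and seed point are assumptions made purely for illustration.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

f = lambda x: -np.sin(3 * x) - x ** 2 + 0.7 * x   # unknown reward (toy assumption)
g = lambda x: 1.0 - x ** 2                        # unknown constraint; safe iff g(x) >= 0

X_grid = np.linspace(-2, 2, 200).reshape(-1, 1)   # candidate points
X = np.array([[0.0]])                             # known-safe seed point
yf, yg = f(X).ravel(), g(X).ravel()
beta = 2.0                                        # confidence-width parameter (assumption)

for t in range(15):
    gp_f = GaussianProcessRegressor(RBF(0.5), alpha=1e-4).fit(X, yf)
    gp_g = GaussianProcessRegressor(RBF(0.5), alpha=1e-4).fit(X, yg)
    mu_f, sd_f = gp_f.predict(X_grid, return_std=True)
    mu_g, sd_g = gp_g.predict(X_grid, return_std=True)
    safe = mu_g - beta * sd_g >= 0                # pessimistic safety check
    ucb = np.where(safe, mu_f + beta * sd_f, -np.inf)
    x_next = X_grid[np.argmax(ucb)]               # optimistic choice within the safe set
    X = np.vstack([X, [x_next]])
    yf = np.append(yf, f(x_next))
    yg = np.append(yg, g(x_next))

print("best safe point found:", X[np.argmax(yf)].ravel(), "with reward", yf.max())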

Anuradha Annaswamy is the Founder and Director of the Active-Adaptive Control Laboratory in the Department of Mechanical Engineering at MIT. She previously held faculty positions at Yale and Boston University, and is currently a Senior Research Scientist at MIT. Her research interests span adaptive control theory and its applications to aerospace, automotive, propulsion, and energy systems, as well as cyber-physical systems such as smart grids, smart cities, and smart infrastructures. She has received best paper awards (Axelby, 1986; CSM, 2010; IFAC Annual Reviews in Control, 2021-23), Distinguished Member and Distinguished Lecturer awards from the IEEE Control Systems Society (CSS), and a Presidential Young Investigator award from the NSF (1991-97). She is a Fellow of the IEEE and the International Federation of Automatic Control, and recipient of the Distinguished Alumni Award from the Indian Institute of Science in 2021. She received the IEEE Control Systems Technology Award from CSS in 2024.

Anu Annaswamy is the author of a graduate textbook on adaptive control, co-editor of two vision documents on smart grids and two editions of the Impact of Control Technology report, and editor of IEEE Open Journal of Control Systems, the IFAC Annual Reviews in Control, and Asian Journal of Control. She has co-authored two National Academies consensus reports: The Future of Electric Power in the United States (2021) and The Role of Net Metering in the Evolving Electricity System (2023). She served as the President of CSS in 2020. Currently, Dr. Annaswamy serves as President-elect of the American Automatic Control Council and as Editor in Chief of the IEEE Control Systems magazine. Dr. Annaswamy received her Ph.D. in Electrical Engineering from Yale University in 1985.

Talk details

Adaptive Control and Intersections with Reinforcement Learning

Adaptive control and reinforcement learning based control are two different methods that are both commonly employed for the control of uncertain systems. Historically, adaptive control has excelled at real-time control of systems with specific model structures through adaptive rules that learn the underlying parameters while providing strict guarantees on stability, asymptotic performance, and learning. Reinforcement learning methods are applicable to a broad class of systems and are able to produce near-optimal policies for highly complex control tasks. This is often enabled by significant offline training via simulation or the collection of large input-state datasets. In both methods, the main approach used for updating the parameters is based on a parametrized policy and gradient descent-like algorithms. Related tools of analysis, convergence, and robustness in both fields have a tremendous amount of similarity as well.

This talk will examine the similarities and interconnections between adaptive control and reinforcement learning-based control. Concepts in stability, performance, and learning that are common to both methods will be discussed. Building on the similarities in update laws and common concepts, new intersections and opportunities for improved algorithm analysis will be explored. Two specific examples of dynamic systems are used to illustrate the details of the two methods, their advantages, and their deficiencies. We will explore how these methods can be leveraged and integrated to lead to provably correct methods for learning in real-time with guaranteed fast convergence. Examples will be drawn from a range of engineering applications.
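
To illustrate the "gradient descent-like" structure of adaptive laws that the abstract refers to, here is a minimal toy sketch (not from the talk): a scalar plant x⁺ = a·x + u with unknown a is regulated by a parametrized feedback u = θ·x, and θ is updated by gradient descent on the squared error with respect to a stable reference model, which is exactly the update form shared with policy-gradient methods in RL. The plant, gains and horizon are illustrative assumptions.

a, a_m = 1.3, 0.5            # unknown unstable plant pole, stable reference-model pole
theta_star = a_m - a         # ideal feedback gain (unknown to the controller)
theta, gamma = 0.0, 0.1      # adaptive parameter and adaptation gain (assumptions)
x = 1.0

for k in range(200):
    u = theta * x                    # parametrized "policy"
    x_next = a * x + u               # true plant
    e = x_next - a_m * x             # error w.r.t. the reference model x+ = a_m * x
    theta -= gamma * e * x           # gradient of 0.5*e**2 w.r.t. theta: the adaptive law
    x = x_next

print(f"final |x| = {abs(x):.1e}, learned gain = {theta:+.3f}, ideal gain = {theta_star:+.3f}")
# Without persistently exciting inputs the gain need not reach the ideal value even
# though the state is regulated -- a classical adaptive-control subtlety that mirrors
# insufficient exploration in reinforcement learning.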

Emilie Kaufmann is a CNRS researcher in the CRIStAL laboratory at Université de Lille. She is also a member of the Inria team Scool. She is interested in statistics and machine learning, with a particular focus on sequential learning. She has studied variants of the Multi-Armed Bandit (MAB) and Markov Decision Processes (MDPs) under both reinforcement learning ("maximize rewards while learning") and adaptive testing ("learn as fast as you can by adaptively collecting data") formulations. On the application side, her recent interest is in the potential use of bandit strategies for adaptive early-stage clinical trials, and in the use of contextual bandits for precision medicine. She won the 2014 Jacques Neveu Prize for the best PhD thesis in mathematics and statistics in France, and the CNRS bronze medal in 2024.

Talk details

From regret to PAC RL

Regret is perhaps the most studied performance metric in the literature on theoretical online reinforcement learning. In this talk we will consider episodic MDPs and study the dual PAC RL framework, in which the goal is to identify near-optimal policies with high confidence, relaxing the need to maximize rewards while learning. In particular, we will be interested in the reward-free exploration problem, in which the goal is to learn a good policy with respect to any reward function that is given after the exploration phase. We will introduce different algorithms that can be viewed as variants of the UCB-VI algorithm (whose regret has been well studied), incorporating appropriate (intrinsic) rewards to foster exploration. In particular, we will present the first algorithm with a sample complexity bound that does not depend only on the size of the MDP, going beyond minimax guarantees.
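
As a rough illustration of the UCB-VI template with intrinsic rewards mentioned above (not the algorithms presented in the talk), the following sketch performs optimistic value iteration over an empirical model of a toy tabular MDP, using only a count-based exploration bonus as the reward, so the agent explores in a reward-free fashion. The toy MDP, bonus shape and episode budget are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(0)
S, A, H, episodes = 6, 2, 10, 300
P_true = rng.dirichlet(np.ones(S), size=(S, A))       # unknown transition kernel (toy)

N = np.zeros((S, A))                                  # state-action visit counts
N_next = np.zeros((S, A, S))                          # transition counts

for ep in range(episodes):
    # Build the empirical model and compute optimistic values with intrinsic rewards.
    P_hat = (N_next + 1.0 / S) / (N[:, :, None] + 1.0)   # smoothed empirical model
    bonus = 1.0 / np.sqrt(np.maximum(N, 1.0))            # count-based exploration bonus
    Q = np.zeros((H, S, A))
    V = np.zeros((H + 1, S))
    for h in range(H - 1, -1, -1):                       # backward induction (value iteration)
        Q[h] = bonus + P_hat @ V[h + 1]                  # intrinsic reward only (reward-free)
        V[h] = Q[h].max(axis=1)
    # Roll out the greedy (optimistic) policy in the real MDP and update counts.
    s = 0
    for h in range(H):
        a = int(Q[h, s].argmax())
        s_next = rng.choice(S, p=P_true[s, a])
        N[s, a] += 1
        N_next[s, a, s_next] += 1
        s = s_next

print("state-action visit counts after exploration:\n", N.astype(int))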

Gergely Neu is a Research Assistant Professor in the AI group at Universitat Pompeu Fabra. He is a machine learning researcher mainly interested in theoretical aspects of sequential decision making, working on online optimization, bandit problems, and reinforcement learning theory. He likes to think about algorithms that come with performance guarantees in terms of both computational and statistical complexity, and that are actually implementable on a computer. He received an ERC Starting Grant in 2020, the first Bosch AI Young Researcher Award in 2019, and a Google Faculty Research Award in 2019.

Talk details

Distances for Markov chains from sample streams

Bisimulation metrics are powerful tools for measuring similarities between stochastic processes, and specifically Markov chains. Recent advances have uncovered that bisimulation metrics are, in fact, optimal-transport distances, which has enabled the development of fast algorithms for computing such metrics with provable accuracy and runtime guarantees. However, these recent methods, as well as all previously known methods, assume full knowledge of the transition dynamics. This is often an impractical assumption in most real-world scenarios, where typically only sample trajectories are available. In this work, we propose a stochastic optimization method that addresses this limitation and estimates bisimulation metrics based on sample access, without requiring explicit transition models. Our approach is derived from a new linear programming (LP) formulation of bisimulation metrics, which we solve using a stochastic primal-dual optimization method. We provide theoretical guarantees on the sample complexity of the algorithm and validate its effectiveness through a series of empirical evaluations.
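
For orientation, the sketch below shows the classical full-knowledge computation that the abstract contrasts with: on a small Markov chain with a known transition matrix and rewards, the bisimulation metric is obtained as a fixed point of an update whose transport term is an optimal-transport LP. This baseline requires the explicit transition model, which is exactly the assumption the talk's sample-based primal-dual method removes; the toy chain, discount factor and tolerance are illustrative assumptions.

import numpy as np
from scipy.optimize import linprog

n, c = 4, 0.8                              # number of states and discount (assumptions)
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n), size=n)      # known transition matrix (toy)
r = rng.uniform(size=n)                    # known per-state rewards (toy)

def wasserstein(p, q, cost):
    """W_1 between discrete distributions p and q under ground metric `cost`."""
    A_eq = np.zeros((2 * n, n * n))        # marginal constraints on the coupling pi
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0   # row i of pi must sum to p[i]
        A_eq[n + i, i::n] = 1.0            # column i of pi must sum to q[i]
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    return res.fun

d = np.zeros((n, n))
for _ in range(100):                       # fixed-point iteration (contraction with factor c)
    d_new = np.empty((n, n))
    for x in range(n):
        for y in range(n):
            d_new[x, y] = (1 - c) * abs(r[x] - r[y]) + c * wasserstein(P[x], P[y], d)
    gap, d = np.max(np.abs(d_new - d)), d_new
    if gap < 1e-6:
        break

print(np.round(d, 3))                      # the bisimulation metric on the toy chain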

Jan Peters has been a full professor (W3) for Intelligent Autonomous Systems at the Computer Science Department of the Technische Universitaet Darmstadt since 2011 and, since 2022, has also headed the research department on Systems AI for Robot Learning (SAIROL) at the German Research Center for Artificial Intelligence (Deutsches Forschungszentrum für Künstliche Intelligenz, DFKI). He is also a founding research faculty member of the Hessian Center for Artificial Intelligence. Jan Peters has received the Dick Volz Best 2007 US PhD Thesis Runner-Up Award, the Robotics: Science & Systems Early Career Spotlight, the INNS Young Investigator Award, and the IEEE Robotics & Automation Society's Early Career Award, as well as numerous best paper awards. In 2015, he received an ERC Starting Grant; in 2019, he was appointed IEEE Fellow, in 2020 ELLIS Fellow and in 2021 AAIA Fellow.

Despite being a faculty member at TU Darmstadt only since 2011, Jan Peters has already nurtured a series of outstanding young researchers into successful careers. These include two dozen new faculty members at leading universities in the USA, Japan, Germany, Canada, Finland, Vietnam and the Netherlands, postdoctoral scholars at top computer science departments (including MIT, CMU, and Berkeley), and young leaders at top AI companies (including Amazon, Boston Dynamics, Google and Facebook/Meta). Two of his graduates have received the Best European Robotics PhD Thesis Award and three further graduates have been runners-up for this award.

Jan Peters has studied Computer Science, Electrical, Mechanical and Control Engineering at TU Munich and FernUni Hagen in Germany, at the National University of Singapore (NUS) and the University of Southern California (USC) in Los Angeles. He has received four Master's degrees in these disciplines as well as a Computer Science PhD from USC. Jan Peters has performed research in Germany at DLR, TU Munich and the Max Planck Institute for Biological Cybernetics (in addition to the institutions above), in Japan at the Advanced Telecommunication Research Center (ATR), at USC and at both NUS and Siemens Advanced Engineering in Singapore. He has led research groups on Machine Learning for Robotics at the Max Planck Institutes for Biological Cybernetics (2007-2010) and Intelligent Systems (2010-2021).

Talk details

Inductive Biases for Robot Reinforcement Learning

The quest for intelligent robots capable of learning complex behaviors from limited data hinges critically on the design and integration of inductive biases—structured assumptions that guide learning and generalization. In this talk, Jan Peters explores the foundational role of inductive biases in robot learning, drawing from insights in control theory, neuroscience, and machine learning. He discusses how exploiting physical principles, modular control structures, symmetry, temporal abstraction, and domain-specific priors can drastically reduce sample complexity and improve robustness in robotic systems.

Through a series of concrete examples—including robot table tennis, tactile manipulation, quadruped locomotion, and dynamic motor skill learning on anthropomorphic arms—Peters illustrates how inductive biases enable efficient policy search, reinforcement learning, and imitation learning. These applications demonstrate how embedding prior knowledge about motor primitives, control hierarchies, or contact dynamics helps robots acquire versatile skills with minimal data. The talk concludes with a vision for future robot learning systems that integrate such structured biases with modern data-driven methods, enabling scalable, adaptive, and generalizable autonomy in real-world environments.

Michael Muehlebach leads the independent research group Learning and Dynamical Systems at the Max Planck Institute for Intelligent Systems in Tuebingen, Germany. He received his PhD from the Institute for Dynamic Systems and Control at ETH Zurich and was a postdoctoral researcher at the University of California, Berkeley. He is interested in a wide variety of subjects, including machine learning, dynamics, control, and optimization. During his PhD, he worked on approximations of the constrained linear quadratic regulator problem with applications to model predictive control, and he analyzed first-order optimization algorithms from a dynamical systems point of view. He has been a Branco Weiss Fellow since 2018, was awarded the Emmy Noether Fellowship in 2020, and received an Amazon Fellowship in 2024.

Talk details

On the Sample-Complexity of Online Reinforcement Learning: Packing, Priors, and Pontryagin

The talk studies the sample complexity of online reinforcement learning in the general setting of nonlinear dynamical systems with continuous state and action spaces. The analysis accommodates a large class of dynamical systems ranging from a finite set of nonlinear candidate models to models with bounded and Lipschitz continuous dynamics, to systems that are parametrized by a compact and real-valued set of parameters. We derive policy regret guarantees for each scenario and recover earlier results that were exclusively derived for linear time-invariant dynamical systems. The last part of the talk adopts a broader point of view and discusses the relation between feedforward and feedback controllers. We conclude by highlighting applications to real-world robotic systems including autonomous racing, magnetic navigation systems, and a table tennis playing robot that is actuated by pneumatic artificial muscles.

Peter Grünwald

Peter Grünwald is a senior researcher in the machine learning group at CWI in Amsterdam, the Netherlands. He is also full professor of Statistical Learning at the Mathematical Institute of Leiden University. Peter is the recipient of a prestigious ERC Advanced Grant (2024), which was covered in interviews (in Dutch) in the national newspapers De Volkskrant and Trouw. The project, and Peter's research in general, is about creating a much more flexible theory of statistical inference, based on the emerging theory of e-values and e-processes. E-values are an alternative to p-values that effortlessly deal with optional continuation: with e-value based tests and the corresponding always-valid confidence intervals, one can always gather additional data while keeping statistically valid conclusions. From 2018 to 2022, Peter served as President of the Association for Computational Learning, the organization running COLT, the world's premier annual conference on machine learning theory. He is an editor of Foundations and Trends in Machine Learning, and author of the book "The Minimum Description Length Principle" (MIT Press, 2007). He is a co-recipient of the 2010 Van Dantzig Prize, the highest Dutch award in statistics and operations research.
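
As a small illustration of the optional-continuation property of e-values mentioned above (a sketch of the general mechanism, not of the ERC project itself): for a simple null against a simple alternative, the per-batch likelihood ratio is an e-value, the running product is an e-process, and by Ville's inequality one may keep collecting data and stop at any time, rejecting once the product exceeds 1/α while keeping the type-I error below α. The hypotheses, significance level and data-generating distribution below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
p0, p1, alpha = 0.5, 0.7, 0.05             # null, alternative, significance level (assumptions)
e_process = 1.0

for batch in range(1, 21):                 # optional continuation: add batches at will
    xs = rng.binomial(1, 0.7, size=10)     # data actually drawn from the alternative
    lik0 = np.prod(np.where(xs == 1, p0, 1 - p0))   # batch likelihood under H0
    lik1 = np.prod(np.where(xs == 1, p1, 1 - p1))   # batch likelihood under H1
    e_process *= lik1 / lik0               # multiply the batch e-value into the e-process
    print(f"after batch {batch:2d}: e-process = {e_process:8.2f}")
    if e_process >= 1 / alpha:             # stop whenever we like; type-I error stays <= alpha
        print(f"reject H0 at level {alpha}")
        break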

Roxana Rădulescu is currently an Assistant Professor in AI and Data Science at the Department of Information and Computing Sciences, Utrecht University. Before this, she was an FWO Postdoctoral Fellow at the Artificial Intelligence Lab, Vrije Universiteit Brussel, Belgium. Her research focuses on the development of multi-agent decision-making systems in which each agent is driven by different objectives and goals, under the paradigm of multi-objective multi-agent reinforcement learning.

Talk details

Multi-objective learning agents

Most real-world problems involve multiple, potentially conflicting objectives; for example, safety versus fuel efficiency versus speed in autonomous driving, or treatment effectiveness versus side effects in medical treatment planning. Tackling such problems using reinforcement learning (RL) methods either requires an a-priori scalarisation of the reward signal, or involves applying multi-objective RL. In this talk I compare these approaches, and take a deeper dive into multi-objective RL. The goal is to highlight practical considerations, theoretical results, and additional challenges and benefits, as well as to delineate how and when it is appropriate to use multi-objective RL.
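
To make the contrast concrete, here is a minimal sketch of the a-priori scalarisation baseline mentioned above: a vector-valued reward is collapsed into a scalar with weights fixed before learning, after which any standard RL method can be applied. A multi-objective RL approach would instead keep the vector reward and reason about the set of trade-off (Pareto-optimal) policies. The reward components and weights are illustrative assumptions.

import numpy as np

def vector_reward(action):
    """Toy two-objective reward: (progress, negative risk)."""
    progress = float(action)               # a faster action gives more progress...
    risk = 0.5 * float(action) ** 2        # ...but risk grows faster than linearly
    return np.array([progress, -risk])

weights = np.array([0.6, 0.4])             # trade-off fixed *before* learning (a priori)

def scalarised_reward(action):
    """Linear scalarisation: the signal a standard RL agent would maximise."""
    return float(weights @ vector_reward(action))

for a in [0.0, 1.0, 2.0, 3.0]:
    print(f"action {a}: vector reward {vector_reward(a)}, scalarised {scalarised_reward(a):+.2f}")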

Tim van Erven is an associate professor at the Korteweg-de Vries Institute for Mathematics at the University of Amsterdam in the Netherlands. His research explores the mathematical foundations of machine learning. His research group designs mathematically well-founded machine learning methods for online convex optimization that work well out of the box, without any manual fine-tuning. In 2023, he joined the board of directors of the Association for Computational Learning (the COLT association). He has been awarded the VICI grant by the Dutch Research Council in 2019 and 2025.

Talk details

An Improved Algorithm for Adversarial Linear Contextual Bandits via Reduction

I will present an efficient algorithm for linear contextual bandits with adversarial losses and stochastic action sets. Our approach reduces this setting to misspecification-robust adversarial linear bandits with fixed action sets. Without knowledge of the context distribution or access to a context simulator, the algorithm achieves Õ(d²√T) regret and runs in poly(d, C, T) time, where d is the feature dimension, C is the number of linear constraints defining the action set in each round, and T is the number of rounds. This resolves the open question by Liu et al. (2023) on whether one can obtain poly(d)√T regret in polynomial time, independent of the number of actions. For the important class of combinatorial bandits with adversarial losses and stochastic action sets, our algorithm is the first to achieve poly(d)√T regret in polynomial time, while, to our knowledge, no prior algorithm achieves even o(T) regret in polynomial time. When a simulator is available, the regret bound can be improved to Õ(d√L*), where L* is the cumulative loss of the best policy.

This is joint work with Jack Mayo, Julia Olkhovskaya and Chen-Yu Wei.

Tentative Schedule

Day 1 (Thursday, June 19)

09:00 - 09:30 Registration and tea/coffee

09:30 - 09:45 Welcome speech by Peter Grünwald (video)

09:45 - 10:45 Emilie Kaufmann, From regret to PAC RL (video)

10:45 - 11:15 break

11:15 - 12:15 Roxana Rădulescu, Multi-objective learning agents (video)

12:15 - 13:30 Lunch break

13:30 - 15:00 Contributed talks
Thomas Michel, DP-SPRT: Differentially Private Sequential Probability Ratio Tests
Lukas Zierahn, Best Arm Identification for Shifting Means with Uniform Sampling
Udvas Das, FraPPE: Fast and Efficient Preference-based Pure Exploration
Stavros Orfanoudakis, Physics-Informed Reinforcement Learning for Real-Time Sequential Decision-Making Under Uncertainty
David Leeftink, Probabilistic Pontryagin's Principle for Model-Based Reinforcement Learning in Continuous-Time
Herke van Hoof, Solving compositional reinforcement learning problems with a learned policy basis

15:00 - 15:30 break 

15:30 - 16:30 Gergely Neu, Distances for Markov chains from sample streams

16:30 - 17:30 Andreas Krause, Safe and Efficient Exploration in Model-Based Reinforcement Learning

17:30 - 19:00 Posters and discussion with drinks/bites

Day 2 (Friday, June 20)

09:00 - 09:30 Registration and tea/coffee

09:30 - 10:30 Anuradha Annaswamy, Adaptive Control and Intersections with Reinforcement Learning (video)

10:30 - 11:00 break

11:00 - 12:00 Tim van Erven, An improved algorithm for adversarial linear contextual bandits via reduction (video)

12:00 - 13:30 Lunch and Poster session

13:30 - 14:30 Michael Muehlebach, On the sample-complexity of online reinforcement learning: packing, priors, and Pontryagin

14:30 - 15:30 Jan Peters, Inductive Biases for Robot Reinforcement Learning

15:30 - 16:00 break

16:00 - 17:00 Panel discussion

17:00 - 17:30 Posters and discussion with drinks/bites

Logistics

The conference will be held in the Turing room at the Congress Centre of Amsterdam Science Park, next to Centrum Wiskunde & Informatica (CWI).
Address: Science Park 125, 1098 XG Amsterdam.

Please be aware that hotel prices in Amsterdam can be quite steep. We strongly recommend that all participants secure their hotel reservations as early as possible!

Hotel Recommendations
* Generator Hostel
* MEININGER Hotel (Amsterdam Amstel)
* Hotel Casa
* The Manor Amsterdam
* The Lancaster Hotel Amsterdam

From these hotels, the venue can be reached in 15-30 minutes by public transport. On all public transport, you can check in and out with a contactless Mastercard or Visa credit card, or with Apple Pay or Google Wallet.

Sharing a hotel room is a great way to reduce costs! If you are attending and are interested in room-sharing arrangements, please send an email to events@cwi.nl.
