Seminar++ meetings consist of a one-hour lecture building up to an open problem, followed by an hour of brainstorming time. The meetings are intended for interested researchers, including PhD students, and are freely accessible without registration. Cookies and tea will be provided in the half-time break.
Online reinforcement learning with linear function approximation: role of the choice of policy optimization algorithm and learner’s feedback
Abstract: We consider learning in an adversarial MDP, where the loss function can change arbitrarily between episodes, and we assume that the Q-function of any policy is linear in some known features. We discuss two recent works that provide new insights into this problem. We will look at combining the methods proposed in these two papers to achieve better theoretical guarantees on the performance of the algorithms. More precisely, we will examine whether taking the best from both papers, namely the exploration bonuses from one and the choice of regularizer from the other, can lead to an improvement. If time permits, we will also discuss a variation of this problem in which the only information available to the learner is the cumulative loss accumulated over the episode.
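To make the linear function approximation assumption concrete, a standard way to state it (the notation below, with feature map $\phi$ and horizon $H$, is illustrative and not taken from the talk itself) is:

```latex
% Linear Q-function assumption (illustrative notation):
% for a known feature map \phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d,
% every policy \pi and step h \in [H] admit weights w_h^\pi \in \mathbb{R}^d with
\[
  Q_h^\pi(s, a) \;=\; \phi(s, a)^\top w_h^\pi
  \qquad \text{for all } (s, a) \in \mathcal{S} \times \mathcal{A}.
\]
```

Under this assumption, learning reduces to estimating the $d$-dimensional weight vectors rather than a table of values over all state-action pairs.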