Information-Theoretic Learning

Project name: Information-Theoretic Learning (INS4.2)

Research group: Algorithms and Complexity (INS4) 

Coordinator of this project: Peter Grunwald
Startdate: 1997-01-01
Enddate: 2100-01-01

INS 4.2 addresses a broad range of issues related to machine learning, both theoretically and practically, and mostly from an information-theoretic perspective. In particular, we study the relation between data compression, generalization properties and prediction in the sense of the "minimum description length" paradigm, basically a formal version of Occam's Razor. The project coordinator is the author of "The Minimum Description Length Principle", the first comprehensive overview of this area.

Recent work includes new methods for dealing with MDL/Bayesian inference when all models are wrong. In particular, we identified a novel practical model selection and prediction method that allows for substantial improvement of both cross-validation/AIC-type methods on the one hand, and Bayesian model selection/averaging methods on the other hand. In other recent work, it is shown that both (standard) MDL and Bayes can behave quite badly if the model is only slightly wrong.

We also study the Kolmogorov structure function and its relation to the `idealized' versions of MDL based on Kolmogorov complexity. Relatedly, we have recently developed an individual-sequence version of rate-distortion theory. This fits in the broader picture of an algorithmic rather than probabilistic approach to statistics and learning.

More practically oriented research includes similarity analysis by data compression, which can be used to determine a compression-based "distance" between any two types of objects, be they, for example, DNA sequences, literary texts or midi files. A variation of this idea has led to the Normalized Google Distance, which uses the world-wide web to determine the similarity between objects.

Members
Harry Buhrman, Lukasz Jerzy Debowski, Tim van Erven, Peter Grunwald, Peter Harremoës, Wouter Koolen-Wijkstra, Alfonso Martinez, Paul Vitanyi.

Key publications
Publications of Information-Theoretic Learning

Reports of Information-Theoretic Learning

Cooperation partners
We cooperate nationally with Leiden University and Eurandom (R. Gill), and internationally with the University of Helsinki (J. Rissanen, P. Myllymaki, T. Roos), Cornell Univ. (J. Halpern), University College London (A.P. Dawid), Univ. Warwick (N. Chater), Moscow University Helsinki Un. (H. Tirri, P. Kontkanen), Moscow State University (N. Vereshchagin). The group is a partner in the 6th framework/7th framework European Community Network of Excellence.