Bayesian statistics, a widely used method, is not as robust as commonly thought. Researcher Thijs van Ommen of Centrum Wiskunde & Informatica (CWI) discovered that for certain types of problems, Bayesian statistics finds non-existent patterns in the data. Van Ommen defends his thesis on this topic on Wednesday 10 June at Leiden University.
Bayesian statistics is most commonly used to determine whether a hypothesis is true or false given the evidence, and it provides a measure of the certainty with which this can be stated. It is widely used, most prominently in machine learning. Van Ommen discovered that Bayesian statistics is not robust if the model assumptions are slightly wrong. He created several data sets on which Bayesian statistics reported non-existent patterns based on random noise in the data. The data sets all have realistic properties and could very well exist as real experimental data.
The errors occur in regression analysis of the data sets. Regression analysis is a very common form of data analysis in which a researcher looks for a relation between two or more variables, one known and the other unknown. Whenever the models used are not entirely correct, for instance because they assume that the noise follows a specific random distribution, there is a risk of drawing nonsensical conclusions. Van Ommen not only reports the problem but also provides a solution in the form of an addition to Bayesian statistics. This addition, named SafeBayes, prevents the errors in regression analysis. It is expected to be added to statistical software like R or SPSS in the near future.
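To make the risk concrete, here is a minimal sketch of Bayesian regression with a noise assumption that does not fit the data. It is not one of the data sets from the thesis and it does not implement SafeBayes. The function `slope_posterior`, the toy data, the prior variance and the assumed noise variance are all hypothetical choices made for illustration. The model is a line through the origin, y = b·x + noise, with a normal prior on the slope b and noise assumed to be normal with a known variance.

```python
def slope_posterior(xs, ys, noise_var, prior_var=100.0):
    """Posterior mean and variance of the slope b in y = b*x + noise.

    Assumes noise ~ N(0, noise_var) and a prior b ~ N(0, prior_var).
    These are the standard conjugate-normal update formulas.
    """
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_var) / precision
    return mean, 1.0 / precision


xs = [1.0, 2.0, 3.0, 4.0]
clean = [1.1, 1.9, 3.2, 3.9]   # data that fits the assumed small noise
wild = [1.1, 1.9, 3.2, 40.0]   # one value the noise model did not anticipate

mean_clean, var_clean = slope_posterior(xs, clean, noise_var=0.01)
mean_wild, var_wild = slope_posterior(xs, wild, noise_var=0.01)

print(f"clean data: slope {mean_clean:.3f}, posterior sd {var_clean ** 0.5:.3f}")
print(f"wild data:  slope {mean_wild:.3f}, posterior sd {var_wild ** 0.5:.3f}")
```

In this model the posterior variance depends only on the x values and the assumed noise variance, not on how well the line actually fits. The second data set therefore yields a very different slope that is reported with exactly the same high certainty as the first.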
In his thesis, Van Ommen has also investigated the problem of how a statistician might determine the probabilities of unknown outcomes in light of new evidence when the precise relation between outcomes and evidence is unknown. A famous example of this problem is the Monty Hall puzzle, where the contestant has to guess which door hides a prize, based on evidence presented by the quizmaster. Van Ommen found that in some similar situations the statistician's question has a single objective answer, while in other situations it does not. He also provides some techniques for finding the optimal strategy for such puzzles.
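The dependence on the unknown relation between outcomes and evidence can be sketched with the Monty Hall puzzle itself. The example below is an illustration only, not the analysis from the thesis. It assumes the contestant picks door 1 and the quizmaster then opens door 3. The parameter `q` is a hypothetical name for the probability that the quizmaster opens door 3 when the prize is behind door 1 and he is free to open either of the other two doors.

```python
from fractions import Fraction

DOORS = (1, 2, 3)
PRIOR = Fraction(1, 3)  # the prize is equally likely to be behind each door


def host_opens_3(prize, q):
    """P(quizmaster opens door 3 | prize location), contestant picked door 1."""
    if prize == 1:
        return q            # free choice between doors 2 and 3; q is his bias
    if prize == 2:
        return Fraction(1)  # door 3 is the only door he can open
    return Fraction(0)      # he never reveals the prize


def switch_wins_given_door_3(q):
    """P(prize behind door 2 | quizmaster opened door 3), by Bayes' rule."""
    evidence = sum(PRIOR * host_opens_3(d, q) for d in DOORS)
    return PRIOR * host_opens_3(2, q) / evidence


for q in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(f"q = {q}: switching wins with probability {switch_wins_given_door_3(q)}")
```

With an unbiased quizmaster (q = 1/2) this gives the familiar 2/3, but for other values of q the conditional probability ranges from 1/2 to 1. The probability after seeing door 3 opened is therefore not fixed by the evidence alone. Always switching still wins in 2/3 of all games whatever q is, because it wins whenever the prize is not behind door 1.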
This research was funded by the Vici grant awarded to Peter Grünwald by the Netherlands Organisation for Scientific Research (NWO) in 2010.