Genomic Data Science

Next-generation sequencing technologies have triggered a data revolution in genomics ("The DNA Data Deluge"). We tackle the resulting Genomic Data Science challenges with algorithms and statistics that take into account that the data is both big and uncertain ("Big Uncertain Data").

Recently, genomics has been witnessing a data revolution of a kind that was hard to predict. Novel sequencing technologies ("next-generation sequencing") make it possible to sequence an individual's entire DNA quickly and at little cost ("The DNA Data Deluge"). Meanwhile, the amount of sequenced DNA will soon reach the exabyte mark. The potential these masses of data hold for (personalized/stratified) health research is enormous: for the first time, they allow a comprehensive understanding of the fundamental code of life. For example, huge, global-scale cancer genomics studies that hold the promise of individualized, efficient cancer therapies are under way; so are many greatly enhanced genome-wide association studies, which aim to link changes in the code directly to disease risks. At the same time, harnessing these gigantic heaps of sequence fragments raises intriguing research questions in the fields surrounding modern genomics, such as computer science, mathematics and statistics.

The Life Sciences group tackles these Data Science challenges by developing novel algorithms and statistical methods that, in combination, master both the 'bigness' of the data and its inherent uncertainties. The resulting methodologies have significant impact on applications in molecular biology, genetics and medicine.

A prominent example of our recent successes is our participation in the Genome of the Netherlands project, which analyzed the genomes of 769 Dutch individuals, amounting to 100 terabytes of data. Algorithms developed in our group made it possible to detect genetic variants that had previously been notoriously hard to detect. The result of our efforts is a systematically arranged catalog of genetic variants prevalent in the Dutch population, which is invaluable for genetics and personalized medicine studies.

Contact person: Alexander Schönhuth
Research group: Life Sciences (LS)
Research partners: Utrecht University, University Medical Center Utrecht, University Medical Center Groningen, Radboud University Nijmegen