Recently, CWI and others were awarded an EU Marie Skłodowska-Curie Innovative Training Networks (ITN) consortium grant for ALPACA ('Algorithms for PAngenome Computational Analysis'). The research project will run for four years, with total funding of 3.67 million euros. The network will train 14 PhD researchers, including one at CWI. The project started in January 2020.
Genomes are strings over the letters A, C, G, T, which represent nucleotides, the building blocks of DNA. A growing number of rapidly advancing genome sequencing devices are producing ultra-large amounts of genome sequence data, and the accumulated volume is now reaching the exabyte scale. The driving, urgent question is: how can we arrange and analyze these masses of data in a formally rigorous, computationally efficient and biomedically rewarding manner? Graph-based data structures have been found to offer disruptive benefits over traditional sequence-based structures when representing pan-genomes, that is, sufficiently large, evolutionarily coherent collections of genomes. This project will put this paradigm shift, from sequence-based to graph-based representations of genomes, into full effect.
The project consortium consists of 23 academic and industrial partners, including CWI, the University of Bielefeld (Germany), CNRS (France), INRIA (France), the University of Pisa (Italy), the University of Milan-Bicocca (Italy), the Heinrich Heine University in Düsseldorf (Germany), the European Molecular Biology Laboratory (EMBL-EBI), the Comenius University in Bratislava (Slovakia), the University of Helsinki (Finland), the Pasteur Institute (France), and the University of Cambridge (UK).