A puzzle with a million pieces: assembling viral genomes from sequencing data

Publication date: 27-06-2017

Researchers from CWI’s Life Sciences and Health group have developed a new computational tool, SAVAGE, for reconstructing the genomes of the different virus strains that infect a person. SAVAGE makes it possible to reconstruct these strains, of which there can be many in a single infected person, even when so-called reference genomes are not available. Because viruses mutate rapidly and are genetically highly diverse, high-quality reference genomes are often not available at the time of a new disease outbreak. Determining which strains are present in an infection is the first step in determining a therapy protocol.

Next generation sequencing and overlap graphs

The researchers revived so-called overlap-graph-based techniques, which had been deemed infeasible in modern “next-generation sequencing” settings because of the huge amounts of data involved. Following the overlap graph paradigm, they developed a method for assembling polyploid genomes. Using overlap graphs was crucial, because only this approach makes it possible to distinguish technical errors from strain-specific sequence mutations. The method outperforms all relevant state-of-the-art approaches, often quite drastically, in the quality of the reconstructed strains: strains reconstructed by SAVAGE contain significantly fewer errors.
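To give a feel for the overlap graph idea, here is a minimal Python sketch. It is a toy illustration only, not SAVAGE's implementation: the function names, the example reads, and the `min_overlap` threshold are all assumptions made for this example, and it uses exact suffix-prefix matching, whereas real sequencing data would call for error-tolerant overlaps and far more efficient indexing. Each read is a node, and an edge connects read `i` to read `j` when the end of `i` matches the start of `j`.

```python
from itertools import permutations


def suffix_prefix_overlap(a: str, b: str, min_overlap: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap of at least `min_overlap` characters exists.
    """
    max_len = min(len(a), len(b))
    for length in range(max_len, min_overlap - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def build_overlap_graph(reads, min_overlap=4):
    """Build a directed overlap graph as {read index: {read index: overlap length}}."""
    graph = {i: {} for i in range(len(reads))}
    # Compare every ordered pair of reads (quadratic; fine for a toy example only).
    for i, j in permutations(range(len(reads)), 2):
        length = suffix_prefix_overlap(reads[i], reads[j], min_overlap)
        if length:
            graph[i][j] = length
    return graph


# Hypothetical reads from two strains that differ at one position (G vs. C).
reads = [
    "ACGTACGGA",  # strain A, left read
    "ACGGATTCA",  # strain A, right read
    "ACGTACGCA",  # strain B, left read
    "ACGCATTCA",  # strain B, right read
]

graph = build_overlap_graph(reads)
for source, targets in graph.items():
    for target, length in targets.items():
        print(f"read {source} -> read {target} (overlap {length})")
```

In this toy data set, the only edges connect reads from the same strain, because the single differing position breaks every overlap between reads of different strains. Keeping the reads themselves as nodes is what preserves this kind of strain-specific signal.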

The method is already in heavy use, and the team behind it is engaged in collaborative projects with the University of California at San Diego and the Helmholtz Center for Infection Research. In the future, the research team will explore applying the tool to other species (e.g. human genomes) and work on improving the efficiency of overlap graph construction. The algorithmic work might also be extended to other sequencing technologies.

The paper “De novo assembly of viral quasispecies using overlap graphs” by J.A. Baaijens, A.Z.E. Aabidine, E. Rivals, and A. Schönhuth was published in Genome Research (May 2017, 27: 835-848).
