A puzzle with a million pieces
Viruses such as HIV, Zika and Ebola change their genomes (the complete set of genetic material, including all genes) very quickly during an infection. As a result, an infected organism hosts a variety of mutated versions (“strains” or “haplotypes”) of the virus, together called a “viral quasispecies”. This diversity allows the virus to adapt to its environment, which makes the infection hard to cure. Determining which strains are present is the first step toward choosing a therapy protocol.
In her thesis, Baaijens presents several approaches for haplotype reconstruction that operate in a “de novo” fashion, meaning that the newly developed methods do not require any prior information on the genome content. In particular, they do not need a representative genome (“reference”) of the virus, which makes the tools especially innovative: biases introduced by a reference genome are a major hindrance for many viral quasispecies assembly approaches. Together, the new tools form the first de novo approach to full-length viral quasispecies reconstruction, and they achieve an accuracy beyond that of any existing method. Accurate reconstruction of each individual viral haplotype causing the infection could lead to improved treatment plans and the development of novel medicines.
Next-generation sequencing and overlap graphs
Baaijens and her colleagues were able to revive so-called overlap graph-based techniques, which had been deemed impossible in modern, “next-generation sequencing” based settings because of the huge amounts of data involved in the analysis. Following the overlap graph paradigm, they developed methods for assembling viral quasispecies as well as other polyploid genomes. The use of overlap graphs was crucial, because only this makes it possible to distinguish technical errors from haplotype-specific sequence mutations. The methods presented outperform all relevant state-of-the-art approaches, often quite drastically, with respect to the quality of the reconstructed genomes.
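For readers unfamiliar with the term, an overlap graph has one node per sequencing read and a directed edge wherever the end of one read matches the start of another. The Python sketch below is only a toy illustration of that general idea, not the method from the thesis: the function names, the example reads and the `min_len` threshold are all hypothetical, overlaps must match exactly, and every pair of reads is compared, which would not scale to real next-generation sequencing data.

```python
from itertools import permutations


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 if no exact overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_graph(reads, min_len=3):
    """Map each read index to {successor index: overlap length}.

    Toy version: compares all ordered pairs of reads (quadratic time).
    """
    graph = {i: {} for i in range(len(reads))}
    for i, j in permutations(range(len(reads)), 2):
        k = suffix_prefix_overlap(reads[i], reads[j], min_len)
        if k:
            graph[i][j] = k
    return graph


# Hypothetical reads, chosen so that they chain together.
reads = ["ACGTAC", "GTACGG", "ACGGTT"]
graph = build_overlap_graph(reads)
print(graph)  # {0: {1: 4}, 1: {2: 4}, 2: {}}
```

Following the edges 0 → 1 → 2 chains the three reads into one longer sequence. Practical tools additionally have to tolerate mismatches within an overlap, which is where the question of separating sequencing errors from genuine haplotype-specific mutations arises.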