LSH Seminar Nadia Pisanti (University Pisa)

Mapping Reads and Palindromic Decomposition on Pan-Genomes

Mapping Reads and Palindromic Decomposition on Pan-Genomes

An elastic-degenerate string (ED-string) has been introduced to compactly represent a multiple alignment of several closely-related sequences (a pan-genome). In this representation, substrings of these sequences that match exactly are collapsed, while in positions where the sequences differ, all possible variants observed at that location are listed. The natural problem that arises is finding all matches of a deterministic pattern in an elastic-degenerate text. We introduce an algorithm to solve this problem on-line after a pre-processing stage. Moreover, we study the same problem under the edit- and Hamming-distance model. Finally, we describe a linear time algorithm for the comparison of non elastic D-strings (a special case of ED-strings) and a consequent efficient decomposition of D-strings into palindromes.