PhD defence Estéban Gabory (N&O)

Variable strings for pan-genomes: matching, comparison, indexing.

When
21 May 2025, 3:45 p.m. to 4:45 p.m. CEST (GMT+0200)
Where
Auditorium, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam

This dissertation investigates the computational foundations of sequence analysis in the context of pan-genomics, a rapidly evolving field in computational biology. Classical string algorithms, originally developed for linear DNA sequences, encounter new challenges when extended to nonlinear, graph-like pan-genome representations. To address this, we study variable strings: generalized models that compactly represent sets of similar sequences, including elastic-degenerate strings (ED strings), founder graphs, and weighted sequences. The thesis makes three core contributions. First, it explores exact and approximate pattern matching algorithms for variable strings, establishing tight upper and lower bounds. Second, it introduces novel methods for comparing pan-genomic data structures, including algorithms for intersection detection, matching statistics, and distance-based comparisons. Third, it proposes space-efficient indexing strategies for weighted sequences, enabling probabilistic pattern queries under uncertainty.