LSH Seminar Divyae Prasad

Epistasis unravelled by deep ConvNets: towards near-perfect phenotype classification from microbial genotypes

Sequence-similarity-based methods are the workhorse for identifying well-described genetic fragments associated with a phenotype of interest. But how are new genetic markers (variants or risk loci) discovered? Genome-wide association studies (GWAS) are the current gold standard for such genotype-to-phenotype mapping efforts and have been applied to understand variation in pathogen genomes, especially those of antimicrobial-resistant pathogens. Yet GWAS often struggle to achieve both the statistical power needed for confident association calling and the precision required to reject spurious findings. Here we introduce a deep ConvNet for tackling the genotype-to-phenotype mapping problem. Using the pangenome of P. aeruginosa (a bacterial species) as an example, we train our ConvNet to predict a binary drug response: a susceptible or resistant phenotype. We show near-perfect classification performance of our ConvNet models, suggesting that non-additive interactions between variants (formally known as epistasis) may entirely explain rapid evolution in pathogens. I take this opportunity to discuss the remaining challenges in our project: demystifying the workings of the ConvNet black box and connecting and comparing it to a statistical test. Finally, we would like to integrate evolutionary relatedness measures (in the form of a “kinship matrix”, perhaps as an additional net) into the ConvNet, thereby augmenting the statistical power of our method. Suggestions on this would be appreciated!