Improving human disease gene ranking by cross-species function prediction
Samira Jaeger (PhD student at Humboldt University Berlin, Knowledge Management in Bioinformatics group) will speak about improving human disease gene ranking by cross-species function prediction.
Location: CWI, M279
The talk will last about half an hour and will be followed by a discussion.
For many human diseases with a genetic background, it is not yet known which genes are involved in their development. Recently, methods have been developed that rank genes with respect to their role in disease. Typically, these methods first combine several types of evidence, such as data from disease databases, protein interactions, or protein functions, into a graph structure and then apply a scoring method, such as network centrality, to rank the genes. Highly ranked genes are then considered potential candidates for further study.
However, it remains an open question which types of data should be combined to achieve the best possible result. In particular, the use of cross-species information, which can be a powerful tool for overcoming the limits of human gene annotation, has not yet been explored. We show that using protein functions predicted from a cross-species data set can considerably improve human disease gene ranking. Our function prediction method combines two independent prediction methods, both of which use protein-protein interaction data, in order to overcome the limitations of each method while retaining their respective benefits.
First, we compare protein interaction networks across species to identify conserved and connected subgraphs (CCSs). Within each CCS, we infer protein functions from orthology relationships and from interactions with neighboring proteins. This method performs very well in terms of both precision and coverage; for rat, human, and yeast, for example, it reaches precisions of up to 85%, 89%, and 86%, respectively. The predicted functions are then integrated into a framework for disease gene ranking.
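A minimal sketch of the inference step inside one CCS, assuming a simple voting rule: every ortholog and every interaction partner of a protein contributes one vote per function it is annotated with, and functions with at least `min_support` votes are predicted. The helpers `predict_functions` and `precision`, the dictionary layout, the threshold, and the labels in the example are hypothetical illustrations, not the speaker's actual method; the caller is assumed to pass only orthologs and interactions that lie within the same CCS.

```python
from collections import Counter


def predict_functions(protein, orthologs, interactions, annotations, min_support=2):
    """Predict function labels for `protein` by majority-style voting.

    Hypothetical scheme: `orthologs` and `interactions` map a protein to its
    orthologs / interaction partners (assumed restricted to one CCS), and
    `annotations` maps a protein to its set of known function labels.
    """
    # Evidence = orthologs in other species plus interaction partners.
    evidence = set(orthologs.get(protein, ())) | set(interactions.get(protein, ()))
    votes = Counter()
    for partner in evidence:
        for term in annotations.get(partner, ()):
            votes[term] += 1  # one vote per supporting protein
    known = set(annotations.get(protein, ()))
    # Keep well-supported labels that the protein does not already carry.
    return {term for term, n in votes.items() if n >= min_support and term not in known}


def precision(predicted, reference):
    """Fraction of predicted labels that appear in the reference annotation."""
    if not predicted:
        return 0.0
    return len(predicted & reference) / len(predicted)


if __name__ == "__main__":
    orthologs = {"H1": ["Y1", "R1"]}      # hypothetical yeast and rat orthologs
    interactions = {"H1": ["H2"]}         # hypothetical human interaction partner
    annotations = {"Y1": {"f_a", "f_b"}, "R1": {"f_a"}, "H2": {"f_a", "f_c"}}
    predicted = predict_functions("H1", orthologs, interactions, annotations)
    print(predicted, precision(predicted, {"f_a", "f_d"}))
```

Here `f_a` receives three votes and is predicted, while `f_b` and `f_c` are each supported by only one partner and are dropped under the default threshold. Raising `min_support` trades coverage for precision, which is the balance the abstract reports on.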
We build disease networks from seed proteins, extend them with interacting and functionally similar proteins, and use PageRank centrality to identify hubs in the network. We show that adding predicted annotations improves ranking performance from 72% to 78%. In particular, our method ranks highly proteins that are poorly annotated or not annotated at all to begin with, and that are therefore missed by other methods.
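The network construction and ranking steps can be sketched as follows. This assumes an undirected graph, treats two proteins as functionally similar when the Jaccard index of their annotation sets reaches a hypothetical `sim_threshold`, and computes PageRank by plain power iteration with the commonly used damping factor of 0.85. The function names, parameters, and example proteins are illustrative and not taken from the talk. If `networkx` is installed, `networkx.pagerank` on a `networkx.Graph` should give essentially the same scores.

```python
def jaccard(a, b):
    """Jaccard index of two annotation sets (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if a or b else 0.0


def build_disease_network(seeds, interactions, annotations, sim_threshold=0.5):
    """Undirected adjacency dict: seeds plus interacting and functionally similar proteins."""
    nodes, edges = set(seeds), set()
    for s in seeds:
        for partner in interactions.get(s, ()):
            if partner != s:  # skip self-interactions
                nodes.add(partner)
                edges.add(frozenset((s, partner)))
        seed_terms = annotations.get(s, set())
        for other, terms in annotations.items():
            # Hypothetical similarity rule: shared annotations above a threshold.
            if other != s and jaccard(seed_terms, terms) >= sim_threshold:
                nodes.add(other)
                edges.add(frozenset((s, other)))
    adj = {n: set() for n in nodes}
    for edge in edges:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)
    return adj


def pagerank(adj, damping=0.85, tol=1e-10, max_iter=100):
    """PageRank by power iteration on an undirected adjacency dict."""
    nodes = sorted(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Mass held by nodes without neighbours is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new = {}
        for v in nodes:
            # In an undirected graph every neighbour links to v.
            incoming = sum(rank[u] / len(adj[u]) for u in adj[v])
            new[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


if __name__ == "__main__":
    seeds = ["P1", "P2"]
    interactions = {"P1": ["P3", "P4"], "P2": ["P3"]}
    annotations = {"P1": {"a", "b"}, "P2": {"a"}, "P3": {"c"}, "P5": {"a", "b"}}
    network = build_disease_network(seeds, interactions, annotations)
    ranks = pagerank(network)
    for protein in sorted(ranks, key=ranks.get, reverse=True):
        print(protein, round(ranks[protein], 3))
```

In this toy example `P5` has no interactions with the seeds at all but enters the network through its annotation overlap with `P1` and `P2`. Adding predicted annotations to `annotations` can create such similarity edges, which is one way a weakly annotated protein could reach a high rank.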

