Accurate prediction of protein function using GOstruct




Sokolov, Artem, author
Ben-Hur, Asa, advisor
Anderson, Chuck, committee member
McConnell, Ross M., committee member
Wang, Haonan, committee member

With the growing number of sequenced genomes, automatic prediction of protein function is one of the central problems in computational biology. Traditional methods transfer functional annotation on the basis of sequence or structural similarity and cannot deal effectively with today's noisy high-throughput biological data. Most approaches based on machine learning, on the other hand, break the problem up into a collection of binary classification problems, effectively asking the question "does this protein perform this particular function?"; such methods often produce a set of predictions that are inconsistent with each other. In this work, we present GOstruct, a structured-output framework that answers the question "what function does this protein perform?" in the context of hierarchical multilabel classification. We show that GOstruct can effectively deal with a large number of disparate data sources from multiple species. Our empirical results demonstrate that the framework achieves state-of-the-art accuracy in two recent challenges in automatic function prediction: Mousefunc and CAFA.
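
The inconsistency mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the GOstruct method. It assumes a toy hierarchy with hypothetical term names (not real GO identifiers) and the usual convention that a protein annotated with a term is also annotated with all of that term's ancestors. Independent per-term classifiers can violate this convention. The upward closure shown here is only a simple after-the-fact repair, whereas a structured-output approach predicts the whole label set at once.

```python
# Hypothetical toy hierarchy: child term -> list of parent terms.
# These names are illustrative assumptions, not real GO identifiers.
PARENTS = {
    "molecular_function": [],
    "binding": ["molecular_function"],
    "dna_binding": ["binding"],
    "catalytic_activity": ["molecular_function"],
}


def ancestors(term):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def inconsistencies(predicted):
    """List (term, missing_ancestor) pairs that break hierarchical consistency."""
    return sorted(
        (term, anc)
        for term in predicted
        for anc in ancestors(term)
        if anc not in predicted
    )


def close_upward(predicted):
    """Add all ancestors of each predicted term to obtain a consistent label set."""
    closed = set(predicted)
    for term in predicted:
        closed |= ancestors(term)
    return closed


# Suppose independent binary classifiers said "yes" to dna_binding
# but "no" to its parent term, binding.
binary_output = {"dna_binding", "molecular_function"}
violations = inconsistencies(binary_output)
consistent = close_upward(binary_output)
print(violations)
print(sorted(consistent))
```

Here `violations` reports that `dna_binding` was predicted without its parent `binding`, and `close_upward` returns a label set that respects the hierarchy.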


machine learning
protein function prediction

