Predicting protein secretion success

Abstract: 
The cell factory Aspergillus niger is widely used for industrial enzyme production. Selecting enzymes for large-scale production requires costly lab work to test whether the over-expressed enzyme is successfully secreted at high levels. To reduce this lab work, we developed a sequence-based classifier that predicts successful high-level secretion of homologous proteins, which makes it possible to narrow a large set of candidate enzymes down to a subset worth testing. A dataset of 638 proteins was used to train and validate the classifier in a 10-fold cross-validation protocol. A linear discriminant classifier achieved an average accuracy of 0.85, which in practice could halve the amount of lab work. Feature selection results indicate which features are most defining for successful protein production, which could be an interesting lead for coupling sequence characteristics to the biological processes involved in protein production and secretion.
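As an illustration of the evaluation protocol only, the following minimal sketch runs 10-fold cross-validation of a two-class linear discriminant classifier. It is not the implementation used in this work. The feature vectors are synthetic stand-ins (three Gaussian features per sample), and all function names (make_toy_data, fit_lda, cross_validate) are hypothetical. Only the sample count of 638 and the number of folds are taken from the abstract, so the accuracy printed for the toy data says nothing about real proteins.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_lda(X, y, ridge=1e-6):
    """Two-class LDA with a pooled covariance; returns (weights, bias)."""
    d = len(X[0])
    groups = {c: [x for x, t in zip(X, y) if t == c] for c in (0, 1)}
    mu = {c: [sum(col) / len(g) for col in zip(*g)] for c, g in groups.items()}
    # Small ridge on the diagonal keeps the covariance invertible (assumption).
    cov = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    dof = len(X) - 2
    for c, g in groups.items():
        for x in g:
            diff = [xi - mi for xi, mi in zip(x, mu[c])]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += diff[i] * diff[j] / dof
    w = solve(cov, [p - q for p, q in zip(mu[1], mu[0])])
    midpoint = [(p + q) / 2 for p, q in zip(mu[1], mu[0])]
    prior = math.log(len(groups[1]) / len(groups[0]))
    bias = prior - sum(wi * mi for wi, mi in zip(w, midpoint))
    return w, bias

def predict(w, bias, x):
    """Return 1 (e.g. 'secreted') if the discriminant score is positive, else 0."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + bias > 0)

def cross_validate(X, y, k=10, seed=0):
    """Mean accuracy over k folds; every sample is held out exactly once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    accuracies = []
    for fold in range(k):
        test = idx[fold::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        w, bias = fit_lda([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(w, bias, X[i]) == y[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / k

def make_toy_data(n=638, d=3, seed=1):
    """Synthetic stand-in data: two Gaussian classes, NOT real protein features."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(1.5 * label, 1.0) for _ in range(d)] for label in y]
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"mean 10-fold accuracy on toy data: {cross_validate(X, y):.2f}")
```

The folds here are shuffled but not stratified; with unbalanced classes, a stratified split that preserves the class ratio in every fold would be the safer choice.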