A multi-level nearest-neighbour algorithm for predicting protein secondary structure

Open access dissertation

Iustin Lazar, 1998. Spectrum Research Repository (Concordia University)

Abstract

A thesis on machine learning and the prediction of protein secondary structure. We develop a variation of the nearest-neighbour algorithm that adopts a multi-level strategy together with a variable window size. The algorithm is applied to predicting the secondary structure of a protein from its primary structure: given a sequence of amino acids, it outputs a sequence of secondary-structure states (helix, sheet, or coil). We also develop a new training set that is orthogonal and covers the known classes of proteins. Overall accuracy is 65.0%, with 68.7% for helices, 66.3% for sheets, and 61.4% for coils. This compares well with existing methods: the best result reported for a single nearest-neighbour classifier is 65.1%, by Salzberg and Cost in 1992. Our accuracy for sheets is better than that of known methods, but our accuracy for coils is much lower.
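
To make the basic idea concrete, here is a minimal sketch of a plain, single-level, fixed-window nearest-neighbour predictor, together with a per-class accuracy helper. It does not reproduce the thesis's multi-level strategy or variable window size. Everything in it is an illustrative assumption: the labels "H" (helix), "E" (sheet), and "C" (coil), the padding symbol, the hypothetical names `half_width`, `build_examples`, and `per_class_accuracy`, and the similarity measure (a simple count of identical residues). The abstract also does not define per-class accuracy; here it is assumed to be the fraction of residues observed in a class that are predicted correctly.

```python
from collections import Counter

PAD = "-"  # assumed padding symbol for window positions beyond the chain ends


def windows(seq, half_width):
    """Return one window of 2*half_width+1 residues centred on each position."""
    padded = PAD * half_width + seq + PAD * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(seq))]


def similarity(a, b):
    """Assumed measure: count positions where both windows hold the same residue (padding ignored)."""
    return sum(1 for x, y in zip(a, b) if x == y and x != PAD)


def build_examples(training, half_width):
    """Turn (sequence, structure) pairs into (window, label) training examples."""
    examples = []
    for seq, ss in training:
        for win, label in zip(windows(seq, half_width), ss):
            examples.append((win, label))
    return examples


def predict(seq, examples, half_width, k=1):
    """Label each residue by majority vote among its k most similar training windows."""
    predictions = []
    for win in windows(seq, half_width):
        # Stable sort: ties keep training order.
        ranked = sorted(examples, key=lambda ex: similarity(win, ex[0]), reverse=True)
        votes = Counter(label for _, label in ranked[:k])
        predictions.append(votes.most_common(1)[0][0])
    return "".join(predictions)


def per_class_accuracy(predicted, observed):
    """Return overall accuracy and, per observed class, the fraction predicted correctly."""
    totals = Counter(observed)
    hits = Counter(o for p, o in zip(predicted, observed) if p == o)
    per_class = {c: hits[c] / totals[c] for c in totals}
    overall = sum(hits.values()) / len(observed)
    return overall, per_class


if __name__ == "__main__":
    # Toy data, purely illustrative: not drawn from the thesis's training set.
    training = [("AAAA", "HHHH"), ("VVVV", "EEEE")]
    examples = build_examples(training, half_width=1)
    print(predict("AAAVVV", examples, half_width=1, k=3))
    print(per_class_accuracy("HHEC", "HHCC"))
```

The brute-force search compares every query window against every training window. That is enough to show how a window turns sequence context into a per-residue label, but real predictors use larger windows, substitution-aware similarity measures, and much larger training sets.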

Keywords

Protein secondary structure; Algorithm; k-nearest neighbors algorithm; Sequence (biology); Classifier (UML); Nearest neighbour; Computer science; Protein structure prediction