Determining the C-Terminal Amino Acid of a Peptide from MS/MS Data
Proteomics currently relies chiefly on mass spectrometry (MS), the tool of choice for investigating proteins. Two computational approaches are widely used to derive the sequence of the precursor peptide from its tandem mass spectrum. Database search retrieves the sequence by matching the spectrum against all entries in a sequence database, whereas de novo sequencing does not depend on a database. Both approaches benefit from knowing which enzyme was used to generate the peptides, and most algorithms default to trypsin because it is so widely used. Trypsin cleaves after arginine and lysine, so the C-terminal amino acid of a tryptic peptide is usually one of these two, but which one is not known in advance. Furthermore, 90% of protein C-terminal peptides may not end with either arginine or lysine and may thus end in any of the other amino acids.
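To illustrate why the C-terminal residue of a tryptic peptide falls into three classes, the sketch below performs a simplified in-silico trypsin digest and labels each peptide's last residue as arginine, lysine, or other. It is a minimal sketch under stated assumptions: the function names `tryptic_digest` and `c_terminal_class` and the example sequence are hypothetical, and the digest cleaves after every K or R while ignoring the proline rule and missed cleavages.

```python
from typing import List


def tryptic_digest(protein: str) -> List[str]:
    """Simplified in-silico trypsin digest (hypothetical helper).

    Assumption: cleave after every K or R; the proline rule and
    missed cleavages are deliberately ignored.
    """
    peptides = []
    start = 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # The protein's own C-terminal peptide need not end in K or R.
        peptides.append(protein[start:])
    return peptides


def c_terminal_class(peptide: str) -> str:
    """Map a peptide's last residue to one of three classes."""
    last = peptide[-1]
    if last == "R":
        return "arginine"
    if last == "K":
        return "lysine"
    return "other"


if __name__ == "__main__":
    protein = "MAGKLLRSTEVE"  # hypothetical sequence for illustration
    for pep in tryptic_digest(protein):
        print(pep, c_terminal_class(pep))
```

For the hypothetical sequence, the digest yields `MAGK` (lysine), `LLR` (arginine), and the protein C-terminal peptide `STEVE` (other), which is the case the "other" class is meant to capture.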
Here an algorithm, named RKDecider, is presented that sorts the C-terminal amino acid of a peptide into one of three groups: arginine, lysine, or other. Although about 90% accuracy was achieved while data mining spectra for rules that determine the C-terminal amino acid, the implementation (RKDecider) is somewhat less accurate, at about 80%. This is because the decision trees were implemented as a rule-based system for speed. The implementation is freely available at: http://bioinformatics.iyte.edu.tr/RKDecider.
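The abstract does not describe the rules RKDecider learned, so the sketch below is not its method. It is a simple textbook heuristic swapped in to show what a three-way C-terminal decision from an MS/MS spectrum can look like: look for the singly charged y1 ion of a C-terminal lysine (m/z about 147.113) or arginine (m/z about 175.119), and fall back to "other" when neither peak is present. The function names, the peak-list format of `(mz, intensity)` pairs, the 0.02 m/z tolerance, and the tie-breaking toward arginine are all assumptions.

```python
from typing import Iterable, Tuple

# Monoisotopic masses (Da).
WATER = 18.010565
PROTON = 1.007276
RESIDUE_MASS = {"K": 128.094963, "R": 156.101111}


def y1_mz(residue: str) -> float:
    """m/z of the singly charged y1 ion: residue + H2O + proton."""
    return RESIDUE_MASS[residue] + WATER + PROTON


def guess_c_terminus(peaks: Iterable[Tuple[float, float]],
                     tol: float = 0.02) -> str:
    """Hypothetical heuristic, NOT RKDecider's learned rules.

    Returns "arginine", "lysine", or "other" depending on which y1
    ion (if any) is observed within `tol` m/z; the more intense
    match wins, and ties go to arginine (an arbitrary choice).
    """
    best = {"K": 0.0, "R": 0.0}
    for mz, intensity in peaks:
        for res in best:
            if abs(mz - y1_mz(res)) <= tol:
                best[res] = max(best[res], intensity)
    if best["K"] == 0.0 and best["R"] == 0.0:
        return "other"
    return "arginine" if best["R"] >= best["K"] else "lysine"


if __name__ == "__main__":
    spectrum = [(147.113, 800.0), (262.140, 300.0)]  # made-up peaks
    print(guess_c_terminus(spectrum))
```

A single-peak check like this can fail when the y1 ion is weak or absent, which is one motivation for combining several spectral features in a learned classifier such as the decision trees mentioned above.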