Assessing the Loss of Information through Application of the ‘Two-hit Rule’ in iTRAQ Datasets
High-throughput studies of complex protein mixtures using proteomic workflows typically employ tandem mass spectrometric analysis of peptides obtained by tryptic digestion. Protein identification is achieved by comparing the experimentally obtained peptide MS/MS spectra to theoretical spectra. Protein identifications based on peptide fragment sequences are often judged valid using the so-called ‘two-peptide’ rule, whereby any protein identified by sequencing of fragment ions must be supported by the identification of two sequence-unique peptides from the same protein. This excludes proteins identified on the basis of a single peptide ‘hit’ (often termed one-hit wonders, or OHWs). Applying the ‘two-hit’ stringency may result in the loss of potentially valuable meta-data: information yielded or consolidated by valid OHW proteins may be overlooked. This study tests the hypothesis that certain groups of proteins (and thus related biological events or pathways) are more likely to be identified by a single peptide because of their physical or biochemical characteristics (molecular weight and isoelectric point). We have undertaken an analysis of data from three independent quantitative iTRAQ-based proteomic studies of a human colon cell line and human colon tissue to compare the OHW and “valid” protein sets with respect to molecular weight, isoelectric point and associated biological pathways. The results suggest a possible inverse correlation between the pI value of a protein and the number of peptide hits supporting its identification. Molecular weights ranged from 30 to 60 kDa. Pathway analysis using EBI-EMBL Reactome SkyPainter found that, when OHWs were excluded, several biological pathways were consistently not mapped, suggesting that excluding OHWs potentially limits understanding of the biological processes represented in the whole dataset.
Future work should address strategies for evaluating the validity and reproducibility of these conclusions in other tissues.
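The ‘two-hit’ partition and the pathway comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the protein and peptide identifiers, and the protein-to-pathway mapping are all hypothetical, and the mapping stands in for whatever a pathway tool such as Reactome SkyPainter would report.

```python
from collections import defaultdict


def split_by_two_hit_rule(peptide_hits):
    """Partition proteins into 'valid' (two or more sequence-unique peptides)
    and one-hit wonders (exactly one sequence-unique peptide).

    peptide_hits: iterable of (protein_id, peptide_sequence) pairs.
    """
    peptides_by_protein = defaultdict(set)
    for protein, peptide in peptide_hits:
        # A set keeps only sequence-unique peptides per protein.
        peptides_by_protein[protein].add(peptide)
    valid = {p for p, peps in peptides_by_protein.items() if len(peps) >= 2}
    ohw = {p for p, peps in peptides_by_protein.items() if len(peps) == 1}
    return valid, ohw


def pathways_lost(pathways_by_protein, valid, ohw):
    """Return pathways mapped only by OHW proteins, i.e. those that
    disappear when OHWs are excluded."""
    def covered(proteins):
        return {pw for p in proteins for pw in pathways_by_protein.get(p, ())}
    return covered(ohw) - covered(valid)


# Hypothetical example data (assumed for illustration only).
hits = [
    ("P1", "AAK"), ("P1", "CCR"),   # two unique peptides: passes the rule
    ("P2", "DDK"), ("P2", "DDK"),   # same peptide seen twice: still one hit
    ("P3", "EEK"),                  # single peptide: one-hit wonder
]
pathways = {"P1": {"A"}, "P2": {"A", "B"}, "P3": {"C"}}

valid, ohw = split_by_two_hit_rule(hits)
lost = pathways_lost(pathways, valid, ohw)
```

In this toy example the pathways labelled "B" and "C" are reachable only through OHW proteins, which is the kind of loss the pathway analysis is intended to detect.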