perspective – Xiao Liang, Ph.D. 梁潇

Software tools for proteomic data analysis are developing rapidly. However, several issues remain and await future improvement. Numerous reviews have covered this topic; here I discuss two issues that receive less attention.

User-unfriendliness
Because they are built by lab scientists, several software tools under development are command-line based¹⁻³. Such tools are unfriendly to biologists, especially when they involve multiple computational steps and tedious parameter settings, which discourages experimentalists and clinicians from using them⁴. Moreover, the newest methods usually target specific steps rather than the complete workflow, so inexperienced users cannot put them to immediate use. For example, CANDIA is an advanced deconvolution algorithm that uses GPU computation to decompose convolved peptide signals into individual analyte spectra⁵. Although it could recover 33 times more total ion current for downstream database searching, widespread application of CANDIA is unrealistic without effort on software packaging. Building integrated software with graphical interfaces that enables plug-and-play use of new tools would help close this gap.

Un-annotated proteins
More than 90% of the human proteome has been catalogued in databases⁶. In practice, however, a large fraction of MS signals is not assigned to peptides. Apart from technical variation, some of these signals could originate from unannotated proteins, sometimes termed the “dark proteome”. Comparing MS data (especially DIA-MS data) from multiple sources might help prioritize plausible footprints of such proteins. Mapping these signals to the unreviewed TrEMBL database could be a novel way to identify and validate them. I am also optimistic about deep learning (DL). Existing MS data repositories could provide suitable training material for DL models to reach a consensus on unannotated proteins and, further, their proteoforms⁷.

1. Röst, H.L., et al. OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nat Biotechnol 32, 219-223 (2014).
2. Tsou, C.C., et al. DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics. Nat Methods 12, 258-264 (2015).
3. Tran, N.H., et al. Deep learning enables de novo peptide sequencing from data-independent-acquisition mass spectrometry. Nat Methods 16, 63-66 (2019).
4. Zhang, F., Ge, W., Ruan, G., Cai, X. & Guo, T. Data-Independent Acquisition Mass Spectrometry-Based Proteomics and Software Tools: A Glimpse in 2020. Proteomics, e1900276 (2020).
5. Buric, F., Zrimec, J. & Zelezniak, A. Parallel Factor Analysis Enables Quantification and Identification of Highly Convolved Data-Independent-Acquired Protein Spectra. Patterns 1, 100137 (2020).
6. Adhikari, S., et al. A high-stringency blueprint of the human proteome. Nat Commun 11, 5301 (2020).
7. Wen, B., et al. Deep Learning in Proteomics. Proteomics 20, e1900335 (2020).


Two issues in software development for proteomic analyses