HybridPPI: A Hybrid Machine Learning Framework for Protein-Protein Interaction Prediction

Desidi Narsimha Reddy, Pinagadi Venkateswararao, M. Sree Vani, Vodapelli Pranathi, Anitha Patil

Abstract


Protein-protein interactions (PPIs) are central to cellular function and disease mechanisms, and are therefore crucial for drug discovery and systems biology. Experimental approaches such as yeast two-hybrid systems are informative, but they are time-consuming, costly, and often yield high false-positive rates. Recent computational tools, including DeepPPI and PIPR, have shown promise, but their reliance on single-modal features or on one specific machine learning model limits their generalization and robustness. These limitations motivate a framework that combines several feature types with a diverse set of machine learning models, so that the strengths of each model class can be exploited. In this paper, we present HybridPPI, a hybrid machine learning framework that combines sequence-based, structure-based, and network-based features using well-known ensemble learning techniques to predict PPIs. The proposed algorithm is a stacking ensemble of four base models, namely Support Vector Machines (SVM), Random Forest (RF), Convolutional Neural Networks (CNN), and Long Short-Term Memory networks (LSTM), with Gradient Boosting as the meta-model. Results show that HybridPPI achieves 94.5% accuracy, 95.2% precision, and an area under the curve (AUC) of 0.97, outperforming state-of-the-art methods and indicating its robustness for PPI prediction. The framework is scalable and generalizable, and can accommodate various biological applications. HybridPPI addresses significant shortcomings of current methodologies and contributes to biological discovery.
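The stacking design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, uses synthetic data from make_classification in place of real sequence, structure, and network features, and includes only the SVM and RF base learners (the CNN and LSTM base learners are omitted to keep the example short). The helper name build_stacking_model and all hyperparameters are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacking_model(random_state=0):
    """Hypothetical helper: SVM and RF base learners, Gradient Boosting meta-model."""
    base_learners = [
        # SVMs are sensitive to feature scale, so standardize first.
        ("svm", make_pipeline(StandardScaler(),
                              SVC(probability=True, random_state=random_state))),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
    ]
    meta_model = GradientBoostingClassifier(n_estimators=50, random_state=random_state)
    # The meta-model is trained on out-of-fold class probabilities from the
    # base learners (3-fold cross-validation), which limits label leakage.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=meta_model,
        cv=3,
        stack_method="predict_proba",
    )


# Synthetic stand-in for protein-pair feature vectors (assumption, not PPI data).
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = build_stacking_model()
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {test_accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the figures reported for HybridPPI. In a fuller version, the deep base learners would need to be wrapped in a scikit-learn compatible estimator before being added to the estimators list.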

 


Keywords


Protein-Protein Interaction; Hybrid Machine Learning; Ensemble Learning; Multi-View Feature Integration; PPI Prediction





 

Indonesian Journal of Electrical Engineering and Informatics (IJEEI)
ISSN 2089-3272


This work is licensed under a Creative Commons Attribution 4.0 International License.
