Classification of Cardiovascular Disease Based on Lifestyle Using Random Forest and Logistic Regression Methods

Ajyan Brava Bietrosula, Indah Werdiningsih, Eto Wuriyanto

Abstract


Cardiovascular disease is a non-communicable disease caused by impaired function of the heart or blood vessels. According to the WHO noncommunicable diseases country profiles released in 2018, cardiovascular disease is the leading cause of death in Indonesia. This study aims to classify cardiovascular disease based on lifestyle using the Random Forest and Logistic Regression methods. For each method, several parameter combinations were tested to determine which combination performs best when processing and classifying the cardiovascular disease dataset. The dataset, titled Cardiovascular Disease, was obtained from Kaggle. It was processed through several preprocessing stages, namely missing value imputation, outlier detection, and extreme data checking. After preprocessing, 62,478 rows with 13 attributes (columns) entered the classification process; the attributes include age, height, weight, gender, systolic blood pressure, diastolic blood pressure, cholesterol, glucose, smoking, alcohol intake, physical activity, and cardiovascular disease. Different percentage splits of training and testing data were also tested to compare the classification performance of the two methods. The split used was 90% training data and 10% testing data. Logistic Regression achieved the better accuracy, 73.07%, compared with 71.87% for Random Forest.
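
The evaluation procedure described in the abstract (a 90%/10% split, a search over parameter combinations, and an accuracy score) can be illustrated with a minimal sketch. It is not the authors' implementation. It uses only the Python standard library, covers Logistic Regression only, and runs on synthetic stand-in data rather than the Kaggle dataset. The function names, the grid values (`lr`, `l2`), and the validation slice carved out of the training data are all assumptions made for illustration.

```python
import math
import random


def make_synthetic(n, seed=0):
    """Hypothetical stand-in data: two numeric features and a binary label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        score = 1.5 * x1 - 1.0 * x2 + rng.gauss(0, 0.5)  # noisy linear rule
        rows.append(([x1, x2], 1 if score > 0 else 0))
    return rows


def split_90_10(rows, seed=0):
    """Shuffle a copy of the rows and return (90% part, 10% part)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.9)
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_logreg(train, lr, l2, epochs=20):
    """Logistic regression fitted by stochastic gradient descent with L2 penalty."""
    w, b = [0.0] * len(train[0][0]), 0.0
    for _ in range(epochs):
        for x, y in train:
            err = predict_proba((w, b), x) - y  # gradient of log loss w.r.t. the logit
            w = [wi - lr * (err * xi + l2 * wi) for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def accuracy(model, rows):
    """Fraction of rows whose thresholded prediction matches the label."""
    hits = sum(1 for x, y in rows if (predict_proba(model, x) >= 0.5) == (y == 1))
    return hits / len(rows)


def select_and_evaluate(train, test, grid):
    """Pick the best (lr, l2) on a validation slice, then score once on test."""
    fit_part, val_part = split_90_10(train, seed=42)
    best_params, best_val = None, -1.0
    for lr in grid["lr"]:
        for l2 in grid["l2"]:
            val_acc = accuracy(fit_logreg(fit_part, lr, l2), val_part)
            if val_acc > best_val:
                best_params, best_val = (lr, l2), val_acc
    final_model = fit_logreg(train, *best_params)  # refit on all training rows
    return best_params, accuracy(final_model, test)


if __name__ == "__main__":
    data = make_synthetic(1000, seed=1)
    train_rows, test_rows = split_90_10(data, seed=1)
    params, test_acc = select_and_evaluate(
        train_rows, test_rows, {"lr": [0.01, 0.1], "l2": [0.0, 0.01]}
    )
    print("best (lr, l2):", params, "test accuracy:", round(test_acc, 4))
```

Selecting parameters on a validation slice rather than on the test rows keeps the reported test accuracy from being tuned to the test set. Whether the study selected parameters this way is not stated in the abstract. The same loop structure would apply to Random Forest parameters such as the number of trees, given a Random Forest implementation.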

References


World Health Organization. (2018). Noncommunicable diseases country profiles 2018. Geneva: World Health Organization. Retrieved 2 November 2022, from https://apps.who.int/iris/handle/10665/274512

Prakasa, R. A., Valentina, D. C. D., Abdiana, R., Handayani, R., & BP, N. I. (2020). Analisis Faktor Risiko Pasien Gagal Jantung dengan Reduced Ejection Fraction di RSUD Dr. H. Abdul Moeloek Provinsi Lampung. Essential: Essence of Scientific Medical Journal, 18(1), 22. https://doi.org/10.24843/estl.2020.v18.i01.p07

Karyatin, K. (2019). Faktor-Faktor Yang Berhubungan Dengan Kejadian Penyakit Jantung Koroner. Jurnal Ilmiah Kesehatan, 11(1), 37-43. https://doi.org/10.37012/jik.v11i1.66

Mattioli, A. V., & Puviani, M. B. (2020). Lifestyle at Time of COVID-19: How Could Quarantine Affect Cardiovascular Risk. American Journal of Lifestyle Medicine, 14(3), 240–242. https://doi.org/10.1177/1559827620918808

Shan, Z., Li, Y., Baden, M. Y., Bhupathiraju, S. N., Wang, D. D., Sun, Q., Rexrode, K. M., Rimm, E. B., Qi, L., Willett, W. C., Manson, J. E., Qi, Q., & Hu, F. B. (2020). Association Between Healthy Eating Patterns and Risk of Cardiovascular Disease. JAMA Internal Medicine, 180(8), 1090. https://doi.org/10.1001/jamainternmed.2020.2176

Hanifah, W., Oktavia, W. S., & Nisa, H. (2021). Lifestyle Factors and Coronary Heart Disease: A Systematic Review Among Indonesian Adults. The Journal of Nutrition and Food Research, 44(1), 45–58.

Charbuty, B., & Abdulazeez, A. M. (2021). Classification Based on Decision Tree Algorithm for Machine Learning. Journal of Applied Science and Technology Trends, 2(01), 20–28. https://doi.org/10.38094/jastt20165

Petkovic, D., Altman, R., Wong, M., & Vigil, A. (2018). Improving the explainability of Random Forest classifier: user centered approach. In Pacific Symposium on Biocomputing 2018: Proceedings of the Pacific Symposium (pp. 204–215). https://doi.org/10.1142/9789813235533_0019

Ciu, T., & Oetama, R. S. (2020). Logistic Regression Prediction Model for Cardiovascular Disease. International Journal of New Media Technology, 7(1), 33–38. https://doi.org/10.31937/ijnmt.v7i1.1340

Sharma, V., Yadav, S., & Gupta, M. (2020). Heart Disease Prediction using Machine Learning Techniques. In 2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN). https://doi.org/10.1109/icacccn51052.2020.9362842

Gupta, C. K., Saha, A., Reddy, N., & Acharya, U. D. (2022). Cardiac Disease Prediction using Supervised Machine Learning Techniques. Journal of Physics: Conference Series, 2161(1), 012013. https://doi.org/10.1088/1742-6596/2161/1/012013

Cardiovascular Disease dataset. (2019, January 20). Kaggle. https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset

Zhou, Q., Lan, W., Zhou, Y., & Mo, G. (2020). Effectiveness Evaluation of Anti-bird Devices based on Random Forest Algorithm. 2020 7th International Conference on Information, Cybernetics, and Computational Social Systems (ICCSS). https://doi.org/10.1109/iccss52145.2020.9336891

Lin, W. C., & Tsai, C. F. (2019). Missing value imputation: a review and analysis of the literature (2006–2017). Artificial Intelligence Review, 53(2), 1487–1509. https://doi.org/10.1007/s10462-019-09709-4

Boukerche, A., Zheng, L., & Alfandi, O. (2020). Outlier Detection. ACM Computing Surveys, 53(3), 1–37. https://doi.org/10.1145/3381028

Daeli, N. O. F., & Adiwijaya, A. (2020, May 11). Sentiment Analysis on Movie Reviews using Information Gain and K-Nearest Neighbor. Journal of Data Science and Its Applications, 3(1), 1-7. https://doi.org/10.34818/jdsa.2020.3.22

Wei, Y., Yang, Y., Xu, M., & Huang, W. (2021). Intelligent fault diagnosis of planetary gearbox based on refined composite hierarchical fuzzy entropy and random forest. ISA Transactions, 109, 340–351. https://doi.org/10.1016/j.isatra.2020.10.028

Princy, R. J. P., Parthasarathy, S., Jose, P. S. H., Lakshminarayanan, A. R., & Jeganathan, S. (2020). Prediction of Cardiac Disease using Supervised Machine Learning Algorithms. https://doi.org/10.1109/iciccs48265.2020.9121169

Martins, B. A., Ferreira, D., Neto, C., Abelha, A., & Machado, J. (2021). Data mining for cardiovascular disease prediction. Journal of Medical Systems, 45(1). https://doi.org/10.1007/s10916-020-01682-8

Centers for Disease Control and Prevention. (2021). Cardiovascular Diseases [Fact Sheet]. https://www.cdc.gov/globalhealth/healthprotection/ncd/cardiovascular-diseases.html


Indonesian Journal of Electrical Engineering and Informatics (IJEEI)
ISSN 2089-3272

This work is licensed under a Creative Commons Attribution 4.0 International License.
