Improving Stroke Prediction Accuracy through Machine Learning and Synthetic Minority Over-sampling

Muhammad Abdullah Aish; Amina Abdul Ghafoor; Fawad Nasim; Kiran Irfan Ali; Shamim Akhter; Sumbul Azeem

Authors

Muhammad Abdullah Aish, Department of Software Engineering, The Superior University, Lahore, 54000, Pakistan.
Amina Abdul Ghafoor, Department of Software Engineering, The Superior University, Lahore, 54000, Pakistan.
Fawad Nasim, Faculty of Computer Science and Information Technology, The Superior University, Lahore, 54600, Pakistan.
Kiran Irfan Ali, Aesthetics Lab Powered by Tibbi, Lahore, 54782, Pakistan.
Shamim Akhter, School of Information Management, Minhaj University, Lahore, 54000, Pakistan.
Sumbul Azeem, Lahore College for Women University, Jail Road, Lahore, 54000, Pakistan.

Keywords:

Bagging Classifier, Early Detection, Healthcare Analytics, Imbalanced Data, Machine Learning, SMOTE, Stroke Prediction

Abstract

Stroke is a leading cause of death and disability worldwide, and accurate prediction with early intervention can significantly improve patient outcomes. This study develops a machine learning model for predicting stroke events using a dataset from the Harvard Dataverse Repository containing 43,400 samples with 10 features. The dataset was highly imbalanced, with 42,617 non-stroke cases versus 783 stroke cases, so the Synthetic Minority Over-sampling Technique (SMOTE) was applied to balance it. The models compared were logistic regression, decision tree, random forest, gradient boosting, AdaBoost, XGBoost, support vector machine, k-nearest neighbors, Naive Bayes, bagging classifier, and voting classifier, each evaluated on accuracy, precision, recall, F1-score, and ROC AUC. The Bagging Classifier performed best, with an accuracy of 98.3%, precision of 98.7%, recall of 98.0%, F1-score of 98.3%, and ROC AUC of 99.5%, which suggests a robust and reliable model. The results demonstrate the effectiveness of SMOTE in addressing class imbalance and highlight the potential of advanced machine learning techniques for building practical tools that detect stroke risk at an early stage. Such improvements could meaningfully improve patient outcomes and reduce the burden on healthcare systems. Integrating predictive models of this kind into clinical workflows could enable timely medical intervention and improve the quality of care for people at risk of stroke. The work also opens possibilities for deep learning and other advanced machine learning techniques in healthcare, underscoring the need for further innovation in this area.
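
To make the balancing step concrete, the following is a minimal, standard-library-only sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a real minority sample and one of its k nearest minority neighbours. It is an illustration under stated assumptions, not the authors' pipeline. The function name `smote_sketch`, the toy two-feature points, and the choices of `k` and `seed` are all hypothetical; the abstract does not report which SMOTE implementation or parameters were used.

```python
import random
from math import dist


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration only; not the study's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen sample (itself excluded).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed, not from the study): 20 majority points, 4 minority points.
majority = [(float(i % 5), float(i // 5)) for i in range(20)]
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]

# Oversample the minority class until both classes have the same size.
new_points = smote_sketch(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

With the class counts reported in the abstract, the same procedure would have to generate 42,617 − 783 = 41,834 synthetic stroke cases to reach a 1:1 ratio, starting from a minority share of roughly 1.8% (783 of 43,400). Because synthetic points are interpolations of real ones, they always lie within the range already covered by the minority samples.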
