Improving Stroke Prediction Accuracy through Machine Learning and Synthetic Minority Over-sampling
Keywords:
Bagging Classifier, Early Detection, Healthcare Analytics, Imbalanced Data, Machine Learning, SMOTE, Stroke Prediction
Abstract
Stroke is a leading cause of death and disability worldwide, and accurate prediction combined with early intervention can significantly improve patient outcomes. This study develops a machine learning model to predict stroke events using a dataset from the Harvard Dataverse Repository containing 43,400 samples with 10 features. The dataset was highly imbalanced, with 42,617 non-stroke cases versus 783 stroke cases (about 1.8% of samples), so the Synthetic Minority Over-sampling Technique (SMOTE) was applied to balance it. The models compared were logistic regression, decision tree, random forest, gradient boosting, AdaBoost, XGBoost, support vector machine, k-nearest neighbors, Naive Bayes, a bagging classifier, and a voting classifier. Each was evaluated using accuracy, precision, recall, F1-score, and ROC AUC. The Bagging Classifier performed best, with an accuracy of 98.3%, precision of 98.7%, recall of 98.0%, an F1-score of 98.3%, and a ROC AUC of 99.5%, results that indicate a robust and reliable model. These findings demonstrate the effectiveness of SMOTE in addressing class imbalance and highlight the potential of advanced machine learning techniques for building practical tools that detect stroke at an early stage. Such improvements could meaningfully improve patient outcomes and reduce the burden on healthcare systems. Moreover, integrating such predictive models into clinical workflows could enable timely medical intervention and improve the quality of care for people at risk of stroke. The work also opens up possibilities for deep learning and other sophisticated machine learning techniques in healthcare, underscoring the need for further innovation in this area.
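To make the oversampling step concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a real minority sample and one of its nearest minority-class neighbours. It is not the study's implementation. The function name `smote`, the value of `k`, the fixed seed, and the toy two-feature data are all assumptions chosen for illustration. If the stroke dataset were balanced to a 1:1 ratio, 42,617 − 783 = 41,834 synthetic stroke samples would be required.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Illustrative SMOTE-style oversampling (hypothetical helper, not the study's code).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed for illustration): 20 majority points, 4 minority points.
majority = [(float(i), float(i % 5)) for i in range(20)]
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]

# Generate enough synthetic minority points to reach a 1:1 class ratio.
new_points = smote(minority, len(majority) - len(minority), k=3)
balanced_minority = minority + new_points

print(len(majority), len(balanced_minority))
```

In practice, oversampling is usually applied to the training split only, so that the test set contains no synthetic samples and reported metrics reflect performance on real cases; the abstract does not state which approach was taken here.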
License
This is an open access article published by the Research Center of Computing & Biomedical Informatics (RCBI), Lahore, Pakistan, under the CC BY 4.0 International License.