Diagnostic Prediction based on Medical Notes using Machine Learning
Keywords:
Medical notes, Machine learning, Bag of words, Random Forest, Logistic Regression
Abstract
Clinical experts have traditionally extracted clinically relevant information from clinical notes through manual review, an approach that is costly and scales poorly. This matters for many diseases, because clinical notes are more prevalent than structured data. The availability of these notes gives natural language processing (NLP) an opportunity to automatically extract clinically relevant information that might help delay or prevent the onset of disease, but it also poses several challenges. In this work, we investigate the current state of the art and suggest possible future research pathways that might expedite the general use of NLP on disease-related clinical notes. The dataset used in this study was obtained from Kaggle, an open platform for machine learning challenges. It is in text format, covers many categories, and includes each patient's age, gender, diagnoses, and vital signs. Each stage, from dataset preparation through model training and testing, plays an important role in predicting patient therapy from clinical notes. Two feature engineering methods, term frequency-inverse document frequency (TF-IDF) and bag of words, are used for feature extraction. Six machine learning (ML) methods were employed for analysis: Naive Bayes, LightGBM, Random Forest (RF), Logistic Regression, Support Vector Machines (SVM), and the Extra Trees classifier. The experiments were run on several sample sizes of the dataset. In these experiments, logistic regression was the most effective algorithm for predicting medical therapy, with an accuracy of 85.94%.
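The two feature extraction methods named in the abstract can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation: the two example notes are invented, the function names (`tokenize`, `bag_of_words`, `tf_idf`) are hypothetical, and the TF-IDF weighting shown is the simple unsmoothed form (relative term frequency times log of inverse document frequency), which may differ from the variant used in the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the note and keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(docs):
    # Vocabulary: every distinct token across all notes, in sorted order.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    counts = [Counter(tokenize(doc)) for doc in docs]
    # One row per note, one column per vocabulary term, raw counts as values.
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def tf_idf(docs):
    vocab, bow = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of notes that contain each term.
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    # Unsmoothed inverse document frequency (an assumption of this sketch).
    idf = [math.log(n_docs / d) for d in df]
    weights = []
    for row in bow:
        total = sum(row) or 1  # guard against an empty note
        weights.append([(row[j] / total) * idf[j] for j in range(len(vocab))])
    return vocab, weights


# Invented example notes, for illustration only.
notes = [
    "patient reports chest pain and shortness of breath",
    "patient reports knee pain after a fall",
]
vocab, bow = bag_of_words(notes)
_, weights = tf_idf(notes)
```

In this sketch a term that appears in every note, such as "pain", receives a TF-IDF weight of zero, while a term unique to one note, such as "chest", receives a positive weight. Bag of words, by contrast, keeps the raw count for both. Either matrix could then be passed to a classifier such as logistic regression.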
License
This is an open access article published by the Research Center of Computing & Biomedical Informatics (RCBI), Lahore, Pakistan, under the CC BY 4.0 International License.