Diagnostic Prediction based on Medical Notes using Machine Learning

Authors

  • Nazir Ahmad, Department of Information Technology, IUB, Bahawalpur, 63100, Pakistan.
  • H. M. Shafiq Ur Rehman, Department of Information Technology, IUB, Bahawalpur, 63100, Pakistan.
  • Mubasher H. Malik, Department of Computer Science, ISP, Multan, 59300, Pakistan.
  • Syed Ali Nawaz, Department of Information Technology, IUB, Bahawalpur, 63100, Pakistan.
  • M. Abdul Qadoos Bilal, College of Information and Computer, TUT, Taiyuan, 03000, China.

Keywords:

Medical notes, Machine learning, Bag of words, Random Forest, Logistic Regression

Abstract

Clinically relevant information has traditionally been extracted from clinical notes by experts through manual review, an approach that is costly and difficult to scale. This matters for many diseases, because clinical notes are more prevalent than structured data. The availability of these notes creates an opportunity for natural language processing (NLP) to automatically extract clinically relevant information that might help delay or prevent the onset of disease, but it also poses several challenges. In this work, we investigate the current state of the art and suggest future research directions that might accelerate the general use of NLP on disease-related clinical notes. The dataset is obtained from Kaggle, an online platform for machine learning competitions. It is in text format, records each patient's age, gender, diagnoses, and other vital signs, and spans many categories. Every stage of the pipeline, from dataset preparation through model training and testing, contributes to predicting patient therapy from clinical notes. Two feature-engineering methods, term frequency-inverse document frequency (TF-IDF) and bag of words (BoW), are used for feature extraction. Six machine learning (ML) methods were employed for analysis: Naive Bayes, LightGBM, Random Forest (RF), Logistic Regression, Support Vector Machines (SVM), and Extra Trees classifiers. The models were evaluated on several sample sizes of the dataset. Based on the findings, logistic regression is the most effective algorithm for predicting medical therapy, with an accuracy of 85.94%.
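The pipeline summarized in the abstract (text features from notes, then a classifier scored by accuracy) can be illustrated with a small sketch. It is not the authors' code: the four notes and the binary cardiac/respiratory labels are hypothetical, the tokenizer is deliberately simple, and logistic regression is fitted by plain batch gradient descent so the example runs on the Python standard library alone. The IDF weighting uses the smoothed formula ln((1 + n) / (1 + df)) + 1 followed by L2 row normalization, which mirrors the default behaviour of scikit-learn's `TfidfVectorizer`; the helper names and the hyperparameters `lr`, `epochs`, and `l2` are illustrative assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Deliberately simple tokenizer: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocab(docs: list[str]) -> list[str]:
    # The vocabulary is learned from the training notes only.
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def bag_of_words(docs: list[str], vocab: list[str]) -> list[list[float]]:
    # Raw term counts per note; tokens outside the vocabulary are ignored.
    index = {term: i for i, term in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for tok in tokenize(doc):
            if tok in index:
                row[index[tok]] += 1.0
        rows.append(row)
    return rows

def smoothed_idf(train_docs: list[str], vocab: list[str]) -> list[float]:
    # idf(t) = ln((1 + n) / (1 + df(t))) + 1, where df(t) counts training notes containing t.
    n = len(train_docs)
    df = Counter(tok for doc in train_docs for tok in set(tokenize(doc)))
    return [math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab]

def tf_idf(docs: list[str], vocab: list[str], idf: list[float]) -> list[list[float]]:
    # Term counts weighted by IDF, then scaled to unit L2 norm per note.
    rows = []
    for counts in bag_of_words(docs, vocab):
        row = [c * w for c, w in zip(counts, idf)]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return rows

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(X, y, lr=0.5, epochs=500, l2=0.01):
    # Batch gradient descent on mean log-loss plus an L2 penalty on the weights (not the bias).
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b):
    # Predict class 1 when the estimated probability is at least 0.5.
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) for xi in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy notes; 1 = cardiac, 0 = respiratory (illustrative labels only).
train_notes = [
    "chest pain radiating to left arm, elevated troponin",
    "palpitations and chest pain, abnormal ecg",
    "shortness of breath with wheezing, history of asthma",
    "productive cough and fever, crackles on auscultation",
]
train_labels = [1, 1, 0, 0]
test_notes = ["chest pain with abnormal ecg", "cough with wheezing and fever"]
test_labels = [1, 0]

vocab = build_vocab(train_notes)
idf = smoothed_idf(train_notes, vocab)
features = {
    "bag of words": (bag_of_words(train_notes, vocab), bag_of_words(test_notes, vocab)),
    "tf-idf": (tf_idf(train_notes, vocab, idf), tf_idf(test_notes, vocab, idf)),
}
results = {}
for name, (X_train, X_test) in features.items():
    w, b = train_logistic_regression(X_train, train_labels)
    results[name] = accuracy(test_labels, predict(X_test, w, b))
    print(f"{name}: accuracy = {results[name]:.2f}")
```

In practice, scikit-learn's `CountVectorizer`, `TfidfVectorizer`, and `LogisticRegression` cover these steps directly, and the other classifiers listed above could be swapped in at the final step. Accuracy on a four-note toy corpus is not comparable to the 85.94% reported on the Kaggle data.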

Published

2024-02-01

How to Cite

Nazir Ahmad, H. M. Shafiq Ur Rehman, Mubasher H. Malik, Syed Ali Nawaz, & M. Abdul Qadoos Bilal. (2024). Diagnostic Prediction based on Medical Notes using Machine Learning. Journal of Computing & Biomedical Informatics. Retrieved from https://jcbi.org/index.php/Main/article/view/337