UMLS-Augmented PubMedBERT for Clinical Note Diagnosis Classification: A Comparative Study

Authors

  • Hamayun Muhammad Saqib, University of Central Punjab, Lahore, 54000, Pakistan.
  • Ghulam Mustafa, University of Central Punjab, Lahore, 54000, Pakistan.

Keywords:

Clinical NLP, PubMedBERT, UMLS, Transformer Models, Clinical Decision Support, Biomedical Text Classification

Abstract

Transformer-based language models perform well on biomedical text classification, but most existing approaches rely on unstructured text alone and do not explicitly incorporate curated biomedical knowledge. This study evaluates how different schemes for integrating Unified Medical Language System (UMLS) concepts affect PubMedBERT performance on clinical note diagnosis classification. Three model configurations are compared on the publicly available PMC-Patients corpus: (i) a baseline PubMedBERT model trained on raw clinical notes only, (ii) a PubMedBERT model trained on raw clinical notes augmented with unfiltered UMLS Concept Unique Identifiers (CUIs), and (iii) a PubMedBERT model trained on raw clinical notes augmented with NER-filtered UMLS concepts. All models are trained with a weighted cross-entropy loss to address class imbalance and are evaluated using accuracy, macro and weighted F1-scores, per-class metrics, multiple-seed experiments, bootstrap confidence intervals, and paired-samples t-tests. The findings indicate that naive injection of the full set of UMLS concepts degrades performance, which suggests that unfiltered ontology injection introduces noise. Selecting concepts through NER filtering, by contrast, yields a modest but consistent improvement in classification performance, reaching an accuracy of 98.8 percent with a large improvement on minority classes such as asthma and heart disease. These results show that biomedical knowledge can improve transformer-based clinical text classification when it is introduced in a deliberate and systematic way, although the improvement comes with increased computational cost.
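The weighted cross-entropy loss mentioned above can be illustrated with a minimal sketch. The abstract does not state how the class weights were derived, so the inverse-frequency ("balanced") weighting used here, the function names, and the toy labels and logits are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Assumed weighting scheme: weight_c = n / (k * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

def weighted_cross_entropy(logits, labels, weights):
    """Class-weighted cross-entropy, normalised by the sum of the weights."""
    total, weight_sum = 0.0, 0.0
    for row, y in zip(logits, labels):
        # Numerically stable log-sum-exp over the class scores.
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        nll = log_z - row[y]  # negative log-probability of the true class
        total += weights[y] * nll
        weight_sum += weights[y]
    return total / weight_sum

# Toy, hypothetical data: class 1 is the minority class.
labels = [0, 0, 0, 1]
logits = [[2.0, 0.0], [1.5, 0.5], [0.2, 0.1], [1.0, 0.0]]
weights = inverse_frequency_weights(labels)
loss = weighted_cross_entropy(logits, labels, weights)
uniform_loss = weighted_cross_entropy(logits, labels, {0: 1.0, 1: 1.0})
```

In this toy example the single minority-class sample is misclassified, so up-weighting it makes the weighted loss larger than the unweighted mean; this is the mechanism by which errors on rare diagnoses contribute more to training.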

Published

2025-12-01

How to Cite

Hamayun Muhammad Saqib, & Ghulam Mustafa. (2025). UMLS-Augmented PubMedBERT for Clinical Note Diagnosis Classification: A Comparative Study. Journal of Computing & Biomedical Informatics, 10(01). Retrieved from https://jcbi.org/index.php/Main/article/view/1175

Section

Articles