UMLS-Augmented PubMedBERT for Clinical Note Diagnosis Classification: A Comparative Study
Keywords:
Clinical NLP, PubMedBERT, UMLS, Transformer Models, Clinical Decision Support, Biomedical Text Classification
Abstract
Transformer-based language models have demonstrated strong performance in biomedical text classification, but most existing approaches rely on unstructured text alone and do not explicitly incorporate curated biomedical knowledge. This study evaluates how different schemes for integrating Unified Medical Language System (UMLS) concepts affect the performance of PubMedBERT on clinical note diagnosis classification. Three model configurations are compared on the publicly available PMC-Patients corpus: (i) a baseline PubMedBERT model trained on the raw clinical notes only, (ii) a PubMedBERT model trained on the raw clinical notes augmented with unfiltered UMLS Concept Unique Identifiers (CUIs), and (iii) a PubMedBERT model trained on the raw clinical notes augmented with NER-filtered UMLS concepts. All models are trained with a weighted cross-entropy loss to address class imbalance and are evaluated using accuracy, macro and weighted F1-scores, per-class analysis, multiple-seed experiments, bootstrap confidence intervals, and paired-samples t-tests. The findings indicate that naive injection of the full set of UMLS concepts degrades performance, which suggests that injecting ontology concepts without filtering introduces unwanted noise. Selecting UMLS concepts through NER filtering, by contrast, yields a modest but consistent improvement in classification performance, reaching an accuracy of 98.8 percent with a substantial improvement on minority classes such as asthma and heart disease. These results show that biomedical knowledge can improve transformer-based clinical text classification when it is incorporated in a deliberate and systematic way, although the gains come with increased computational cost.
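The abstract names a weighted cross-entropy loss and bootstrap confidence intervals without giving their exact form. The following is a minimal Python sketch of both ideas, not the authors' implementation. It assumes inverse-frequency class weights and a percentile bootstrap over accuracy; the function names, the weighting formula, the resample count, and the toy labels are all illustrative assumptions that do not come from the study.

```python
import math
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting scheme: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by the total weight.

    `probs` is a list of dicts mapping class name -> predicted probability.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


def bootstrap_accuracy_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for accuracy."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample test examples with replacement and rescore.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(sum(y_true[i] == y_pred[i] for i in idx) / n)
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy, made-up labels: one majority class and one minority class.
labels = ["diabetes"] * 8 + ["asthma"] * 2
weights = inverse_frequency_weights(labels)

# A deliberately uninformative predictor that assigns 0.5 to each class.
probs = [{"diabetes": 0.5, "asthma": 0.5} for _ in labels]
loss = weighted_cross_entropy(probs, labels, weights)

# Hard predictions that misclassify one minority example.
preds = ["diabetes"] * 9 + ["asthma"]
ci = bootstrap_accuracy_ci(labels, preds)
```

With weights of this kind, an error on a rare class contributes more to the loss than an error on a common class, which is the usual motivation for weighting under class imbalance. The bootstrap interval reflects only test-set sampling variability; variation across random seeds would need to be assessed separately.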
License
This is an open access article published by the Research Center of Computing & Biomedical Informatics (RCBI), Lahore, Pakistan, under the CC BY 4.0 International License.



