Cross-Dataset Unified Vision Transformer Model for Diabetic Retinopathy Detection

Authors

  • Kinjal Patni, Department of Computer Engineering, Indus Institute of Technology and Engineering, Indus University, Ahmedabad, Gujarat, India.
  • Shruti Yagnik, Department of Computer Engineering, Indus Institute of Technology and Engineering, Indus University, Ahmedabad, Gujarat, India.

Keywords:

Diabetic Retinopathy, Vision Transformers, EyePACS, APTOS

Abstract

Diabetic retinopathy (DR) is a leading preventable cause of blindness worldwide; it is caused by prolonged hyperglycemia that damages the retinal vasculature. Early detection and accurate grading of DR are essential for timely clinical intervention and improved patient outcomes. Machine learning and convolutional neural network (CNN)-based approaches have shown promise for DR screening, but they are often limited in their ability to capture fine-grained lesions or long-range dependencies because of restricted receptive fields and insufficient modeling of global context. Vision Transformers (ViTs) leverage self-attention mechanisms to capture global relationships across retinal structures. This article reviews ViT-based frameworks for DR grading, concentrating on studies that use two of the most widely applied and diverse benchmarks, EyePACS and APTOS, both of which provide expert-annotated fundus images. The review covers a variety of recent advances, such as hybrid CNN-ViT architectures, lesion-aware transformer modules, multi-scale feature aggregation, and federated learning strategies for privacy-preserving medical image analysis. It then highlights the role of interpretable attention maps in improving clinical trust and decision transparency. The review also examines remaining challenges in DR grading, including extreme class imbalance across DR severity levels, the high computational cost of transformer models, and the need for robust explainability techniques to support clinical adoption. By connecting current achievements, open problems, and emerging research directions, this review aims to guide researchers and practitioners in designing efficient, generalizable, and clinically relevant ViT-based DR detection and grading systems. Overall, it highlights the potential of transformer-driven approaches to transform automated ophthalmic diagnosis and strengthen global diabetic eye care workflows.
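
To make the hybrid CNN-ViT idea mentioned in the abstract concrete, the sketch below shows a minimal PyTorch classifier for five-class DR grading (0 = no DR to 4 = proliferative DR): a small CNN stem extracts local lesion features, and a transformer encoder applies self-attention over the resulting patch tokens to model global retinal context. This is an illustrative assumption for exposition only; the layer sizes, class name `HybridCnnVit`, and all hyperparameters are hypothetical and do not reproduce any specific architecture from the reviewed studies.

```python
# Illustrative sketch only: a minimal hybrid CNN-ViT classifier for
# 5-class DR grading. All layer sizes and names are assumptions made
# for illustration, not the architecture of any reviewed study.
import torch
import torch.nn as nn

class HybridCnnVit(nn.Module):
    def __init__(self, num_classes=5, embed_dim=256, depth=4, num_heads=8):
        super().__init__()
        # CNN stem: captures local lesion features and downsamples the
        # fundus image (224x224 input -> 14x14 feature map).
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.Conv2d(128, embed_dim, 3, stride=4, padding=1),
        )
        # Learnable class token and positional embeddings for 14*14 patch tokens.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, 14 * 14 + 1, embed_dim))
        # Transformer encoder: self-attention relates distant retinal regions.
        layer = nn.TransformerEncoderLayer(
            embed_dim, num_heads, dim_feedforward=embed_dim * 4, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        feats = self.stem(x)                        # (B, C, 14, 14)
        tokens = feats.flatten(2).transpose(1, 2)   # (B, 196, C) patch tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        encoded = self.encoder(tokens)
        return self.head(encoded[:, 0])             # grade logits from CLS token

model = HybridCnnVit()
logits = model(torch.randn(2, 3, 224, 224))         # two dummy fundus images
print(logits.shape)                                  # torch.Size([2, 5])
```

In practice, such a model would be trained with a class-weighted or focal loss to address the severity-level imbalance noted above, and the encoder's attention weights could be visualized as attention maps over the fundus image to support interpretability.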

Published

2025-12-01

How to Cite

Kinjal Patni, & Shruti Yagnik. (2025). Cross-Dataset Unified Vision Transformer Model for Diabetic Retinopathy Detection. Journal of Computing & Biomedical Informatics. Retrieved from https://jcbi.org/index.php/Main/article/view/1148
