Diabetic Retinopathy (DR) is one of the most prevalent microvascular complications of diabetes and remains a leading cause of preventable blindness worldwide. Early diagnosis and timely intervention are critical to reducing vision loss; however, traditional screening methods are resource-intensive and depend on expert ophthalmologists. In recent years, deep learning-based automated systems have emerged as powerful tools for DR detection and grading, primarily using retinal fundus images. Despite their promising performance, the effectiveness and generalizability of these models depend strongly on the datasets used for training and evaluation. This review presents a systematic analysis of publicly available datasets for diabetic retinopathy detection. It emphasizes dataset characteristics that play a crucial role in model performance: size, image quality, annotation type, and class distribution. Datasets are categorized by scale (small, medium, and large), annotation level (image-level, lesion-level, and pixel-level), and clinical applicability. The paper also highlights key challenges, including class imbalance, variability in imaging conditions, and the limited availability of high-quality annotated data. By providing a detailed comparative evaluation, this review aims to guide researchers in selecting appropriate datasets and in identifying gaps for future dataset development in DR research.
Introduction
Diabetic Retinopathy (DR) is a serious diabetes-related eye disease that can lead to blindness if not detected early. Deep learning, especially models based on convolutional neural networks (CNNs), has greatly improved automated DR detection from retinal fundus images. However, the performance of these models depends heavily on the quality, size, diversity, and annotation accuracy of the datasets used to train them. Poor or imbalanced datasets can lead to biased and unreliable predictions, which makes dataset selection and design crucial for clinical applications.
DR datasets are classified based on size (small, medium, large), annotation type (image-level, lesion-level, pixel-level), and clinical relevance (screening, diagnostic, and multimodal datasets). Large datasets like EyePACS support robust training, while smaller datasets like IDRiD provide detailed lesion annotations useful for segmentation. Common datasets include EyePACS, APTOS 2019, Messidor-2, IDRiD, and DDR, each with different strengths and limitations in terms of size, quality, and accessibility.
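The size and annotation categories above can be expressed as a small data structure. The following Python sketch is illustrative only: the numeric cut-offs `SMALL_MAX` and `MEDIUM_MAX` are hypothetical (this review does not fix numeric thresholds), the image counts are approximate figures as commonly reported for the public releases, and the annotation sets are a simplified reading of each dataset. All of these should be checked against the official dataset documentation before use.

```python
from dataclasses import dataclass
from enum import Enum


class Annotation(Enum):
    IMAGE_LEVEL = "image-level"
    LESION_LEVEL = "lesion-level"
    PIXEL_LEVEL = "pixel-level"


@dataclass(frozen=True)
class DRDataset:
    name: str
    n_images: int           # approximate count, as commonly reported
    annotations: frozenset  # set of Annotation members


# Hypothetical cut-offs chosen for illustration; not defined by the review.
SMALL_MAX = 2_000
MEDIUM_MAX = 20_000


def size_category(n_images: int) -> str:
    """Map an image count to 'small', 'medium', or 'large'."""
    if n_images <= SMALL_MAX:
        return "small"
    if n_images <= MEDIUM_MAX:
        return "medium"
    return "large"


# Simplified catalog; counts and annotation sets are assumptions to verify.
CATALOG = [
    DRDataset("EyePACS", 88_702, frozenset({Annotation.IMAGE_LEVEL})),
    DRDataset("APTOS 2019", 3_662, frozenset({Annotation.IMAGE_LEVEL})),
    DRDataset("Messidor-2", 1_748, frozenset({Annotation.IMAGE_LEVEL})),
    DRDataset("IDRiD", 516,
              frozenset({Annotation.IMAGE_LEVEL, Annotation.PIXEL_LEVEL})),
    DRDataset("DDR", 13_673,
              frozenset({Annotation.IMAGE_LEVEL, Annotation.LESION_LEVEL,
                         Annotation.PIXEL_LEVEL})),
]


def datasets_with(annotation: Annotation) -> list:
    """Return the names of catalog entries that offer the given annotation."""
    return [d.name for d in CATALOG if annotation in d.annotations]


if __name__ == "__main__":
    for d in CATALOG:
        print(f"{d.name}: {size_category(d.n_images)}")
    print("pixel-level:", datasets_with(Annotation.PIXEL_LEVEL))
```

Keeping the thresholds as named constants makes the size grouping easy to adjust, since the boundary between "small", "medium", and "large" is a convention rather than a property of the data.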
Key challenges in DR datasets include class imbalance (few severe cases), variability in imaging conditions, limited expert annotations, and domain shift between datasets. These issues reduce model generalization and clinical reliability. Overall, while existing datasets have enabled significant progress in automated DR detection, improving dataset quality, balance, and standardization is essential for developing more accurate and clinically deployable deep learning systems.
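One common way to counter class imbalance during training is to weight each severity grade inversely to its frequency. The sketch below is a minimal illustration, not a method proposed by this review: it computes weights as N / (K * n_c), where N is the number of samples, K the number of classes, and n_c the number of samples in class c (the same heuristic that scikit-learn calls "balanced"). The label counts are hypothetical and serve only to mimic a skewed five-grade distribution.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = N / (K * n_c)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


# Hypothetical label list for grades 0 (no DR) to 4 (proliferative DR).
# The counts are made up to illustrate a skewed distribution.
labels = [0] * 700 + [1] * 100 + [2] * 150 + [3] * 30 + [4] * 20

weights = inverse_frequency_weights(labels)

if __name__ == "__main__":
    for grade in sorted(weights):
        print(f"grade {grade}: weight {weights[grade]:.3f}")
```

The resulting weights can be passed to a weighted loss function so that errors on rare severe grades contribute more to the training objective. Weighting does not add information about the minority classes, so it complements rather than replaces the collection of more severe cases.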
Conclusion
Datasets play a fundamental and indispensable role in the development and performance of deep learning models for diabetic retinopathy (DR) detection. The success of automated diagnostic systems is highly dependent on the quality, diversity, and scale of the data used during training and evaluation. Large-scale datasets such as EyePACS enable the development of robust and generalizable models by providing extensive variability in imaging conditions and disease severity. In contrast, smaller datasets like IDRiD offer detailed pixel-level annotations that are particularly valuable for tasks such as lesion segmentation and interpretability.
However, no single dataset is sufficient to address all aspects of DR detection, including classification, grading, and lesion localization. Each dataset presents its own strengths and limitations, such as class imbalance, limited diversity, or restricted accessibility. As a result, combining multiple datasets has emerged as a promising approach to enhance data diversity and improve model performance. Nevertheless, this approach introduces challenges such as domain shift and inconsistencies in labeling standards.
Future research should focus on developing large-scale, standardized, and well-annotated datasets that incorporate diverse populations and imaging conditions. Such advancements will be critical for improving the reliability, fairness, and clinical applicability of deep learning-based DR detection systems in real-world healthcare settings.