This study aims to identify the best-fitting Deep Learning algorithm for detecting Diabetic Retinopathy, a well-known complication of diabetes, using two datasets. Methods: Four Deep Learning models are trained on the Diabetic Retinopathy datasets, applied to grayscale image files. The images in each dataset are divided into three parts: training (80%), validation (10%) and test (10%). We use the following preprocessing techniques to improve model generalization and address issues caused by sparse and unbalanced datasets: resizing (preserving aspect ratio), cropping, normalization and augmentation (rotation, zoom, flipping). These techniques replicate realistic variations in retinal images and reduce overfitting. Our goal is clean, low-noise training and test data so the models can be evaluated accurately. The datasets are trained with VGG16, ResNet50, InceptionV3 and EfficientNet. Findings: On dataset 1, ResNet50 achieved the highest accuracy, and on dataset 2, InceptionV3 achieved the highest accuracy among the four models. Novelty: To test generalizability and robustness, this study compares four significant convolutional neural network architectures for diabetic retinopathy classification across two heterogeneous retinal imaging datasets. Our work explores cross-dataset performance, model efficiency and clinically important metrics, providing useful insights for real-world deployment, in contrast to previous studies that were restricted to single-dataset evaluations.
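The preprocessing and splitting steps summarized above could be implemented with, for example, a torchvision-style pipeline. The sketch below is illustrative only: the dataset path, image size and augmentation parameters are assumptions, not the exact configuration used in this study.

```python
import torch
from torchvision import datasets, transforms
from torch.utils.data import random_split

# Preprocessing and augmentation: resize, crop, normalize,
# rotate / zoom / flip (all parameter values are illustrative).
train_tf = transforms.Compose([
    transforms.Resize(256),                                # resize (shorter side; preserves aspect ratio)
    transforms.CenterCrop(224),                            # crop to the network input size
    transforms.RandomRotation(15),                         # augment: rotation
    transforms.RandomAffine(degrees=0, scale=(0.9, 1.1)),  # augment: zoom
    transforms.RandomHorizontalFlip(),                     # augment: flipping
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),           # normalize (ImageNet statistics)
])

# Hypothetical folder layout: one sub-directory per DR grade (0-4).
full_ds = datasets.ImageFolder("data/retinopathy", transform=train_tf)

# 80% / 10% / 10% split into training, validation and test sets
# (in practice the validation/test subsets would use a transform
# without random augmentation).
n = len(full_ds)
n_train, n_val = int(0.8 * n), int(0.1 * n)
train_ds, val_ds, test_ds = random_split(
    full_ds, [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(42))
```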
Introduction
Diabetic Retinopathy (DR) is a leading cause of vision impairment caused by damage to retinal blood vessels from prolonged diabetes. Early detection is crucial to prevent irreversible blindness. Traditional screening methods like fundus photography and manual grading are labor-intensive and variable.
Advances in artificial intelligence, especially deep learning using Convolutional Neural Networks (CNNs), have revolutionized automated DR diagnosis by learning directly from retinal images. Early machine learning methods relied on handcrafted features, but the breakthrough came in 2015 with deep learning models achieving expert-level accuracy.
Key CNN architectures like AlexNet, VGGNet, ResNet, Inception, and EfficientNet have progressively improved performance, enabling better image classification, feature extraction, and efficiency suitable for real-world and resource-constrained settings.
Recent studies have introduced hybrid models, attention mechanisms, and privacy-preserving techniques like federated learning to enhance accuracy and data security. However, challenges such as variability in image quality, disease stages, and dataset imbalances persist, affecting model generalization.
This study conducts a comparative analysis of four popular CNN models—VGG16, ResNet50, InceptionV3, and EfficientNet—on two publicly available DR datasets to evaluate classification accuracy, convergence, and generalization. The goal is to identify the most effective architecture for practical automated DR screening.
The methodology involves preprocessing images, dataset splitting, fine-tuning pretrained models for five-stage DR classification, and evaluating performance with metrics like accuracy and loss. Dataset 1 consists of 3,662 grayscale fundus images; Dataset 2 contains 35,126 color images from Kaggle.
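As an illustration of the fine-tuning and evaluation step, the sketch below adapts an ImageNet-pretrained ResNet50 from torchvision to five-stage DR grading and reports loss and accuracy per pass; the optimizer, learning rate and loop structure are assumptions made for the example, not the study's exact training configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Fine-tune an ImageNet-pretrained backbone for the five DR stages
# (0 = no DR ... 4 = proliferative DR). ResNet50 is shown; the same
# pattern applies to VGG16, InceptionV3 and EfficientNet.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 5)   # replace the 1000-class head
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # illustrative learning rate

def run_epoch(loader, train=True):
    """One pass over a DataLoader; returns (mean loss, accuracy)."""
    model.train(train)
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(train):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            logits = model(images)
            loss = criterion(logits, labels)
            if train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
    return total_loss / seen, correct / seen
```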
Conclusion
As stated in the introduction, the focus of this study is to identify the most suitable model based on classification accuracy, convergence behavior and generalization ability, thereby contributing to evidence-based model selection in automated DR detection. We found that InceptionV3 and VGG16 performed particularly well on the smaller dataset (D1), as they latched onto subtle features quickly, while EfficientNet underperformed on D1 within these epochs. This could be due to:
• Architectural complexity: EfficientNet requires more training examples to fully leverage its capacity.
• Hyperparameters: the learning rate and augmentation settings were not yet optimized for the smaller dataset.
On the D2 dataset, all models stabilize around 0.73, suggesting that regardless of architecture, the dataset's inherent difficulty sets a performance ceiling at those early epochs.
However, there is considerable scope for future work. Adaptive learning-rate schedules such as ReduceLROnPlateau or CosineAnnealingLR can help models avoid getting stuck in local minima. Because DR datasets often have skewed class distributions, class weighting or focal loss can be applied to boost minority-class detection.
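A minimal sketch of both suggestions, continuing the fine-tuning example above and assuming PyTorch's built-in schedulers; all numeric values (learning rate, factor, patience) are illustrative.

```python
import torch
import torch.nn as nn
from collections import Counter

# Continuing the earlier sketch: `model`, `train_ds` and `device` are
# the objects defined there; every numeric value here is illustrative.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Adaptive learning rate: reduce the LR when validation loss plateaus
# (torch.optim.lr_scheduler.CosineAnnealingLR is the alternative named above).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=3)

# Class weighting for skewed DR grade distributions: weight each of the
# five classes by the inverse of its frequency among the training labels.
label_counts = Counter(label for _, label in train_ds)   # one pass over the data
weights = torch.tensor([1.0 / label_counts[c] for c in range(5)])
criterion = nn.CrossEntropyLoss(weight=(weights / weights.mean()).to(device))

# After each epoch, step the scheduler on the validation loss, e.g.:
#   val_loss, val_acc = run_epoch(val_loader, train=False)
#   scheduler.step(val_loss)
```

Focal loss, mentioned above, is not built into torch.nn and would require a custom implementation or a third-party library.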
According to recent studies up to 2024, EfficientNet generally outperforms VGG16, ResNet50, and InceptionV3 in diabetic retinopathy (DR) detection, in both accuracy and efficiency, based on recent literature and benchmarks on datasets such as APTOS, Messidor, and EyePACS.