A Multi-Modal Deep Learning Framework for AMD Risk Prediction

Authors: Pankaj Borkar, Asawari Dharme, Chandni Sahu, Vaishnavi Mishra, Ritika Kumari

DOI Link: https://doi.org/10.22214/ijraset.2026.79481

Abstract

Despite significant progress in ophthalmic imaging, Age-Related Macular Degeneration (AMD) remains one of the leading causes of permanent vision loss among older adults worldwide. Traditional diagnostic paradigms rely largely on image-centric Computer-Aided Diagnosis (CAD) systems that do not account for systemic conditions such as hypertension and diabetes, which are key determinants of pathogenesis. This research proposes a Multi-Modal Deep Learning framework designed to address the limitations of unimodal approaches by combining high-dimensional spatial features of retinal fundus images with key clinical biomarkers. The architecture uses a ResNet-50 visual feature extractor and a parallel Multi-Layer Perceptron (MLP) to process clinical data, and it includes a Focal Loss criterion to address class imbalance, allowing a more holistic risk assessment. Empirical evaluation on the ODIR-5K dataset shows a classification accuracy of 96.53% and a sensitivity of 94.1%, indicating an advantage of the proposed methodology over unimodal approaches. These results support the value of integrating systemic health information to reduce diagnostic uncertainty at the early AMD stage, and therefore offer a more reliable basis for modern Clinical Decision Support Systems (CDSS).

Introduction

This paper presents a deep learning-based approach for improving the diagnosis and risk prediction of Age-Related Macular Degeneration (AMD), a major cause of irreversible vision loss among elderly populations worldwide. AMD is characterized by the accumulation of drusen deposits in the retina, and its progression is influenced not only by retinal abnormalities but also by systemic conditions such as diabetes and hypertension. Traditional AI-based retinal diagnostic systems rely mainly on retinal images alone, which limits their ability to capture a patient’s complete health profile.

To overcome this limitation, the study proposes a Dual-Stream Multi-Modal Deep Learning Network that combines retinal image analysis with patient clinical data. The system integrates visual features extracted from fundus images with systemic health information such as age, diabetes status, and hypertension history, enabling a more comprehensive and personalized risk assessment.

The research highlights the emergence of multimodal AI in healthcare, emphasizing that AMD progression depends on both retinal morphology and systemic risk factors. Existing state-of-the-art models often use unimodal approaches based only on retinal scans like fundus images or Optical Coherence Tomography (OCT), which may lead to incomplete diagnoses. The proposed model addresses this research gap by fusing heterogeneous data sources.

Key contributions of the research include:

Multi-Modal Fusion: Combining retinal image features with patient clinical data using a late-fusion deep learning architecture.
Class Imbalance Handling: Using Synthetic Minority Over-sampling Technique (SMOTE) and Focal Loss to improve detection of high-risk AMD cases and reduce bias toward healthy samples.
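
As a rough illustration of the SMOTE idea named above, the sketch below creates synthetic minority samples by interpolating between a sample and one of its nearest minority-class neighbours. The function name smote_sample, the feature layout (age, diabetes, hypertension) and all values are assumptions made for illustration; the paper does not state which features SMOTE was applied to or which implementation was used.

```python
import random

def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea).
    Requires at least two minority samples."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical, already-normalised clinical vectors: (age, diabetes, hypertension)
high_risk = [(0.80, 1.0, 1.0), (0.75, 1.0, 0.0), (0.90, 0.0, 1.0), (0.85, 1.0, 1.0)]
new_points = smote_sample(high_risk, k=2, n_new=4)
```

Interpolating binary flags such as diabetes status yields fractional values, so in practice a variant that treats categorical features separately may be preferable.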

The methodology uses the ODIR-5K retinal dataset, which contains fundus images and clinical metadata. The dataset is divided into low-risk and high-risk AMD classes. Data preprocessing includes:

Retinal image enhancement through cropping, resizing, and CLAHE contrast enhancement.
Clinical data normalization using Min-Max scaling.
Data augmentation techniques such as rotation, flipping, and shifting to improve model robustness.
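
Two of the preprocessing steps listed above are simple enough to sketch in plain Python: Min-Max scaling of a clinical variable and a horizontal flip used for augmentation. The helper names, the example ages and the tiny 2x3 "image" are hypothetical stand-ins; the paper's own pipeline relies on OpenCV and is not reproduced here.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

def horizontal_flip(image):
    """Mirror a 2-D image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

ages = [52, 61, 70, 88]            # hypothetical patient ages in years
scaled_ages = min_max_scale(ages)  # the minimum maps to 0.0, the maximum to 1.0

tiny_image = [[1, 2, 3],
              [4, 5, 6]]           # stand-in for a fundus image
flipped = horizontal_flip(tiny_image)
```

In a real pipeline the minimum and maximum would normally be computed on the training split only and reused for validation and test data, so that no information leaks across splits.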

The proposed architecture has two parallel streams:

Visual Stream: A pre-trained ResNet-50 CNN extracts high-level retinal image features.
Clinical Stream: A Multi-Layer Perceptron (MLP) processes patient clinical variables.

The outputs of both streams are fused into a combined feature representation, followed by dense layers and sigmoid activation to predict AMD risk probability. The model uses focal loss to emphasize difficult and minority-class samples, improving sensitivity to early-stage AMD detection.
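
The fusion and loss steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the short feature vectors stand in for the ResNet-50 and MLP outputs, the weights and bias are arbitrary, a single dense unit replaces the dense layers, and gamma=2.0 and alpha=0.25 are the commonly used defaults from the focal loss literature, not values reported in the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fuse_and_predict(visual_features, clinical_features, weights, bias):
    """Late fusion: concatenate both feature vectors, then apply one
    dense unit with a sigmoid to obtain a risk probability."""
    fused = list(visual_features) + list(clinical_features)
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return sigmoid(z)

def binary_focal_loss(y_true, p, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one sample: cross-entropy scaled by (1 - p_t)**gamma,
    so confidently correct predictions contribute very little."""
    p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
    p_t = p if y_true == 1 else 1.0 - p       # probability of the true class
    alpha_t = alpha if y_true == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical inputs: 3 visual features and 3 clinical features
visual = [0.2, -0.1, 0.4]
clinical = [0.9, 1.0, 0.0]                    # scaled age, diabetes, hypertension
weights = [0.5, -0.3, 0.8, 1.2, 0.7, 0.6]     # arbitrary, untrained
risk = fuse_and_predict(visual, clinical, weights, bias=-1.0)

easy_loss = binary_focal_loss(1, 0.9)         # well-classified positive
hard_loss = binary_focal_loss(1, 0.1)         # misclassified positive
```

With gamma greater than zero, hard_loss is far larger than easy_loss, which is how the loss steers training toward difficult and minority-class samples.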

The system is implemented using Python, TensorFlow, Keras, and OpenCV, and trained on NVIDIA Tesla T4 GPU hardware. Overall, the proposed multimodal framework improves diagnostic accuracy, interpretability, and clinical relevance by combining ocular and systemic patient information, making it more effective than traditional unimodal AI models for AMD screening and risk assessment.

Conclusion

Our experiments on the ODIR-5K dataset produced the following results:

Classification Accuracy: 96.53%
F1-Score: 0.96
AUC: 0.98
Sensitivity: 94.1%

The increase in sensitivity over standard unimodal models (85.2%) is the most important clinical takeaway. It suggests that our system can identify high-risk patients whom conventional AI tools might otherwise miss because their visual symptoms are subtle. Furthermore, by using SMOTE and Focal Loss, we overcame the "Accuracy Paradox," ensuring our results weren't skewed by the majority class. The addition of Grad-CAM provides a level of transparency that supports the view that our model is learning pathology rather than just picking up on dataset noise.
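
The accuracy paradox mentioned above can be illustrated with a small worked example. All counts below are hypothetical and are not taken from the paper's experiments; they only show why accuracy alone is misleading on an imbalanced screening set and why sensitivity is reported alongside it.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity (recall), precision and F1
    from the four cells of a binary confusion matrix."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + sensitivity:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical screening set: 950 low-risk and 50 high-risk patients.
# A model that labels everyone low-risk still reaches 95% accuracy
# while detecting none of the high-risk cases.
always_low_risk = binary_metrics(tp=0, fp=0, tn=950, fn=50)

# A model that finds 45 of the 50 high-risk patients, with 30 false alarms.
balanced_model = binary_metrics(tp=45, fp=30, tn=920, fn=5)
```

The first model has high accuracy but zero sensitivity, while the second has only slightly higher accuracy yet a sensitivity of 0.9, which is the clinically relevant difference.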

Copyright

Copyright © 2026 Pankaj Borkar, Asawari Dharme, Chandni Sahu, Vaishnavi Mishra, Ritika Kumari. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET79481

Publish Date : 2026-04-05

ISSN : 2321-9653

Publisher Name : IJRASET
