Cardiovascular disease (CVD) remains the leading cause of mortality globally, responsible for approximately 17.9 million deaths annually. Conventional diagnostic approaches are constrained by inter-observer variability, shortages of specialists, and limited accessibility in resource-constrained environments. This systematic review evaluates the efficacy, design, and clinical implementation of deep learning (DL) models for CVD detection across diverse imaging and signal modalities. In accordance with PRISMA 2020 guidelines, we performed a comprehensive literature search in PubMed, IEEE Xplore, Scopus, and Web of Science for studies published between January 2015 and October 2025. Eligible studies employed deep neural network architectures for CVD detection and reported quantitative diagnostic performance metrics. Methodological quality and reporting transparency were assessed using the QUADAS-2 and CLAIM frameworks. Of the 127 studies included, convolutional neural networks were most frequently utilized (64.6%), followed by hybrid and recurrent models. Reported diagnostic accuracy ranged from 87.3% to 99.2%, with electrocardiogram-based arrhythmia detection achieving a mean accuracy of 97.1%. These findings underscore the considerable potential of DL-based systems for automated cardiovascular diagnosis. Nevertheless, only 23.6% of studies conducted prospective clinical validation, and 73.2% did not report race or ethnicity, raising concerns regarding generalizability, bias, and fairness. Although deep learning demonstrates high diagnostic performance in controlled research settings, substantial gaps persist in real-world validation, equity assessment, and clinical adoption. Future investigations should prioritize large-scale multicenter prospective studies, standardized fairness assessments, transparent reporting, and clearer regulatory guidance to facilitate the safe and effective integration of DL-based cardiovascular diagnostic tools into clinical practice.
Introduction
Cardiovascular diseases (CVD) are the leading cause of global mortality, with significant economic and public health burdens. Traditional diagnostics, such as ECG and echocardiography, face limitations due to observer variability, training requirements, and a shortage of specialists, highlighting the need for automated, AI-driven diagnostic systems.
Deep Learning in Cardiology:
Deep neural networks (DNNs), particularly CNNs, RNNs, transformers, and hybrid models, have demonstrated high accuracy in ECG interpretation, echocardiography, cardiac MRI, and CT. Hybrid architectures (e.g., CNN–LSTM) show improved performance for temporal or multimodal data. Transfer learning further enhances accuracy, especially with limited datasets.
Datasets:
ECG datasets dominate research due to their availability and standardization, with MIT-BIH and PhysioNet being the most widely used. Imaging datasets such as UK Biobank and EchoNet-Dynamic are less common. The median sample size was approximately 5,673.
Tasks other than arrhythmia detection (myocardial infarction diagnosis, atrial fibrillation, left ventricular dysfunction, valvular disease, and myocardial scar) achieved 89–95% accuracy, depending on modality and architecture.
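To make the reported metrics concrete, the following minimal Python sketch computes accuracy, sensitivity, specificity, and AUC-ROC from binary labels and model scores. The labels, scores, function names, and the 0.5 decision threshold are illustrative assumptions and are not data or methods taken from the reviewed studies.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Return (tp, fp, tn, fn) for binary labels (1 = disease) at a score threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summary_metrics(labels, scores, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a fixed threshold."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random positive case scores
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 diseased and 5 healthy recordings.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]

metrics = summary_metrics(labels, scores)
auc = auc_roc(labels, scores)
```

Accuracy depends on the chosen threshold and on class balance, whereas AUC-ROC summarizes ranking quality across all thresholds, which is one reason the two figures are usually reported together.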
Validation and Clinical Translation:
Only 23.6% of studies conducted prospective validation, and fewer than 8% conducted randomized clinical trials.
Many studies lacked external validation, reproducibility measures (e.g., publicly released code or datasets), and demographic stratification of results, raising concerns about fairness and bias. Where subgroup results were reported, performance disparities were observed across age, sex, and ethnicity.
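Demographic stratification amounts to computing the same diagnostic metrics separately for each subgroup and comparing them. The following is a minimal Python sketch under stated assumptions: the record layout, group names, and helper names are hypothetical, and the toy predictions are invented for illustration only.

```python
from collections import defaultdict


def stratified_metrics(records):
    """Compute per-group sensitivity and specificity.

    records: iterable of (group, true_label, predicted_label), labels in {0, 1}.
    Returns {group: {"n": int, "sensitivity": float|None, "specificity": float|None}}.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        if y_true == 1:
            c["tp" if y_pred == 1 else "fn"] += 1
        else:
            c["fp" if y_pred == 1 else "tn"] += 1

    results = {}
    for group, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["tn"] + c["fp"]
        results[group] = {
            "n": pos + neg,
            # None marks a metric that is undefined for this subgroup.
            "sensitivity": c["tp"] / pos if pos else None,
            "specificity": c["tn"] / neg if neg else None,
        }
    return results


def max_gap(results, metric):
    """Largest between-group difference for one metric (ignores undefined values)."""
    values = [r[metric] for r in results.values() if r[metric] is not None]
    return max(values) - min(values)


# Hypothetical predictions stratified by a recorded sex variable.
records = [
    ("female", 1, 1), ("female", 1, 0), ("female", 0, 0), ("female", 0, 0),
    ("male", 1, 1), ("male", 1, 1), ("male", 0, 0), ("male", 0, 1),
]
by_group = stratified_metrics(records)
sensitivity_gap = max_gap(by_group, "sensitivity")
```

In practice, subgroup estimates from small samples are noisy, so a comparison like this would normally be accompanied by confidence intervals.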
Deployment barriers include regulatory approval, infrastructure needs, workflow integration, and physician trust.
Interpretability:
Approximately 46% of studies used explainable AI techniques (Grad-CAM, attention visualization, SHAP), but post-hoc explanations may not fully capture model reasoning.
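As a point of reference, Grad-CAM weights each feature map of a chosen convolutional layer by the averaged gradient of the class score with respect to that map, sums the weighted maps, and applies a ReLU. The Python sketch below shows that combination step for one-dimensional (ECG-style) feature maps. The activations and gradients are made-up numbers standing in for values a deep learning framework would supply, the function name is hypothetical, and the final scaling to [0, 1] is a common visualization convenience and not part of the core formula.

```python
def grad_cam_1d(activations, gradients):
    """Combine feature maps into a Grad-CAM heatmap over time.

    activations: K channels, each a list of T floats (layer outputs).
    gradients:   K channels, each a list of T floats
                 (d class_score / d activation, assumed precomputed).
    Returns a list of T values scaled to [0, 1]; all zeros if no
    time step has positive evidence.
    """
    if len(activations) != len(gradients):
        raise ValueError("activations and gradients need the same channel count")
    n_steps = len(activations[0])

    # Channel weight = gradient averaged over the time axis.
    weights = [sum(g) / len(g) for g in gradients]

    # Weighted sum of feature maps at each time step.
    cam = [0.0] * n_steps
    for w, channel in zip(weights, activations):
        for t in range(n_steps):
            cam[t] += w * channel[t]

    # ReLU keeps only evidence in favor of the class.
    cam = [max(0.0, v) for v in cam]

    # Optional scaling for display.
    peak = max(cam)
    return [v / peak for v in cam] if peak > 0 else cam


# Hypothetical layer output: 2 channels, 4 time steps.
activations = [[0.0, 1.0, 2.0, 1.0],
               [1.0, 0.0, 0.0, 1.0]]
gradients = [[0.5, 0.5, 0.5, 0.5],
             [-1.0, -1.0, -1.0, -1.0]]

heatmap = grad_cam_1d(activations, gradients)
```

The heatmap highlights where the first channel is active and suppresses regions dominated by the negatively weighted second channel. It reflects gradient-weighted activations at one layer only, which is consistent with the caution that post-hoc explanations may not capture everything the model uses.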
Future Directions:
Emerging opportunities include federated learning, continual learning, multimodal integration, and foundation models to improve accuracy, generalizability, and privacy preservation.
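To illustrate how federated learning can preserve privacy, the Python sketch below shows the aggregation step of federated averaging, one common scheme: each site trains locally and shares only model parameters, which a coordinating server combines in proportion to each site's sample count. The sample counts, parameter vectors, and function name are hypothetical, and real deployments involve many training rounds plus safeguards such as secure aggregation that are omitted here.

```python
def federated_average(site_updates):
    """Sample-size-weighted average of locally trained parameters.

    site_updates: list of (n_samples, params), where params is a list of
    floats with the same length at every site.
    Returns the aggregated parameter list.
    """
    if not site_updates:
        raise ValueError("no site updates provided")
    total = sum(n for n, _ in site_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(site_updates[0][1])
    averaged = [0.0] * dim
    for n, params in site_updates:
        if len(params) != dim:
            raise ValueError("all sites must share the same parameter shape")
        for i, p in enumerate(params):
            # Each site contributes in proportion to its local data volume.
            averaged[i] += (n / total) * p
    return averaged


# Two hypothetical hospitals; patient-level data never leave either site.
updates = [
    (100, [1.0, 0.0]),  # smaller site
    (300, [3.0, 4.0]),  # larger site
]
global_params = federated_average(updates)
```

Weighting by sample count lets larger sites influence the shared model more, which is a design choice: equal weighting is an alternative when small sites represent under-served populations.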
Limitations:
Most studies were retrospective, single-center, and dataset-limited. Study quality varied, with lower-quality studies often reporting inflated performance, suggesting publication bias.
Key Takeaway:
Deep learning shows strong potential to augment cardiovascular diagnostics, particularly in ECG-based tasks, but clinical translation requires addressing validation rigor, fairness, reproducibility, and deployment challenges.
Conclusion
This systematic review of 127 studies demonstrates that deep learning achieves high diagnostic accuracy (87.3–99.2%) across various cardiac pathologies and across imaging and signal modalities. Arrhythmia detection from ECG achieved the highest performance, with a mean accuracy of 97.1% and an AUC-ROC of 0.985, closely matching expert cardiologist performance [32]. Nonetheless, significant gaps persist in prospective validation, demographic equity, and clinical translation readiness.
References
[1] World Health Organization. "Cardiovascular diseases (CVDs)," WHO Fact Sheet, 2021. Available: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)
[2] C. W. Tsao, et al., "Heart disease and stroke statistics—2022 update," Circulation, vol. 145, no. 8, pp. e153–e639, 2022.
[3] J. Schläpfer and H. J. Wellens, "Computer-interpreted electrocardiograms: Benefits and limitations," J. Am. Coll. Cardiol., vol. 70, no. 9, pp. 1183–1192, 2017.
[4] J. S. Gottdiener, et al., "American Society of Echocardiography recommendations for use of echocardiography in clinical trials," J. Am. Soc. Echocardiogr., vol. 17, no. 10, pp. 1086–1119, 2004.
[5] R. Khatib, et al., "Availability and affordability of cardiovascular disease medicines," Lancet, vol. 387, no. 10013, pp. 61–69, 2016.
[6] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[7] G. Litjens, et al., "A survey on deep learning in medical image analysis," Med. Image Anal., vol. 42, pp. 60–88, 2017.
[8] A. Esteva, et al., "A guide to deep learning in healthcare," Nat. Med., vol. 25, no. 1, pp. 24–29, 2019.
[9] A. Esteva, et al., "Dermatologist-level classification of skin cancer with deep neural networks," Nature, vol. 542, no. 7639, pp. 115–118, 2017.
[10] D. Dey, et al., "Artificial intelligence in cardiovascular imaging," J. Am. Coll. Cardiol., vol. 73, no. 11, pp. 1317–1335, 2019.
[11] M. J. Page, et al., "The PRISMA 2020 statement: An updated guideline for reporting systematic reviews," BMJ, vol. 372, p. n71, 2021.
[12] M. L. McHugh, "Interrater reliability: The kappa statistic," Biochem. Med., vol. 22, no. 3, pp. 276–282, 2012.
[13] P. F. Whiting, et al., "QUADAS-2: A revised tool for the quality assessment of diagnostic accuracy studies," Ann. Intern. Med., vol. 155, no. 8, pp. 529–536, 2011.
[14] J. Mongan, L. Moy, and C. E. Kahn Jr., "Checklist for Artificial Intelligence in Medical Imaging (CLAIM)," Radiol. Artif. Intell., vol. 2, no. 2, p. e200029, 2020.
[15] N. Strodthoff, P. Wagner, T. Schaeffter, and W. Samek, "Deep learning for ECG analysis: Benchmarks and insights from PTB-XL," IEEE J. Biomed. Health Inform., vol. 25, no. 5, pp. 1519–1528, 2021.
[16] S. Hong, et al., "Opportunities and challenges of deep learning methods for electrocardiogram data," Comput. Biol. Med., vol. 122, p. 103801, 2020.
[17] A. H. Ribeiro, et al., "Automatic diagnosis of the 12-lead ECG using a deep neural network," Nat. Commun., vol. 11, no. 1, p. 1760, 2020.
[18] B. Ruijsink, et al., "Fully automated, quality-controlled cardiac analysis from CMR," JACC Cardiovasc. Imaging, vol. 13, no. 3, pp. 684–695, 2020.
[19] A. Y. Hannun, et al., "Cardiologist-level arrhythmia detection in ambulatory electrocardiograms," Nat. Med., vol. 25, no. 1, pp. 65–69, 2019.
[20] M. Raghu, et al., "Transfusion: Understanding transfer learning for medical imaging," in Advances in Neural Information Processing Systems, 2019, pp. 3347–3357.
[21] G. Quer, R. Arnaout, M. Henne, and R. Arnaout, "Machine learning and the future of cardiovascular care," J. Am. Coll. Cardiol., vol. 77, no. 3, pp. 300–313, 2021.
[22] B. Norgeot, et al., "Minimum information about clinical artificial intelligence modeling," Nat. Med., vol. 26, no. 9, pp. 1320–1324, 2020.
[23] J. W. Gichoya, et al., "AI recognition of patient race in medical imaging: A modelling study," Lancet Digit. Health, vol. 4, no. 6, pp. e406–e414, 2022.
[24] D. A. Vyas, L. G. Eisenstein, and D. S. Jones, "Hidden in plain sight—Reconsidering race correction," N. Engl. J. Med., vol. 383, no. 9, pp. 874–882, 2020.
[25] J.-M. Kwon, et al., "Deep learning for predicting in-hospital mortality," Echocardiography, vol. 36, no. 2, pp. 213–218, 2019.
[26] Z. Obermeyer, et al., "Dissecting racial bias in an algorithm," Science, vol. 366, no. 6464, pp. 447–453, 2019.
[27] A. I. Banerjee, et al., "Disparities in cardiovascular care delivery and outcomes," Circ. Cardiovasc. Qual. Outcomes, vol. 15, no. 1, p. e008491, 2022.
[28] R. R. Selvaraju, et al., "Grad-CAM: Visual explanations from deep networks," in IEEE International Conference on Computer Vision, 2017, pp. 618–626.
[29] E. Tjoa and C. Guan, "A survey on explainable artificial intelligence," IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 11, pp. 4793–4813, 2021.
[30] N. Rieke, et al., "The future of digital health with federated learning," NPJ Digit. Med., vol. 3, p. 119, 2020.
[31] Z. I. Attia, et al., "Screening for cardiac contractile dysfunction using AI-enabled ECG," Nat. Med., vol. 25, no. 1, pp. 70–74, 2019.