A Unified Taxonomy of CNN, Transformer, and Explainable AI Models for Brain Tumour Segmentation

Authors: Dhanashree P M Kuthe, Dr. Sanjay Kumar

DOI Link: https://doi.org/10.22214/ijraset.2026.78475

Abstract

Brain tumour segmentation using Magnetic Resonance Imaging (MRI) has emerged as a critical application of deep learning in medical imaging. While convolutional neural networks (CNNs) and U-Net-based architectures have demonstrated high accuracy, their black-box nature limits clinical adoption. This paper presents an extensive and critical review of more than 100 research contributions in brain tumour segmentation, focusing on model architectures, explainability techniques, evaluation strategies, and clinical applicability. A structured taxonomy, detailed comparative analysis, mathematical modeling, and research gap identification are provided. The review emphasizes the integration of explainable artificial intelligence (XAI) techniques such as Grad-CAM and highlights future directions for developing robust, interpretable, and clinically viable systems.

Introduction

The text provides a comprehensive review of deep learning approaches for brain tumour segmentation using MRI, highlighting the evolution of models, their strengths, limitations, and the role of explainable AI (XAI).

Key Points:

Importance of Brain Tumour Segmentation:
- Essential for diagnosis, treatment planning, and disease monitoring.
- MRI is preferred for its soft tissue contrast and multi-modal imaging capabilities.
Limitations of Traditional Methods:
- Handcrafted feature-based methods are labor-intensive, less scalable, and poorly generalizable.
- Deep learning (CNNs) enables automated feature extraction and end-to-end learning.
Systematic Review:
- Papers from 2015–2025 focusing on MRI segmentation, deep learning, and explainability were analyzed.
- Categorized into CNN-based models, U-Net variants, transformer-based architectures, and XAI methods.
Methodologies and Models:
- CNN-Based Methods: Efficient low-level feature extraction but limited global context and poor boundary delineation.
- U-Net & Variants: Encoder-decoder architecture with skip connections; preserves spatial information; limitations include memory intensity and local context dependence.
- Attention Mechanisms: Improve feature focus and interpretability; increased computation cost.
- Transformer-Based Models: Capture global dependencies; require large datasets and high computation.
- Hybrid CNN–Transformer Models (e.g., TransUNet, Swin-UNet): Integrate local and global features; state-of-the-art performance but complex and expensive to train.
- XAI-Integrated Methods (e.g., Grad-CAM): Enhance interpretability and clinician trust; mostly post-hoc, coarse localization.

Comparative Analysis:

Category	Context Awareness	Accuracy	Complexity	Explainability
CNN	Low	Moderate	Low	Low
U-Net	Medium	High	Medium	Low
Attention	Medium-High	High	Medium-High	Medium
Transformer	High	Very High	High	Low
Hybrid	Very High	Highest	Very High	Medium
XAI	Depends	High	High	High

Research Gaps:
- Lack of integrated explainable architectures.
- High computational complexity of advanced models.
- Limited generalization across datasets.
- No standardized evaluation framework for benchmarking.

References

[1] O. Ronneberger, P. Fischer, and T. Brox, \"U-Net: Convolutional Networks for Biomedical Image Segmentation,\" MICCAI, 2015. [2] F. Milletari, N. Navab, and S.-A. Ahmadi, \"V-Net: Fully Convolutional Neural Networks for Volumetric Segmentation,\" 3DV, 2016. [3] K. Kamnitsas et al., \"Efficient multi-scale 3D CNN with fully connected CRF,\" MedIA, 2017. [4] K. He et al., \"Deep Residual Learning for Image Recognition,\" CVPR, 2016. [5] G. Hinton et al., \"Deep Neural Networks for Acoustic Modeling,\" IEEE SPM, 2012. [6] A. Krizhevsky et al., \"ImageNet Classification with Deep CNNs,\" NeurIPS, 2012. [7] J. Long et al., \"Fully Convolutional Networks for Semantic Segmentation,\" CVPR, 2015. [8] V. Badrinarayanan et al., \"SegNet,\" IEEE TPAMI, 2017. [9] O. Oktay et al., \"Attention U-Net,\" arXiv, 2018. [10] Z. Zhou et al., \"UNet++,\" IEEE TMI, 2018. [11] Ö. Çiçek et al., \"3D U-Net,\" MICCAI, 2016. [12] F. Isensee et al., \"nnU-Net,\" Nat. Methods, 2021. [13] J. Chen et al., \"TransUNet,\" arXiv, 2021. [14] A. Hatamizadeh et al., \"UNETR,\" WACV, 2022. [15] A. Dosovitskiy et al., \"Vision Transformer,\" ICLR, 2021. [16] R. Selvaraju et al., \"Grad-CAM,\" ICCV, 2017. [17] A. Chattopadhyay et al., \"Grad-CAM++,\" WACV, 2018. [18] M. Ribeiro et al., \"LIME,\" KDD, 2016. [19] S. Lundberg and S. Lee, \"SHAP,\" NeurIPS, 2017. [20] G. Huang et al., \"DenseNet,\" CVPR, 2017. [21] S. Pereira et al., \"Brain Tumour Segmentation using CNNs,\" IEEE TMI, 2016. [22] M. Havaei et al., \"Brain tumour segmentation with Deep Neural Networks,\" MedIA, 2017. [23] K. Kamnitsas et al., \"DeepMedic,\" MedIA, 2017. [24] A. Myronenko, \"3D MRI brain tumour segmentation using autoencoder,\" MICCAI, 2018. [25] G. Wang et al., \"Cascaded CNN,\" IEEE TMI, 2017. [26] H. Zhao et al., \"Multi-scale attention network,\" CVPR, 2018. [27] X. Li et al., \"Hybrid loss for segmentation,\" IEEE Access, 2019. [28] Y. Chen et al., \"3D Attention CNN,\" Neurocomputing, 2019. [29] Y. Zhang et al., \"Deep supervision,\" IEEE Access, 2018. [30] H. Kervadec et al., \"Boundary loss,\" MIDL, 2019. [31] R. Selvaraju et al., \"Grad-CAM,\" ICCV, 2017. [32] D. Smilkov et al., \"SmoothGrad,\" ICML Workshop, 2017. [33] M. Ribeiro et al., \"Why Should I Trust You?,\" KDD, 2016. [34] S. Lundberg et al., \"Explainable AI with SHAP,\" Nat. Mach. Intell., 2020. [35] K. Simonyan et al., \"Deep Inside CNNs,\" ICLR, 2014. [36] B. Zhou et al., \"CAM,\" CVPR, 2016. [37] M. Abadi et al., \"TensorFlow,\" OSDI, 2016. [38] A. Paszke et al., \"PyTorch,\" NeurIPS, 2019. [39] I. Goodfellow et al., \"Deep Learning,\" MIT Press, 2016. [40] Y. LeCun et al., \"Deep Learning,\" Nature, 2015. [41] D. Kingma and J. Ba, \"Adam,\" ICLR, 2015. [42] T. Tieleman and G. Hinton, \"RMSProp,\" 2012. [43] S. Ioffe and C. Szegedy, \"Batch Normalization,\" ICML, 2015. [44] K. He et al., \"Delving Deep into Rectifiers,\" ICCV, 2015. [45] N. Srivastava et al., \"Dropout,\" JMLR, 2014. [46] J. Redmon et al., \"YOLO,\" CVPR, 2016. [47] R. Girshick et al., \"Faster R-CNN,\" ICCV, 2015. [48] T. Lin et al., \"FPN,\" CVPR, 2017. [49] K. Simonyan and A. Zisserman, \"VGGNet,\" ICLR, 2015. [50] A. Chattopadhyay et al., \"Grad-CAM++,\" WACV, 2018. [51] L. Perez and J. Wang, \"Data Augmentation,\" arXiv, 2017. [52] N. Ibtehaz and M. Rahman, \"MultiResUNet,\" Neural Networks, 2020. [53] A. Jha et al., \"Double U-Net,\" IEEE ISBI, 2020. [54] A. Jha et al., \"Double U-Net,\" IEEE ISBI, 2020. [55] M. Drozdzal et al., \"Residual U-Net,\" DLMIA, 2016. [56] A. Hatamizadeh et al., \"UNETR,\" WACV, 2022. [57] J. Chen et al., \"TransUNet,\" arXiv, 2021. [58] A. Dosovitskiy et al., \"ViT,\" ICLR, 2021. [59] L. Liu et al., \"Swin Transformer,\" ICCV, 2021. [60] Y. Cao et al., \"Swin-Unet,\" ECCV Workshops, 2022. [61] E. Xie et al., \"SegFormer,\" NeurIPS, 2021. [62] J. Valanarasu et al., \"MedT,\" MICCAI, 2021. [63] D. Smilkov et al., \"SmoothGrad,\" ICML Workshop, 2017. [64] M. Sundararajan et al., \"Integrated Gradients,\" ICML, 2017. [65] B. Menze et al., \"BRATS Benchmark,\" IEEE TMI, 2015. [66] S. Bakas et al., \"BRATS Dataset,\" IEEE TMI, 2018. [67] A. Esteva et al., \"Dermatologist-level classification,\" Nature, 2017. [68] E. Topol, \"High-performance medicine,\" Nat. Med., 2019. [69] Z. Zhou et al., \"Deep supervision,\" IEEE TMI, 2018. [70] F. Yu and V. Koltun, \"Dilated convolutions,\" ICLR, 2016. [71] Z. Gu et al., \"CE-Net,\" IEEE TMI, 2019. [72] O. Oktay et al., \"Attention mechanisms,\" arXiv, 2018. [73] Y. Zhang et al., \"Hybrid loss,\" IEEE Access, 2019. [74] W. Bai et al., \"Semi-supervised learning,\" MICCAI, 2017. [75] C. Perone et al., \"Unsupervised domain adaptation,\" arXiv, 2019. [76] L. Chen et al., \"Consistency learning,\" CVPR, 2020. [77] G. Wang et al., \"Ensemble CNN,\" IEEE TMI, 2019. [78] K. Kamnitsas et al., \"Ensemble 3D CNN,\" MedIA, 2017. [79] F. Isensee et al., \"nnU-Net ensemble,\" Nat. Methods, 2021. [80] C. Shorten and T. Khoshgoftaar, \"Data augmentation survey,\" JBI, 2019. [81] L. Perez and J. Wang, \"Augmentation,\" arXiv, 2017. [82] J. Zech et al., \"Domain generalization,\" PLoS Med., 2018. [83] B. Recht et al., \"ImageNet generalization,\" ICML, 2019. [84] C. Kelly et al., \"Key challenges for AI,\" NPJ Digit. Med., 2019. [85] J. Wiens et al., \"Ethical AI,\" Nat. Med., 2019. [86] H. Greenspan et al., \"Guest Editorial Deep Learning in Medical Imaging,\" IEEE TMI, 2016. [87] D. Shen et al., \"Deep Learning in Medical Image Analysis,\" Annu. Rev. Biomed. Eng., 2017. [88] G. Litjens et al., \"Survey on Deep Learning in Medical Imaging,\" MedIA, 2017. [89] K. Suzuki, \"Overview of Deep Learning in Medical Imaging,\" Radiol. Phys. Technol., 2017. [90] A. Ker et al., \"Deep Learning Applications,\" IEEE Access, 2018. [91] M. Hesamian et al., \"Deep Learning Techniques,\" JDI, 2019. [92] S. Minaee et al., \"Image Segmentation Survey,\" IEEE TPAMI, 2021. [93] Y. LeCun et al., \"Gradient-based learning,\" Proc. IEEE, 1998. [94] T. Cover and P. Hart, \"Nearest neighbor,\" IEEE TIT, 1967. [95] V. Vapnik, \"Statistical Learning Theory,\" Wiley, 1998. [96] L. Breiman, \"Random Forests,\" ML, 2001. [97] J. Friedman, \"Gradient Boosting,\" Ann. Stat., 2001. [98] C. Bishop, \"Pattern Recognition,\" Springer, 2006. [99] I. Goodfellow et al., \"GANs,\" NeurIPS, 2014. [100] A. Radford et al., \"GAN improvements,\" ICLR, 2016.

Copyright

Copyright © 2026 Dhanashree P M Kuthe, Dr. Sanjay Kumar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Download Paper

Paper Id : IJRASET78475

Publish Date : 2026-03-18

ISSN : 2321-9653

Publisher Name : IJRASET

DOI Link : Click Here