An Explainable Cross-Modal Transformer Framework for Brain Tumor Detection and Classification

Authors: Sahukari Abhisheak, Randhi Aditya Durgarao, Andraju Chandra Sekhar, Sangani Jagadesh Kumar

DOI Link: https://doi.org/10.22214/ijraset.2026.80857

Abstract

Brain tumors are hard to identify reliably on MRI scans. Tumors of the same type can vary widely in appearance, labeled training examples are scarce, and clinicians want to understand how an automated system reaches its decisions. This study introduces ESCCMT, a deep learning method for detecting and classifying brain tumors that reduces reliance on labeled data. The model first learns on its own by predicting masked (missing) regions of scans. This lets it learn patterns from large collections of unlabeled images before supervised training. The architecture combines CNNs, which capture fine detail in small regions, with Transformers, which model wider context across scan types, and it processes T1, T1ce, T2, and FLAIR images jointly. Together, the two components capture both edges and broader structure. Cross-attention governs how features from the different modalities meet and are fused. For interpretability, the system shows where the model's focus lands, using visualizations derived from attention patterns together with gradient-based maps. Prediction uncertainty is estimated with Monte Carlo dropout, that is, repeated forward passes with dropout kept active at inference. On standard test sets, the method achieves higher accuracy, precision, and stability than conventional CNNs and than baseline Transformers without the additional components. Beyond producing correct predictions more often, it indicates which regions tipped the decision in each case. Providing both the result and the reasoning behind it opens the way to practical use in hospitals for brain tumor detection.
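The self-supervised objective of predicting masked parts of a scan can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The image is a toy 2D list, and the patch size and mask ratio are illustrative assumptions. `mean_fill_predictor` is a hypothetical placeholder for the CNN-Transformer network, which would normally reconstruct hidden patches from the visible ones. The essential idea is that the loss is computed only over the masked patches, so no labels are required.

```python
import random

def split_into_patches(image, patch):
    """Split an H x W image (list of rows) into flattened patch x patch tiles.

    Assumes H and W are both divisible by `patch`; tiles are returned row-major.
    """
    h, w = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]

def random_mask(num_patches, mask_ratio, rng):
    """Choose which patch indices are hidden from the encoder."""
    k = int(round(num_patches * mask_ratio))
    return sorted(rng.sample(range(num_patches), k))

def mean_fill_predictor(patches, masked_idx):
    """Hypothetical stand-in for the network: fill each masked patch with the
    mean intensity of the visible patches and copy visible patches unchanged."""
    masked = set(masked_idx)
    visible = [v for i, p in enumerate(patches) if i not in masked for v in p]
    mean = sum(visible) / len(visible)
    return [[mean] * len(p) if i in masked else list(p) for i, p in enumerate(patches)]

def masked_mse(patches, predictions, masked_idx):
    """Reconstruction loss computed only over the masked patches."""
    errors = [
        (t - y) ** 2
        for i in masked_idx
        for t, y in zip(patches[i], predictions[i])
    ]
    return sum(errors) / len(errors) if errors else 0.0

if __name__ == "__main__":
    rng = random.Random(0)
    image = [[4 * r + c for c in range(4)] for r in range(4)]  # toy 4x4 "slice"
    patches = split_into_patches(image, 2)
    masked = random_mask(len(patches), 0.5, rng)
    preds = mean_fill_predictor(patches, masked)
    print(masked, masked_mse(patches, preds, masked))
```

In a real pipeline the predictor would be the trained encoder and decoder, and the masked loss would be minimized over many unlabeled scans before fine-tuning on labeled ones.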

Introduction

This paper presents a deep learning-based system for brain tumor detection from MRI scans, with a focus on accuracy, interpretability, and clinical usability.

Brain tumors are life-threatening if not detected early. MRI is effective for diagnosis, but manual interpretation is slow and inconsistent, and it becomes difficult with large or complex datasets. This motivates the need for automated, AI-based diagnostic systems.

Traditional machine learning methods struggle with complex medical images. CNNs (Convolutional Neural Networks) improve feature extraction, but they mainly capture local image patterns and miss long-range relationships. To address this, the study incorporates Transformers, whose attention mechanisms model global relationships across MRI images.
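To show how attention relates every region of an image to every other region, here is a minimal sketch of scaled dot-product self-attention over patch embeddings. It is a generic illustration, not code from the paper. It assumes toy 2D embeddings and omits the learned query, key, and value projections and the multiple heads that a real Transformer layer uses.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """For each query, weight all values by softmax(q . k / sqrt(d)).

    Returns (outputs, weights); weights[i][j] is how much token i attends to token j.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
        weights.append(w)
    return outputs, weights

if __name__ == "__main__":
    # Four toy patch embeddings; tokens 0 and 3 could come from distant image regions.
    tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 0.1]]
    out, attn = scaled_dot_product_attention(tokens, tokens, tokens)
    print([round(x, 3) for x in attn[0]])
```

Each row of attention weights spans all tokens, so a patch can draw on any other region in a single step, whereas a convolution only sees its local neighborhood.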

The proposed system, called ESCCMT, combines CNN and Transformer architectures to improve brain tumor classification. It uses multiple MRI modalities (T1, T1ce, T2, and FLAIR) and integrates them through attention-based fusion. The system is trained on both labeled and unlabeled data, using self-supervised learning to extract information from the unlabeled scans. It also improves transparency by returning visual explanations (heatmaps) together with confidence and uncertainty scores for each prediction.
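The attention-based fusion of the four modalities can be sketched as cross-attention, in which a query vector attends over one feature vector per modality. This illustration rests on assumptions. The text here does not specify which representation acts as the query, so the example uses the T1ce features. The per-modality features also serve directly as keys and values, without learned projections. The returned per-modality weights give a simple indication of which scan types the fused representation relied on.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_modal_fusion(query, modality_features):
    """Cross-attention: one query vector attends over per-modality feature vectors.

    `modality_features` maps a modality name to a feature vector of the same
    length as `query`. Keys and values are the raw features (no projections).
    Returns the fused vector and the attention weight given to each modality.
    """
    names = list(modality_features)
    feats = [modality_features[n] for n in names]
    scale = math.sqrt(len(query))
    scores = [sum(q * f for q, f in zip(query, feat)) / scale for feat in feats]
    weights = softmax(scores)
    fused = [sum(w * feat[j] for w, feat in zip(weights, feats)) for j in range(len(query))]
    return fused, dict(zip(names, weights))

if __name__ == "__main__":
    # Hypothetical pooled feature vectors for each MRI modality.
    features = {
        "T1": [0.2, 0.1, 0.0],
        "T1ce": [0.9, 0.3, 0.1],
        "T2": [0.1, 0.8, 0.2],
        "FLAIR": [0.3, 0.7, 0.6],
    }
    fused, weights = cross_modal_fusion(features["T1ce"], features)
    print({k: round(v, 3) for k, v in weights.items()})
```

A full model would typically apply this per token and per layer, with learned projections. The sketch keeps only the core step of weighting modalities by their similarity to the query.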

The literature review traces the evolution from traditional ML models (such as SVM and logistic regression) to CNN-based methods and, more recently, hybrid models involving YOLO and ensemble techniques. However, challenges remain, including limited datasets, high computational cost, poor multi-modal integration, and a lack of explainability.

The system architecture and methodology describe a multi-stage pipeline:

MRI data collection (T1, T2, T1ce, FLAIR)
Preprocessing (resizing, noise removal, normalization, augmentation)
Feature extraction using CNN
Global context learning using Transformers
Multi-modal fusion using attention mechanisms
Classification into tumor types (glioma, meningioma, pituitary, or no tumor)
Output includes prediction, confidence score, and visual explanation (heatmaps)
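The confidence score in the final step, along with the uncertainty estimate from repeated dropout runs mentioned in the abstract, can be sketched as Monte Carlo dropout over the four output classes. This is a minimal sketch under stated assumptions. `linear_head` and its weights are hypothetical stand-ins for the trained network. Dropout is applied only to the head's input features, and the dropout rate and number of passes are illustrative. In a real PyTorch model, the corresponding step is to keep the dropout layers in training mode at inference and average the softmax outputs.

```python
import math
import random

CLASSES = ("glioma", "meningioma", "pituitary", "no tumor")

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear_head(features, weights, biases):
    """Hypothetical classification head: one row of weights per class."""
    return [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, biases)]

def dropout(features, p, rng):
    """Inverted dropout: zero each feature with probability p and rescale survivors."""
    keep = 1.0 - p
    return [x / keep if rng.random() >= p else 0.0 for x in features]

def mc_dropout_predict(features, weights, biases, p=0.3, passes=50, seed=0):
    """Average softmax outputs over repeated stochastic passes with dropout active."""
    rng = random.Random(seed)
    runs = [softmax(linear_head(dropout(features, p, rng), weights, biases)) for _ in range(passes)]
    mean = [sum(run[c] for run in runs) / passes for c in range(len(CLASSES))]
    best = max(range(len(mean)), key=lambda c: mean[c])
    # Spread of the winning class's probability across passes.
    std = math.sqrt(sum((run[best] - mean[best]) ** 2 for run in runs) / passes)
    # Entropy of the averaged distribution (predictive entropy).
    entropy = -sum(m * math.log(m) for m in mean if m > 0)
    return {"label": CLASSES[best], "confidence": mean[best], "std": std, "entropy": entropy}

if __name__ == "__main__":
    feats = [1.0, 0.5, 0.2]
    W = [[2.0, 1.0, 0.0], [0.5, 0.5, 0.5], [0.0, 0.2, 1.0], [0.1, 0.1, 0.1]]
    b = [0.0, 0.0, 0.0, 0.0]
    print(mc_dropout_predict(feats, W, b))
```

The mean probability of the predicted class serves as the confidence score. The standard deviation across passes and the entropy of the averaged distribution act as uncertainty measures, and higher values indicate a less certain prediction.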

Overall, the system aims to create a more accurate, interpretable, and clinically useful brain tumor detection model by combining CNNs and Transformers with multi-modal MRI fusion and explainability tools, making it more suitable for real-world medical diagnosis.

Conclusion

This work combined CNNs and Transformers into a single method for detecting brain tumors in MRI scans. Rather than operating separately, the two components share information across image modalities, capturing both fine local detail and broader patterns. A distinctive feature is that the different scan types inform one another, allowing the model to interpret complex multi-modal data without losing clarity. Built-in explanations and prediction-confidence checks make its decisions more transparent. In testing, the system was fast and frequently produced correct predictions. It also remained stable under varied conditions, yielding trustworthy outputs that clinicians could eventually rely on.


Copyright

Copyright © 2026 Sahukari Abhisheak, Randhi Aditya Durgarao, Andraju Chandra Sekhar, Sangani Jagadesh Kumar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET80857

Publish Date : 2026-04-23

ISSN : 2321-9653

Publisher Name : IJRASET
