Strokes kill more than seven million people every year, and 87% of strokes are ischemic. The two main categories of ischemic stroke, cardioembolic (CE) and large artery atherosclerosis (LAA), require different treatments, yet the established clinical scheme for telling them apart, TOAST, has an accuracy below 60%. Reliable classification by etiological origin currently requires histological analysis of thrombi retrieved by mechanical thrombectomy, which is slow (2-4 hours per slide) and poorly reproducible between experts (Cohen's κ of approximately 0.34). In this paper, we present an end-to-end, fully automated deep learning system that takes gigapixel Whole Slide Images (WSIs) as input and outputs a binary prediction (CE or LAA) together with attention map visualizations. The system extracts the sixteen patches with the highest tissue density from each slide, encodes each patch with two pre-trained pathology foundation models, Virchow2 [12] (1280-d) and UNI2-H [13] (1536-d), and applies a two-stage hierarchical attention mechanism that weights tiles and slides separately and aggregates them into a single vector. The final classification step uses an XGBoost classifier whose hyperparameters are tuned with Optuna. All training and testing was carried out on the Mayo Clinic STRIP AI data (1,154 WSIs) in a 6-fold cross-validation scheme stratified by clinical center. The application also includes a module for pathologist review and clinical report generation. The experiments show that the proposed dual foundation model strategy outperforms each single-backbone model, and that the center-stratified evaluation provides a sound measure of generalization.
Introduction
This paper presents a computational pathology system that classifies stroke etiology (cardioembolic, CE, versus large artery atherosclerosis, LAA) from whole-slide thrombus histology images, with the aim of improving diagnostic accuracy beyond existing clinical and imaging-based methods.
The core problem is that current stroke classification schemes such as TOAST are often unreliable (about 40% misclassification), yet correct classification is critical: CE and LAA strokes require different treatments, and the wrong treatment can be dangerous.
To address this, we propose a fully integrated pipeline that combines deep learning with classical machine learning to analyze whole-slide images (WSIs) of blood clots retrieved by thrombectomy.
Key idea of the method
The system processes extremely large histopathology slides through a multi-stage pipeline:
Tissue-aware tile selection extracts the most informative regions from massive WSIs.
Two pretrained vision transformer models (Virchow2 and UNI2-H) generate rich feature embeddings from image patches.
A two-stage hierarchical attention model identifies which tiles and regions are most important for diagnosis.
The features are aggregated into a single patient-level representation.
An XGBoost classifier predicts whether the stroke is CE or LAA.
The system also generates visual explanations and clinical reports for pathologists.
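The first step of the pipeline, tissue-aware tile selection, can be illustrated with a minimal sketch. It assumes that tissue density is measured as the fraction of pixels darker than a near-white background threshold; the source does not say how density is computed, so the function names, the threshold of 220, and the toy slide below are all hypothetical.

```python
import numpy as np

def tissue_fraction(tile, background_threshold=220):
    """Fraction of pixels darker than a near-white background (assumed density measure)."""
    gray = tile.mean(axis=-1)  # average the RGB channels into one intensity map
    return float((gray < background_threshold).mean())

def select_top_tiles(slide, tile_size, k=16):
    """Return the (row, col) origins of the k non-overlapping tiles with the most tissue."""
    height, width = slide.shape[:2]
    scored = []
    for row in range(0, height - tile_size + 1, tile_size):
        for col in range(0, width - tile_size + 1, tile_size):
            tile = slide[row:row + tile_size, col:col + tile_size]
            scored.append((tissue_fraction(tile), row, col))
    # Highest tissue fraction first; the sort is stable, so ties keep scan order.
    scored.sort(key=lambda item: -item[0])
    return [(row, col) for _, row, col in scored[:k]]

# Toy "slide": white background with one dark block standing in for tissue.
slide = np.full((64, 64, 3), 255, dtype=np.uint8)
slide[0:16, 0:32] = 90  # tissue fills the tiles at (0, 0) and (0, 16)
top = select_top_tiles(slide, tile_size=16, k=2)
```

A gigapixel slide would not be held in memory as a single array like this; the scoring would more plausibly run on a downsampled view, with the selected coordinates mapped back to full resolution.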
Dataset and training
Uses the STRIP AI Mayo Clinic dataset (1,154 WSIs).
Only slide-level labels are available (no pixel annotations).
Training uses cross-validation with careful patient-level separation to avoid data leakage.
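The leakage constraint can be sketched as a group-wise fold assignment, in which every sample sharing a group identifier lands in the same fold. This is only an illustration under assumptions: the group could be a patient or a clinical center, and the greedy balancing rule and all names below are hypothetical rather than the procedure actually used.

```python
from collections import defaultdict

def assign_group_folds(group_ids, n_folds=6):
    """Map each sample index to a fold so that no group spans two folds."""
    members = defaultdict(list)
    for index, group in enumerate(group_ids):
        members[group].append(index)
    fold_sizes = [0] * n_folds
    fold_of = {}
    # Greedy balancing: place the largest groups first, each into the
    # currently smallest fold.
    for group in sorted(members, key=lambda g: (-len(members[g]), g)):
        target = fold_sizes.index(min(fold_sizes))
        for index in members[group]:
            fold_of[index] = target
        fold_sizes[target] += len(members[group])
    return [fold_of[i] for i in range(len(group_ids))]

# Hypothetical patient identifiers, one per slide.
patients = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p6", "p6"]
folds = assign_group_folds(patients, n_folds=3)
```

scikit-learn provides `GroupKFold` and `StratifiedGroupKFold` for the same purpose; the hand-written version above only makes the constraint explicit.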
Results
AUC: ~0.67
Accuracy: ~71.7%
Performance varies across folds due to differences in imaging centers and staining methods.
CE is detected more reliably than LAA, whose recall is only about 59%.
Key contributions
Intelligent tissue-based tile selection
Use of dual pretrained transformer backbones
Hierarchical attention pooling without pixel-level labels
Optimized XGBoost classifier with tuned thresholds (Youden’s J)
A full clinical deployment and reporting system with explainability
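The dual-backbone embedding and the two-stage attention pooling listed above can be sketched as follows. The sketch assumes a tanh-scored attention layer, a common choice in multiple-instance learning, since the source does not give the attention formula. Random arrays stand in for the Virchow2 and UNI2-H outputs and for the trained attention parameters, and the hidden width of 64 and the two-slide patient are hypothetical.

```python
import numpy as np

def softmax(scores):
    shifted = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum()

def attention_pool(items, w_hidden, w_score):
    """Attention-weighted average of the rows of `items`, plus the weights used."""
    scores = np.tanh(items @ w_hidden) @ w_score  # one scalar score per row
    weights = softmax(scores)
    return weights @ items, weights

rng = np.random.default_rng(0)
dim_a, dim_b, hidden = 1280, 1536, 64  # embedding widths from the text; hidden width is assumed
dim = dim_a + dim_b                    # concatenated tile embedding: 2816-d

# Random stand-ins for trained attention parameters (hypothetical).
tile_params = (rng.normal(size=(dim, hidden)) * 0.01, rng.normal(size=hidden))
slide_params = (rng.normal(size=(dim, hidden)) * 0.01, rng.normal(size=hidden))

# One patient with 2 slides and 16 tiles per slide.
slide_vectors = []
for _ in range(2):
    emb_a = rng.normal(size=(16, dim_a))  # placeholder for Virchow2 tile embeddings
    emb_b = rng.normal(size=(16, dim_b))  # placeholder for UNI2-H tile embeddings
    tiles = np.concatenate([emb_a, emb_b], axis=1)                 # shape (16, 2816)
    slide_vec, tile_weights = attention_pool(tiles, *tile_params)  # stage 1: tiles to slide
    slide_vectors.append(slide_vec)

# Stage 2: slides to a single patient-level vector.
patient_vec, slide_weights = attention_pool(np.stack(slide_vectors), *slide_params)
```

Because the attention weights sum to one over the tiles of a slide, they can be read as relative importances, which is one plausible basis for the attention map visualizations; only slide-level labels are needed to train such a layer.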
Conclusion
An end-to-end workflow for automated stroke etiology classification from histopathological WSIs has been presented, with a clear emphasis on clinical deployment considerations, which research-driven studies often overlook. Our model attains a mean out-of-fold log loss of 0.549, a ROC-AUC of 0.672, and an LAA F1 of 0.550 under a strict centre-stratified cross-validation framework on 754 cases. This evaluation protocol directly probes cross-site generalisability and avoids rewarding within-site memorization. The design choices of tissue-centric tile selection, dual-backbone embedding using Virchow2 [12] and UNI2-H [13], multi-scale hierarchical pooling, and Youden’s J threshold tuning each tackle a specific failure case of less sophisticated models.
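Threshold tuning with Youden's J picks the decision threshold that maximises J = sensitivity + specificity - 1. The sketch below is a minimal version on made-up scores; treating label 1 as LAA is an assumption, and the function name and toy data are hypothetical.

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(y_score)):
        # Predict positive when the score is at or above the threshold.
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < threshold)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # strict comparison keeps the lowest threshold on ties
            best_threshold, best_j = threshold, j
    return best_threshold, best_j

# Toy out-of-fold predictions; label 1 is assumed to mean LAA.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_score = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80]
threshold, j = youden_threshold(y_true, y_score)
```

On this toy data the selected threshold is 0.40, where all three positives and four of the five negatives are classified correctly, giving J = 0.8.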
Four shortcomings must be stated plainly. At 754 cases, the dataset is modest by the standards of current deep learning, and the variance in AUC between folds (0.576 to 0.729) is a direct consequence; the estimation uncertainty is genuine and cannot be ignored. The LAA sensitivity of 59.4% outperforms the TOAST baseline but remains far from the level required for autonomous clinical use; the pathologist review component of Module 5 is essential, not optional. The Youden index of 0.278 shows that inverse-frequency class weighting has mitigated, but not solved, the imbalance between the two classes (72.5% and 27.5%, respectively); the cost-sensitive objectives available in XGBoost should be considered. Most importantly, the system has not yet been prospectively validated in a clinical setting; this is the next step.
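For illustration, inverse-frequency class weights for the quoted 72.5% / 27.5% split can be computed as below. The normalisation by the number of classes is one common convention and may differ from the one used here, and assigning the 72.5% share to CE is an assumption, since the text does not say which class is the majority.

```python
def inverse_frequency_weights(class_fractions):
    """Weight each class by 1 / (n_classes * fraction); a balanced set gives 1.0 everywhere."""
    n_classes = len(class_fractions)
    return {name: 1.0 / (n_classes * frac) for name, frac in class_fractions.items()}

# Class shares quoted in the text; CE as the majority class is an assumption.
fractions = {"CE": 0.725, "LAA": 0.275}
weights = inverse_frequency_weights(fractions)
ratio = weights["LAA"] / weights["CE"]  # relative up-weighting of the minority class
```

Under these assumptions the minority class is weighted about 2.64 times more heavily than the majority class. XGBoost exposes a `scale_pos_weight` parameter that accepts a ratio of this kind for binary problems; whether that mechanism or per-sample weights were used is not stated in the text.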
Several avenues could prove fruitful: conducting ablation studies to measure each backbone's separate contribution to performance; broadening the classification task to cover additional ischemic stroke types, including the "undetermined" type; incorporating additional data sources such as neuroimaging and admission biomarkers; calibrating the model to specific sites to address the generalisation problem exposed by the performance on Fold 3; compressing the model so that it can run without GPUs in hospitals; and, ultimately, building active learning systems driven by pathologist review annotations.
References
[1] W. H. Chang et al., "Validation of the TOAST classification in ischemic stroke subtypes," Cerebrovasc. Dis., vol. 47, pp. 113–119, 2019.
[2] S. Arboix and J. Alioc, "Cardioembolic stroke: clinical features and prognosis," Curr. Cardiol. Rev., vol. 6, no. 3, pp. 150–161, Aug. 2010.
[3] C. Maier et al., "Clot composition analysis by histology predicts stroke aetiology," J. Neurol. Neurosurg. Psychiatry, vol. 91, pp. 1050–1057, Oct. 2020.
[4] T. Liebeskind et al., "Pathology of clot retrieved in stroke thrombectomy," Neurology, vol. 95, pp. e2774–e2781, Nov. 2020.
[5] D. Azatyan, "Image Classification of Stroke Blood Clot Origin using Deep Convolutional Neural Networks and Visual Transformers," arXiv preprint arXiv:2305.16492, 2023.
[6] K.-H. Yeh, M. S. Jabal, V. Gupta et al., "Transformer-Based Self-Supervised Learning for Histopathological Classification of Ischemic Stroke Clot Origin," arXiv preprint arXiv:2405.00908, 2024.
[7] W.-S. Ryu, D. Schellingerhout, H. Lee et al., "Deep Learning-Based Automatic Classification of Ischemic Stroke Subtype Using Diffusion-Weighted Images," Journal of Stroke, 2024. https://doi.org/10.5853/jos.2024.00535
[8] E. Ekingen, F. Yildirim, O. Bayar et al., "StrokeNeXt: an automated stroke classification model using lightweight CNN," PubMed Central, PMC12142900, 2025.
[9] Á. Lucero-Garófano, A. Aliena-Valero, I. Vielba-Gómez et al., "Automatic etiological classification of stroke thrombus digital photographs using a deep learning model," Frontiers in Neurology, 2025. https://doi.org/10.3389/fneur.2025.1534845
[10] M. Pleasure, E. Redekop, J. S. Polson et al., "Pathology-Based Ischemic Stroke Etiology Classification via Clot Composition Guided Multiple Instance Learning," ICCVW 2023 Workshop Paper, 2023.
[11] A. Chow et al., "Mayo Clinic - STRIP AI," Kaggle, 2022.
[12] E. Zimmermann et al., "Virchow2: Scaling self-supervised mixed magnification models in pathology," arXiv preprint arXiv:2408.00738, Aug. 2024.
[13] H. Chen et al., "Towards a general-purpose foundation model for computational pathology," Nature Medicine, vol. 30, pp. 850–862, Mar. 2024. doi: 10.1038/s41591-024-02857-3.