Uterine cancer, particularly endometrial carcinoma, is among the most prevalent gynecological malignancies affecting women globally. Early-stage diagnosis significantly improves treatment outcomes; however, much of the relevant clinical evidence exists in unstructured textual formats such as pathology reports, radiology interpretations, and physician notes. These narratives often contain critical yet underutilized diagnostic information.
This research investigates the application of Natural Language Processing (NLP) techniques for identifying uterine cancer from clinical text data. The study explores a progression of methods, ranging from traditional machine learning approaches to advanced transformer-based architectures tailored for biomedical language. Key aspects such as domain adaptation, model interpretability, and integration into clinical workflows are examined. Furthermore, the paper highlights existing limitations and outlines future research directions for advancing this domain.
Introduction
This study proposes a specialized NLP-based framework for detecting uterine cancer from clinical text, addressing the challenge that important diagnostic information is often stored in unstructured medical documents rather than structured databases. Since early detection of uterine cancer, particularly endometrial carcinoma, significantly improves patient outcomes, the framework focuses on extracting clinically relevant information from medical narratives such as outpatient records, pathology reports, radiology interpretations, surgical notes, and hormonal history documents.
Clinical text contains valuable diagnostic indicators, including tumor grade, invasion depth, histological characteristics, and disease stage. However, processing medical text is challenging because of abbreviations, ambiguous terminology, shorthand notations, negation expressions, and institution-specific documentation practices. To address these issues, the framework incorporates specialized preprocessing steps: medical tokenization, abbreviation expansion, negation detection, Named Entity Recognition (NER), and linkage to standard medical terminologies such as UMLS and ICD.
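Two of the preprocessing steps named above, abbreviation expansion and negation detection, can be illustrated with a minimal rule-based sketch. This is not the framework's implementation: the abbreviation map, the cue list (loosely in the spirit of NegEx-style rules), and the function names are assumptions chosen for illustration only.

```python
import re

# Hypothetical abbreviation map; a real system would use a curated,
# institution-specific lexicon.
ABBREVIATIONS = {
    "aub": "abnormal uterine bleeding",
    "pmb": "postmenopausal bleeding",
    "emb": "endometrial biopsy",
}

# Illustrative negation cues; this list is an assumption and is not exhaustive.
NEGATION_CUES = ("no evidence of", "negative for", "without", "denies", "no")


def expand_abbreviations(text):
    """Replace whole-word abbreviations with their expansions (case-insensitive)."""
    pattern = r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b"
    return re.sub(
        pattern,
        lambda m: ABBREVIATIONS[m.group(0).lower()],
        text,
        flags=re.IGNORECASE,
    )


def is_negated(sentence, concept, window=5):
    """Return True if a negation cue occurs within `window` tokens before the
    first mention of `concept`. Returns False if the concept is absent."""
    lowered = sentence.lower()
    idx = lowered.find(concept.lower())
    if idx == -1:
        return False
    preceding = " ".join(lowered[:idx].split()[-window:])
    return any(
        re.search(r"\b" + re.escape(cue) + r"\b", preceding)
        for cue in NEGATION_CUES
    )


note = "Pt reports PMB. No evidence of myometrial invasion."
print(expand_abbreviations(note))
print(is_negated("No evidence of myometrial invasion.", "myometrial invasion"))
```

A fixed token window is a deliberate simplification; it ignores sentence structure and scope terminators such as "but", which fuller negation algorithms handle.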
The study reviews the evolution of NLP methods in healthcare. Traditional approaches such as Bag-of-Words and TF-IDF features combined with Support Vector Machine (SVM) or Naïve Bayes classifiers were interpretable but captured little semantic information. Later neural methods improved on this: static word embeddings such as Word2Vec and FastText encoded semantic similarity between terms, and recurrent architectures such as RNNs and LSTMs modeled word order, although they struggled with lengthy clinical documents. Recent transformer-based models such as BioBERT, ClinicalBERT, and PubMedBERT have demonstrated superior performance by capturing contextual relationships within biomedical text.
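To make the trade-off of the traditional approaches concrete, the following is a small pure-Python sketch of TF-IDF weighting. It uses one common variant (term frequency times log(N / document frequency)) and whitespace tokenization, both of which are assumptions rather than details given in the study.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


reports = [
    "endometrial thickening noted",
    "endometrial biopsy benign",
]
for vec in tfidf(reports):
    print(vec)
```

Each weight can be read directly, which is the source of the interpretability, but every term is an independent dimension: "thickening" and "thickened" share nothing, and word order is discarded.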
In the proposed framework, clinical records are collected and annotated with labels such as cancer presence, subtype, and stage. The pipeline combines advanced preprocessing, contextual embeddings, ontology-based features, and fine-tuned transformer architectures. Hierarchical attention mechanisms are used to process long clinical documents, while multi-task learning enables simultaneous prediction of multiple clinical attributes. Explainability techniques, including attention visualization and feature attribution, are integrated to help clinicians understand model decisions and to increase trust in AI-assisted diagnosis.
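The idea behind hierarchical processing of long documents can be sketched in two steps: split the note into overlapping windows that each fit a transformer's input limit, then combine per-window representations with attention weights. The sketch below is a simplified stand-in, not the framework's architecture; the window size, stride, and the `query` vector (which would be a learned parameter in a trained model) are all assumptions.

```python
import math


def split_into_chunks(tokens, size=512, stride=384):
    """Split a token list into overlapping windows of at most `size` tokens.

    With stride < size, consecutive windows overlap by size - stride tokens.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if not tokens:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
        start += stride
    return chunks


def attention_pool(chunk_vectors, query):
    """Softmax-weighted average of chunk vectors.

    Each chunk is scored by its dot product with `query`; the returned
    weights show how much each chunk contributed to the pooled vector.
    """
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in chunk_vectors]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, chunk_vectors))
        for i in range(len(query))
    ]
    return pooled, weights


chunks = split_into_chunks(list(range(10)), size=4, stride=3)
print(chunks)
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
print(pooled, weights)
```

In a full model the chunk vectors would come from the fine-tuned encoder, and the chunk-level weights are one of the signals that can be visualized for clinicians.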
Several practical applications are identified, including automatic extraction of diagnostic features, detection of missed patient follow-ups, symptom identification, and clinical phenotyping. Experimental evaluation uses accuracy, precision, recall, F1-score, and AUC-ROC, with particular emphasis on recall because a false negative corresponds to a missed cancer case. The reported results indicate that transformer-based models outperform traditional machine learning approaches, especially in detecting early-stage uterine cancer indicators. Attention weights highlight clinically important features such as abnormal uterine bleeding, imaging findings, and biopsy results.
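The emphasis on recall can be made explicit with the F-beta score, which weights recall more heavily than precision when beta is greater than 1. The sketch below computes precision, recall, and F-beta from binary labels; the label vectors are made-up toy values for illustration, not results from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = cancer present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_fbeta(y_true, y_pred, beta=1.0):
    """Return (precision, recall, F-beta); beta > 1 favors recall."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


# Toy labels: a cautious classifier that finds only 1 of 4 true cases.
y_true = [1, 1, 1, 1, 0]
y_pred = [1, 0, 0, 0, 0]
print(precision_recall_fbeta(y_true, y_pred, beta=1.0))
print(precision_recall_fbeta(y_true, y_pred, beta=2.0))
```

On the toy labels, precision is perfect but three of four cases are missed; the F2 score is lower than F1, reflecting the heavier penalty on missed cases.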
Despite promising outcomes, limitations remain due to the scarcity of large annotated uterine cancer datasets, inconsistencies in annotations, and variations in clinical documentation across institutions. Future research directions include integrating multimodal data (clinical text, imaging, and pathology), adopting federated learning for secure data sharing, conducting real-world clinical validation, and improving model interpretability to support broader clinical adoption.
Conclusion
This research demonstrates the potential of Natural Language Processing to transform uterine cancer detection by leveraging unstructured clinical data. Advanced transformer-based models, combined with domain-specific adaptations, offer promising improvements in early diagnosis and clinical decision support.
Addressing current limitations and ensuring seamless integration into healthcare systems will be essential for translating these advancements into practical clinical solutions.