Breast cancer is one of the leading causes of death among women worldwide, and early detection through mammography can significantly increase survival rates. However, interpretation of mammographic images is often hampered by the pectoral muscle, which can cover part of the breast tissue and increase the likelihood of an incorrect diagnosis. Robust computer-aided detection systems therefore need accurate preprocessing and segmentation procedures to improve diagnostic efficiency. The proposed work uses a database of 322 mammogram images to build an automated pipeline for identifying breast cancer. The pectoral muscle is first removed to segment the breast area, and k-means clustering is then applied to isolate clusters of relevant tissue. A set of machine-learning classifiers for lesion detection is then evaluated: K-Nearest Neighbours (KNN), Support Vector Machine (SVM), Logistic Regression, Decision Tree, and Naive Bayes. The performance of the traditional models is compared with that of their ML-enhanced versions. In the experiments, the ML-enhanced classifiers are shown to outperform their traditional counterparts, with KNN achieving the best accuracy, sensitivity, and specificity. This result indicates that machine learning handles the uncertainty and variation of mammographic data better. The proposed framework not only improves the accuracy of breast cancer detection but also supports early intervention and better clinical outcomes.
Introduction
Breast cancer is the most common cancer among women worldwide, with 2.3 million new cases and 670,000 deaths in 2022. Incidence and mortality are expected to rise significantly by 2050, particularly in low- and middle-income countries due to limited screening and treatment. Survival rates vary widely, from over 90% in high-income countries to below 40% in sub-Saharan Africa. Lifestyle factors such as alcohol consumption, obesity, and late childbearing increase risk, highlighting the need for prevention, early diagnosis, and equitable access to care.
Computer-aided detection (CAD) using mammography plays a vital role in early diagnosis. Accurate removal of the pectoral muscle in mediolateral oblique (MLO) views is critical, as muscle intensity can obscure malignancies or create false positives. Traditional techniques like thresholding or k-means clustering are effective but sensitive to noise and low contrast, while deep learning offers higher accuracy but requires large annotated datasets and computational resources. Hybrid approaches combining clustering, preprocessing, and machine learning improve segmentation and classification across datasets like MIAS, INbreast, and CBIS-DDSM.
Recent research demonstrates effective methodologies for mammogram analysis:
Segmentation: K-means clustering, Canny edge detection, region-growing, and AU-Net deep models achieve high precision in removing pectoral artifacts.
Preprocessing: Techniques like Otsu thresholding, SSRG, and Gaussian Mixture Models improve artifact removal and feature clarity.
Classification: Machine learning models (KNN, SVM, Logistic Regression, Decision Tree, Naïve Bayes) combined with hybrid preprocessing achieve high accuracy in detecting breast abnormalities. Metaheuristic optimization (e.g., PSO, GA) further enhances performance.
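As one concrete instance of the thresholding techniques listed above, Otsu's method picks the gray level that maximises the between-class variance of the histogram. The following is a minimal NumPy sketch, assuming an 8-bit grayscale image; the function name `otsu_threshold` and the toy image are illustrative and are not taken from any of the surveyed works.

```python
import numpy as np

def otsu_threshold(image):
    """Return the 8-bit threshold that maximises between-class variance (Otsu)."""
    # Assumption: `image` is an integer array with values in 0..255.
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)          # probability of the class "<= t"
    mu = np.cumsum(prob * levels)    # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    between = np.zeros(256)
    valid = denom > 0                # skip thresholds that leave one class empty
    between[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(between))

# Toy image (hypothetical data): dark background with one bright block.
toy = np.full((64, 64), 10, dtype=np.uint8)
toy[10:30, 10:30] = 200
t = otsu_threshold(toy)
mask = toy > t                       # foreground = pixels above the threshold
```

On real mammograms the histogram is rarely this cleanly bimodal, which is one reason thresholding is described above as sensitive to noise and low contrast.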
The study proposes a workflow combining preprocessing, pectoral muscle removal using k-means segmentation, and classification with machine learning models. Preprocessed images are resized, normalized, and flattened for supervised learning. Classifiers are trained on pseudo-labels derived from k-means segmentation and evaluated using statistical measures of true positives and false positives. Hybrid approaches that integrate precise segmentation and classifier enhancement consistently outperform individual methods, improving the reliability and accuracy of CAD systems in breast cancer detection.
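The workflow described above can be sketched end to end as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation used in the study: the pseudo-labelling rule (`min_centre`), the 32x32 feature size, the value of k, and the synthetic images are all hypothetical choices made only so that the example runs on its own.

```python
import numpy as np

def kmeans_intensity(image, k=3, n_iter=50):
    """Plain Lloyd's k-means on pixel intensities; returns label map and centres."""
    pixels = image.astype(float).ravel()
    # Deterministic start: spread the centres evenly over the intensity range.
    centres = np.linspace(pixels.min(), pixels.max(), k)
    for _ in range(n_iter):
        labels = np.abs(pixels[:, None] - centres[None, :]).argmin(axis=1)
        updated = centres.copy()
        for j in range(k):
            members = pixels[labels == j]
            if members.size:         # an empty cluster keeps its previous centre
                updated[j] = members.mean()
        if np.allclose(updated, centres):
            break
        centres = updated
    # Final assignment against the centres that are actually returned.
    labels = np.abs(pixels[:, None] - centres[None, :]).argmin(axis=1)
    return labels.reshape(image.shape), centres

def to_feature_vector(image, size=(32, 32)):
    """Nearest-neighbour resize, min-max normalise to [0, 1], then flatten."""
    rows = np.arange(size[0]) * image.shape[0] // size[0]
    cols = np.arange(size[1]) * image.shape[1] // size[1]
    small = image[np.ix_(rows, cols)].astype(float)
    span = small.max() - small.min()
    small = (small - small.min()) / span if span > 0 else np.zeros_like(small)
    return small.ravel()

def pseudo_label(image, k=3, min_centre=150.0):
    """Hypothetical rule: label 1 if the brightest cluster centre is bright enough."""
    _, centres = kmeans_intensity(image, k)
    return int(centres.max() >= min_centre)

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return int(np.bincount(train_y[nearest]).argmax())

# Synthetic stand-ins for mammograms: every other image has a bright block.
rng = np.random.default_rng(0)

def make_image(with_block):
    img = rng.integers(0, 60, size=(64, 64))
    if with_block:
        img[20:40, 20:40] = 220
    return img

images = [make_image(i % 2 == 0) for i in range(6)]
train_x = np.stack([to_feature_vector(img) for img in images])
train_y = np.array([pseudo_label(img) for img in images])
prediction = knn_predict(train_x, train_y, to_feature_vector(make_image(True)))
```

Because the labels come from the clustering step rather than from expert annotation, a classifier trained this way can at best reproduce the segmentation rule, so the choice of that rule deserves as much care as the choice of classifier.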
Conclusion
In summary, incorporating conventional machine learning models tends to improve performance, especially when dealing with uncertain and vague data. KNN and Decision Tree showed significant improvements in accuracy, sensitivity, and specificity over their traditional counterparts, indicating more reliable classification. Naive Bayes also improved, especially in precision and in overlap measures such as the Dice coefficient. Logistic Regression, by contrast, did not perform as well, indicating that ML methods do not necessarily suit all models and must be tuned with care. SVM was the most sensitive model, at the cost of some precision, which is an acceptable trade-off for high-stakes or mission-critical tasks.
In general, while ML enhancement improves many machine learning models, its benefit depends on the application and the algorithm, which highlights the need for tailored approaches and further research.
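For reference, the measures named in this conclusion (accuracy, sensitivity, specificity, precision, and the Dice coefficient) can all be derived from the four confusion-matrix counts. The sketch below uses the standard definitions; the function names, the label convention (1 = abnormal, 0 = normal), and the example labels are assumptions for illustration and are not results from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (assumed: 1 = abnormal, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, fp, tn, fn

def classification_metrics(tp, fp, tn, fn):
    """Standard measures from confusion counts; empty denominators give 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true-positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true-negative rate
        "precision": ratio(tp, tp + fp),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
    }

# Made-up labels, only to exercise the functions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
```

The sensitivity and precision entries make the SVM trade-off described above explicit: lowering a decision threshold typically raises `tp` and `fp` together, which increases sensitivity while reducing precision.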