Polycystic Ovary Syndrome (PCOS) is a common hormonal disorder that affects many women and leads to delayed, irregular, or absent menstrual cycles. The syndrome may contribute to the development of type 2 diabetes, gestational diabetes, weight gain, excess body hair, and many other complications. In advanced cases, PCOS may lead to infertility, which is a challenge for patients who are attempting to conceive. According to statistics, the incidence of PCOS has increased greatly in recent years, which is alarming. If PCOS is diagnosed early enough, individuals can follow their physician's advice and live a healthier life. The dataset in this study comprises records of 466 patients. The purpose of this study is to use machine learning models to identify patterns in this disorder. Several ML models, including Logistic Regression (LR), Decision Tree (DT), XGBoost, Random Forest (RF), AdaBoost (AB), and Support Vector Machine (SVM), among others, are trained on the data and evaluated on accuracy, specificity, sensitivity, and precision. The study used Mutual Information for feature selection and compared the models to identify the best-performing one. With Mutual Information-based feature selection, AB and RF attained the highest accuracy of 94%.
Keywords: Machine learning, SVM, Gradient Boosting, Polycystic ovary syndrome.
Introduction
PCOS is the most common endocrine disorder among women of reproductive age (AFAB – Assigned Female at Birth).
It involves hormonal imbalances, especially elevated androgens (like testosterone), and can lead to:
Irregular menstruation
Infertility
Ovarian cysts
Acne, weight gain, and hirsutism
Causes remain unclear but may include genetics, inflammation, and insulin resistance.
Health Risks & Prevalence
Can increase the risk of diabetes, hypertension, sleep apnea, depression, heart disease, and endometrial cancer.
Affects 5–10% of women globally; over 70% remain undiagnosed.
Ethnic variations in prevalence:
Asian women: 31.3%
Indian women: 15.3%
African American women: 8%
White women: 4.8%
Diagnosis & Treatment
Medical Treatments:
Hormonal birth control to regulate cycles.
Anti-androgens for acne and hair growth.
Metformin for insulin resistance.
Fertility drugs for ovulation.
Laparoscopic ovarian drilling in resistant cases.
Lifestyle Changes:
Diet and exercise play a major role.
Early diagnosis is critical for symptom control and preventing complications.
AI & Machine Learning in PCOS Detection
Importance:
PCOS detection can be framed as a binary classification problem (PCOS present: Yes/No).
AI tools and ML models now assist in early detection and prediction.
ML Models Used:
Logistic Regression (LR)
GaussianNB
Support Vector Machine (SVM)
K-Nearest Neighbors (KNN)
Bernoulli & Multinomial Naive Bayes
XGBoost
Random Forest (RF)
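To make the binary (Yes/No) framing concrete, here is a minimal sketch of logistic regression, the simplest model in the list above, trained with plain gradient descent. The summary does not say how the study's models were implemented or configured, so this is not the study's code. The toy symptom rows, the learning rate, and the epoch count are all hypothetical.

```python
import math

def sigmoid(z):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            score = bias + sum(w * v for w, v in zip(weights, x))
            error = sigmoid(score) - y  # gradient of the log loss w.r.t. the score
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

def predict(rows, weights, bias, threshold=0.5):
    """Return 1 (PCOS: Yes) when the predicted probability reaches the threshold."""
    return [
        1 if sigmoid(bias + sum(w * v for w, v in zip(weights, x))) >= threshold else 0
        for x in rows
    ]

# Hypothetical toy data: each row is [hair_growth, weight_gain] coded 0/1.
rows = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = PCOS, 0 = no PCOS

weights, bias = train_logistic_regression(rows, labels)
predictions = predict(rows, weights, bias)
```

The other listed models (SVM, KNN, Naive Bayes, XGBoost, Random Forest) produce the same kind of 0/1 prediction, so they can be compared on the same held-out records.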
Performance Examples:
Study | Dataset | Method | Accuracy
Silva et al. | 145 patients | XGBoost | 86%
Khanna et al. | 541 patients (Kerala) | Multi-stacking ML | 98%
Bharati et al. | 541 patients | RFLR Hybrid | 91.01%
Zigarelli et al. | 466 patients | CatBoost | 82.5–90.1%
Abu Adla et al. | 466 patients | SVM | 91.6%
Hassan et al. | — | Random Forest | 96%
Proposed System in the Study
Data Source: Kaggle dataset with 541 records, reduced to 466 after cleaning.
Preprocessing Steps:
Fill nulls, remove duplicates, convert data types, drop irrelevant columns.
Used Mutual Information for feature selection (top 12 features).
Features with high importance: Hair growth, weight gain, fast food habit, FSH levels.
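The feature-selection step can be illustrated with a small sketch that scores discrete features by their mutual information with the PCOS label and keeps the top k. The study kept the top 12 features, but the summary does not describe the estimator it used, so this is a from-scratch version for categorical features only. The column names and toy values are hypothetical. Continuous features such as FSH levels would first need binning, or a library routine such as scikit-learn's mutual_info_classif.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    count_x = Counter(feature)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)) written with raw counts
        mi += p_xy * math.log(count * n / (count_x[x] * count_y[y]))
    return mi

def top_k_features(columns, labels, k):
    """Rank feature names by mutual information with the label, highest first."""
    scores = {name: mutual_information(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data (not taken from the study's dataset).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "hair_growth": [1, 1, 1, 1, 0, 0, 0, 0],  # matches the label exactly
    "weight_gain": [1, 1, 1, 0, 0, 0, 0, 1],  # agrees with the label most of the time
    "blood_group": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
}

selected = top_k_features(columns, labels, k=2)
```

A feature that is independent of the label scores zero, which is why the hypothetical blood_group column is dropped here.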
Data Processing Pipeline:
Data Merging & Cleaning
Encoding & Scaling
Splitting: 80% training / 20% testing
Model Testing & Evaluation
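As an illustration of the 80/20 splitting step, the sketch below shuffles record indices with a fixed seed and holds out 20% for testing. The seed, the rounding rule, and the function name are assumptions. The summary does not say whether the study stratified the split by class.

```python
import random

def train_test_split_indices(n_records, test_fraction=0.2, seed=42):
    """Shuffle record indices reproducibly and split them into train and test."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_test = round(n_records * test_fraction)
    return indices[n_test:], indices[:n_test]

# With the 466 cleaned records, this rounding yields 373 training and 93 test indices.
train_idx, test_idx = train_test_split_indices(466)
```

Libraries may round the test size differently, so the exact counts in the study could differ by one record.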
Performance Metrics: Accuracy, F1 Score, Precision, ROC score
Confusion Matrix for classification validation
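The reported metrics all derive from the four confusion-matrix counts. The following sketch shows that relationship using the standard definitions. The label and prediction vectors are hypothetical and are not results from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means PCOS."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, sensitivity, specificity, and F1."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)

    def safe_div(a, b):
        return a / b if b else 0.0

    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)  # also called recall
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }

# Hypothetical predictions for ten test records.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

In a screening setting, sensitivity is worth reporting alongside accuracy, because a missed PCOS case (a false negative) delays treatment.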
Visualization Insights
Heatmaps and bar charts were used to analyze feature correlations.
Symptoms like fast food consumption, weight gain, hair growth, and skin darkening are strongly linked to PCOS.
Feature Correlation Colors:
Deep red = strong positive correlation
Deep blue = strong negative correlation
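The values behind such a heatmap are typically Pearson correlation coefficients. Assuming that is what was plotted here, the sketch below computes one coefficient and maps it onto a simple red/blue diverging scale. The colour function is a hypothetical stand-in for whatever colormap the study's plotting library applied, and the toy columns are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def diverging_rgb(r):
    """Map r in [-1, 1] to RGB: -1 is deep blue, 0 is white, +1 is deep red."""
    fade = round(255 * (1 - abs(r)))
    return (255, fade, fade) if r >= 0 else (fade, fade, 255)

# Hypothetical toy columns coded 0/1.
pcos = [1, 1, 1, 0, 0, 0]
weight_gain = [1, 1, 0, 1, 0, 0]

r = pearson(weight_gain, pcos)
colour = diverging_rgb(r)
```

Correlation captures only linear association, which is one reason a separate score such as Mutual Information is useful for feature selection.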
Key Findings
Random Forest and XGBoost performed best in model accuracy and predictive capability.
Mutual Information proved highly effective in identifying critical predictive features.
The system provides a user-friendly interface that helps in early detection of PCOS, improving the possibility of timely treatment.
Conclusion
If PCOS is left undiagnosed in its early stages, patients may suffer infertility and be unable to have children. Because early diagnosis remains limited, the incidence of PCOS is higher than in previous years' records. In response, this research proposes a machine learning-based prediction system with a web interface. Early diagnosis may empower patients to follow the steps recommended by their physician to maintain a healthier life. The aim of our research is to use machine learning models to understand patterns in this condition. The models are trained on the data and evaluated on accuracy, specificity, sensitivity, precision, and overall performance, using ML algorithms such as Random Forest, Logistic Regression, Decision Tree Classifier, AdaBoost Classifier, XGBoost Classifier, and Support Vector Machines, among others.