Sleep disorder prediction involves identifying early signs that a person may develop a sleep problem such as insomnia, sleep apnea, or restless legs syndrome. It relies on monitoring patterns in sleep duration, quality, and consistency. Factors such as stress level, irregular work schedules, lifestyle habits, and medical history can indicate higher risk, and physical symptoms such as excessive daytime sleepiness, difficulty falling asleep, or frequent awakenings are important clues. Regular tracking of these signals can help anticipate disorders before they become severe, allowing for timely lifestyle adjustments, medical consultations, and preventive measures. In the experiments reported here, Random Forest, Gradient Boosting, and XGBoost all reached a peak accuracy of 0.947; Random Forest was chosen as the final model for its stability, resistance to overfitting, and suitability for structured medical data.
Introduction
Sleep is essential for maintaining physical health, mental stability, and overall well-being, but many sleep disorders remain undiagnosed because of limited awareness, high medical costs, and restricted access to clinical tests. Traditional diagnostic methods such as polysomnography are expensive and time-consuming, which makes them unsuitable for large-scale screening. To address this, the study proposes a machine learning-based system that predicts sleep disorders from health and lifestyle data: sleep duration, stress level, physical activity, body mass index (BMI), blood pressure, occupation, and sleep quality. The goal is to provide an affordable, intelligent tool for early detection and preventive healthcare.
The system uses a dataset from Kaggle containing simulated electronic health records, including digitized medical reports and clinical indicators related to sleep disorders. Several machine learning algorithms are implemented and compared, including XGBoost, Gradient Boosting, and Random Forest, along with other models such as Decision Tree, K-Nearest Neighbours, Support Vector Machine, Multilayer Perceptron, and Logistic Regression.
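A minimal sketch of such a comparison, assuming scikit-learn is available. The data here is synthetic (generated with `make_classification`) and only stands in for the Kaggle dataset, so the scores it prints are not the study's results. XGBoost is left out to keep dependencies small; if the `xgboost` package is installed, its `XGBClassifier` could be added to the dictionary in the same way. The hyperparameters and the `compare_models` helper are illustrative assumptions, not the authors' configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class tabular data (an assumption, not the real dataset).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6,
    n_classes=3, random_state=42,
)

# 80% training / 20% testing, stratified so class ratios are preserved.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler;
# tree-based models do not need feature scaling.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "K-Nearest Neighbours": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "Multilayer Perceptron": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=42)
    ),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}


def compare_models(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training set and return its test accuracy."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


scores = compare_models(models, X_train, X_test, y_train, y_test)

# Print models from highest to lowest test accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} {acc:.3f}")
```

Fixing `random_state` makes the ranking reproducible across runs; with a test set this small, differences of one or two percentage points between models should not be read as meaningful.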
The system is developed as a web application using Django. After registering or logging in, users upload sleep-related datasets, and the system performs data validation, preprocessing, and feature selection. The dataset is split into training (80%) and testing (20%) sets, and the machine learning models are trained and evaluated using precision, recall, F1-score, and accuracy. The trained model then predicts sleep quality as Good Sleep, Moderate Sleep, or Poor Sleep, and the results are displayed with accuracy scores and graphs. Predictions are stored in an SQLite database for future reference.
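To make the evaluation metrics concrete, the sketch below computes accuracy and per-class precision, recall, and F1-score for the three output labels using only the Python standard library. The true and predicted labels are made up for illustration, and the function names are hypothetical; the actual system may well rely on a library implementation instead.

```python
from collections import Counter

LABELS = ["Good Sleep", "Moderate Sleep", "Poor Sleep"]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} from one-vs-rest counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[t] += 1      # t was the truth but was missed

    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]   # times the label was predicted
        actual = tp[label] + fn[label]      # times the label was the truth
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical labels for ten test samples (illustrative only).
y_true = ["Good Sleep"] * 4 + ["Moderate Sleep"] * 3 + ["Poor Sleep"] * 3
y_pred = [
    "Good Sleep", "Good Sleep", "Good Sleep", "Moderate Sleep",
    "Moderate Sleep", "Moderate Sleep", "Poor Sleep",
    "Poor Sleep", "Poor Sleep", "Good Sleep",
]

print(f"accuracy: {accuracy(y_true, y_pred):.3f}")
for label, (p, r, f1) in per_class_metrics(y_true, y_pred, LABELS).items():
    print(f"{label:<15} precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Reporting the per-class values alongside accuracy matters when the classes are imbalanced, since a model can score well on accuracy while missing most samples of a rare class.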
Conclusion
The experimental results show that the highest accuracy, 0.947, was achieved by three models: Random Forest, Gradient Boosting, and XGBoost. Among these, Random Forest is selected as the final model for its high accuracy, robustness, resistance to overfitting, and suitability for structured medical datasets. This suggests that tree-based ensemble learning models are highly effective for structured health datasets.
References
[1] T. S. Alshammari, "Applying Machine Learning Algorithms for the Classification of Sleep Disorders," IEEE Access, vol. 12, pp. 36110-36121, 2024, doi: 10.1109/ACCESS.2024.3374408.
[2] L. Ravikumar et al., "Sleep Disorder Prediction Using Machine Learning," International Journal of Engineering Technology Research & Management (IJETRM), vol. 09, no. 07, 2025.
[3] V. Shanmugapriya et al., "Sleep Disorder Prediction Using Machine Learning," International Journal of Scientific and Advanced Research in Technology, vol. 11, no. 3, 2025.