This paper presents a machine-learning-based system for predicting diseases from user-reported symptoms. The system integrates three classification algorithms, Decision Tree (DT), Random Forest (RF), and Naive Bayes (NB), to predict 41 distinct diseases from 132 binary-encoded symptoms. A curated dataset of symptom-disease mappings was preprocessed and used to train and evaluate the models. Experimental results show that Decision Tree and Random Forest both achieve an accuracy of 95.12%, while Naive Bayes reaches 92.68% at lower computational cost. The system features a graphical user interface (GUI) built with CustomTkinter that provides intuitive symptom selection and real-time prediction. The work highlights the potential of machine learning for preliminary diagnosis in healthcare, offering a scalable, interpretable, and accessible tool for medical decision support. Limitations, including dataset coverage and constraints on symptom representation, are discussed along with directions for future enhancement.
Introduction
This paper presents a machine-learning-based healthcare system for symptom-driven disease prediction. It highlights the role of AI in improving early diagnosis, especially in resource-limited settings where clinical expertise may be scarce. The proposed system uses historical medical data to provide fast and accurate preliminary diagnoses, supporting both healthcare professionals and patients.
The research implements and compares three classical machine learning models—Decision Tree (DT), Random Forest (RF), and Naive Bayes (NB)—for multi-class disease prediction across 41 diseases using a dataset with 132 binary symptom features. A user-friendly graphical interface built with CustomTkinter enables intuitive symptom input, multi-model prediction, result visualization, and PDF report generation.
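The three-model comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data is synthetic (the real dataset is not reproduced here), scikit-learn is assumed as the library, `BernoulliNB` is an assumed choice of Naive Bayes variant for binary features, and `encode_symptoms` is a hypothetical helper showing how symptoms selected in a GUI might become a 132-element binary row.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.tree import DecisionTreeClassifier

N_SYMPTOMS, N_DISEASES = 132, 41  # dimensions reported in the paper

# Synthetic stand-in for the real dataset (assumption): each disease gets a
# random "signature" of 6 symptoms, and each sample shows it with some noise.
rng = np.random.default_rng(0)
signatures = np.zeros((N_DISEASES, N_SYMPTOMS), dtype=int)
for d in range(N_DISEASES):
    signatures[d, rng.choice(N_SYMPTOMS, size=6, replace=False)] = 1

y = np.repeat(np.arange(N_DISEASES), 30)          # 30 samples per disease
noise = rng.random((y.size, N_SYMPTOMS)) < 0.02   # flip about 2% of entries
X = np.where(noise, 1 - signatures[y], signatures[y])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hyperparameters are illustrative defaults, not the paper's settings.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Naive Bayes": BernoulliNB(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the test split
    print(f"{name}: test accuracy = {scores[name]:.4f}")


def encode_symptoms(selected, symptom_names):
    """Hypothetical helper: turn selected symptom names into a 0/1 feature row."""
    index = {name: i for i, name in enumerate(symptom_names)}
    row = np.zeros((1, len(symptom_names)), dtype=int)
    for name in selected:
        row[0, index[name]] = 1
    return row
```

A row produced by `encode_symptoms` can be passed to `model.predict` for any of the three fitted models, which is one plausible way a multi-model prediction screen could query them side by side.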
The models were evaluated using accuracy, precision, recall, F1-score, and confusion matrices. Decision Tree and Random Forest achieved the highest accuracy (95.12%), while Naive Bayes achieved slightly lower accuracy (92.68%) but offered better computational efficiency. Random Forest provided the best balance between accuracy and generalization, Decision Trees offered high interpretability, and Naive Bayes proved suitable for resource-constrained environments.
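To make the evaluation metrics concrete, the sketch below derives accuracy and macro-averaged precision, recall, and F1-score from a confusion matrix in plain Python. The three class labels and the toy predictions are invented for illustration, and macro averaging is an assumption, since the source does not say how per-class scores were aggregated. As a side note, the reported accuracies equal 39/41 and 38/41 to two decimal places, which would be consistent with a 41-sample test set, although the source does not state the test-set size.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix whose rows are true classes and columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def macro_metrics(matrix):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum
        actual = sum(matrix[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with three hypothetical classes; one "A" sample is misread as "B".
labels = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "C"]

matrix = confusion_matrix(y_true, y_pred, labels)
metrics = macro_metrics(matrix)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In the toy example a single error lowers recall for class "A" and precision for class "B" while leaving class "C" untouched, which is why per-class metrics and the confusion matrix reveal more than accuracy alone in a 41-class problem.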
Overall, the system shows strong potential as an accessible, automated tool for preliminary diagnosis, combining robust machine learning performance with an intuitive interface. The paper also discusses the system's limitations and the enhancements needed for real-world clinical use.
Conclusion
This study demonstrates the feasibility of machine learning for symptom-based disease prediction: the best models reach 95.12% accuracy, and an intuitive GUI makes them accessible to non-specialists. The comparative analysis helps match an algorithm to a given clinical need, and the transparent results make the system a practical tool for preliminary healthcare assessment. Future directions include expanding the dataset, integrating more advanced models, enriching the symptom representation, and enabling real-time links with EHRs, wearables, and hospital databases. Cloud deployment, mobile apps, multilingual support, and Explainable AI would further improve accessibility and transparency. Clinical validation and regulatory compliance remain critical for real-world adoption.