Early Disease Prediction using Machine Learning and Deep Learning Algorithms

Authors: Vyshnavi Posu, G. Narasimham

DOI Link: https://doi.org/10.22214/ijraset.2025.72979

Abstract

Early detection of life-threatening diseases such as cancer and thyroid disorders plays a critical role in reducing mortality and improving treatment outcomes. Because traditional diagnostic methods are often time-consuming, resource-intensive, and dependent on specialist expertise, integrating intelligent systems into the diagnostic workflow has become essential. This study explores the application of machine learning (ML) and deep learning (DL) algorithms to predict diseases at an early stage using structured clinical datasets. Algorithms including Random Forest, Decision Tree, K-Nearest Neighbors (KNN), Naïve Bayes, and Artificial Neural Networks (ANN) were implemented and evaluated on accuracy, precision, recall, and F1-score. All models achieved above 85% accuracy, with ANN and Random Forest performing best (98.4% on breast cancer and 97.2% on thyroid data, respectively). A Streamlit-based web application was developed for real-time prediction, enabling easy clinical integration. The research underscores the effectiveness of ML/DL in enhancing diagnostic efficiency and recommends further development involving real clinical data and advanced model interpretability.

Introduction

1. Background

Early detection of diseases like cancer and thyroid disorders is vital but challenging due to subtle or absent symptoms. Traditional diagnostics, though accurate, are resource-intensive and slow. Machine Learning (ML) offers a faster, scalable alternative by learning from historical medical data to identify patterns and enable quicker, more reliable diagnoses.

2. Literature Survey

Recent studies highlight the growing shift toward ML and Deep Learning (DL) in healthcare:

Ensemble models and ANNs show superior accuracy.
Naïve Bayes and Random Forest perform well for thyroid classification.
Traditional models like SVM struggle with high-dimensional, imbalanced data.

3. Objectives

The main goal is to develop a multi-disease prediction system that:

Leverages ML/DL for accurate predictions.
Offers real-time diagnosis via a Streamlit web interface.
Is robust to real-world data issues like noise and imbalance.
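
As a rough illustration of the second objective, a Streamlit front end could be organized as in the sketch below. The paper does not publish its application code or input schema, so the feature names, the value ranges, and the placeholder rule in predict_label are all hypothetical. Streamlit is imported inside render_app so that the prediction logic can be run and tested without the UI.

```python
# Hypothetical sketch of a Streamlit front end; not the authors' application.
from typing import Dict, Tuple

# Assumed (min, max, default) ranges for two made-up, already-normalized features.
FEATURES: Dict[str, Tuple[float, float, float]] = {
    "feature_1": (0.0, 1.0, 0.5),
    "feature_2": (0.0, 1.0, 0.5),
}


def predict_label(inputs: Dict[str, float]) -> str:
    """Placeholder standing in for a trained model's prediction call."""
    # Toy rule so the sketch runs; a real app would load a saved model instead.
    score = sum(inputs.values()) / len(inputs)
    return "positive" if score > 0.5 else "negative"


def render_app() -> None:
    """Draw the input form and show a prediction when the button is pressed."""
    import streamlit as st  # third-party package, assumed to be installed

    st.title("Disease prediction demo")
    inputs = {
        name: st.number_input(name, min_value=lo, max_value=hi, value=default)
        for name, (lo, hi, default) in FEATURES.items()
    }
    if st.button("Predict"):
        st.write(predict_label(inputs))
```

Saved as app.py with a final line that calls render_app(), a script like this would be launched with `streamlit run app.py`.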

4. System Overview

A. Existing System

Traditional models such as Logistic Regression and SVM often struggle with complex, noisy, and high-dimensional clinical datasets.

B. Proposed System

The proposed system uses the following ML/DL models:

ML Models: Decision Tree, Random Forest, KNN, Naïve Bayes, XGBoost
DL Model: Artificial Neural Network (ANN)

These models offer improved accuracy, scalability, and noise resistance.
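
To make one of the listed models concrete, the following is a minimal pure-Python sketch of K-Nearest Neighbors classification. It is illustrative only: the toy data, the function name knn_predict, and the choice of k = 3 are assumptions, and the paper does not state which library or settings were actually used.

```python
# Minimal k-nearest-neighbours classifier in pure Python (illustrative only).
from collections import Counter
from math import dist
from typing import Sequence


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[int],
                query: Sequence[float], k: int = 3) -> int:
    """Return the majority label among the k training points nearest to query."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data (assumed): two features per record, label 1 = positive, 0 = negative.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.9, 0.8], [0.8, 0.9], [0.85, 0.95]]
y = [0, 0, 0, 1, 1, 1]

print(knn_predict(X, y, [0.2, 0.2]))  # prints 0
print(knn_predict(X, y, [0.9, 0.9]))  # prints 1
```

Because KNN relies on raw distances, features on larger scales dominate the vote unless the data are normalized first.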

5. Methodology

Data Acquisition: Collect disease-specific datasets (cancer, thyroid, Alzheimer’s, etc.).
Preprocessing: Clean data, select key features, normalize values.
Model Development: Train ML/DL models per disease.
Evaluation Metrics: Accuracy, precision, recall, F1-score with cross-validation.
Deployment: Use Streamlit for real-time web predictions.
Integration: Models saved and deployed via web/cloud platforms.
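
Two of the steps above, value normalization and cross-validation, can be sketched in pure Python as follows. This is a sketch under stated assumptions rather than the authors' pipeline: the paper does not say which normalization or fold scheme was used, so min-max scaling and a simple stratified split are assumed here, and the function names are hypothetical.

```python
# Sketch of two methodology steps: min-max normalization and stratified folds.
from collections import defaultdict
from typing import Dict, List, Sequence


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Rescale one feature column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def stratified_folds(labels: Sequence[int], k: int) -> List[List[int]]:
    """Split sample indices into k folds, dealing each class out round-robin
    so that every fold keeps roughly the overall class ratio."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        for idx in indices:
            folds[position % k].append(idx)
            position += 1
    return folds


labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]  # imbalanced toy labels (assumed)
for test_fold in stratified_folds(labels, k=3):
    train = [i for i in range(len(labels)) if i not in test_fold]
    # A model would be trained on `train` and scored on `test_fold` here.
    print(sorted(test_fold), len(train))
```

In practice the indices would be shuffled before splitting, and the normalization range would be computed on the training fold only, so that no information from the test fold leaks into training.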

6. Results & Analysis

Disease	Best Model	Accuracy	Precision	Recall	F1-Score
Thyroid	Random Forest	97.2%	0.96	0.97	0.965
Breast Cancer	ANN (MLP)	98.4%	0.98	0.98	0.98
Lung Cancer	XGBoost	96.5%	0.95	0.96	0.955
Skin Cancer	KNN	91.7%	0.90	0.92	0.91
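
For reference, the reported metrics are related as shown in the sketch below, which also checks that each F1-score in the table matches the harmonic mean of its precision and recall. The confusion-matrix counts are made-up values for illustration; only the (precision, recall, F1) triples are copied from the table.

```python
# Metrics from binary confusion-matrix counts, plus a check of the table's F1 values.

def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


# Hypothetical counts, not taken from the paper.
tp, fp, fn, tn = 48, 2, 1, 49
print(accuracy(tp, fp, fn, tn), precision(tp, fp), recall(tp, fn))

# (precision, recall, reported F1) copied from the results table.
reported = {
    "Thyroid": (0.96, 0.97, 0.965),
    "Breast Cancer": (0.98, 0.98, 0.98),
    "Lung Cancer": (0.95, 0.96, 0.955),
    "Skin Cancer": (0.90, 0.92, 0.91),
}
for disease, (p, r, f1) in reported.items():
    # Each reported F1 agrees with the recomputed value to within rounding.
    assert abs(f1_score(p, r) - f1) < 0.001, disease
```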

Comparative Insights:

ANN delivers the highest accuracy (98.4% on breast cancer) but is less interpretable and takes longer to train.
Random Forest balances accuracy and efficiency, making it ideal for clinical applications like thyroid diagnosis.
KNN is useful for simpler cases like skin cancer.
Naïve Bayes suits quick screening with minimal computation.
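
The low computational cost of Naïve Bayes can be seen in a minimal Gaussian variant, sketched below in pure Python. Training reduces to one pass that stores a prior, a mean, and a variance per class and feature. The Gaussian assumption, the toy data, and the function names are illustrative choices; the paper does not specify which Naïve Bayes variant was used.

```python
# Minimal Gaussian Naive Bayes in pure Python (illustrative only).
from collections import defaultdict
from math import log, pi
from statistics import fmean, pvariance


def fit_gnb(X, y, eps=1e-9):
    """Per class: log-prior plus the mean and variance of every feature."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        model[label] = (
            log(len(rows) / len(X)),
            [fmean(c) for c in columns],
            [pvariance(c) + eps for c in columns],  # eps avoids zero variance
        )
    return model


def predict_gnb(model, row):
    """Return the class with the highest log-posterior for one record."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * log(2 * pi * var) - (x - mu) ** 2 / (2 * var)
            for x, mu, var in zip(row, means, variances)
        )
    return max(model, key=log_posterior)


# Toy data (assumed): two features per record, label 1 = positive, 0 = negative.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.9, 0.8], [0.8, 0.9], [0.85, 0.95]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gnb(X, y)
print(predict_gnb(model, [0.2, 0.2]), predict_gnb(model, [0.9, 0.9]))
```

Prediction costs one pass over the features per class, which is consistent with using the model for quick screening.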

Conclusion

This project demonstrates the effectiveness of Machine Learning (ML) and Deep Learning (DL) models in the early detection of critical diseases such as thyroid disorders, breast cancer, lung cancer, and skin cancer. Using structured clinical datasets, the system achieved prediction accuracy above 85% across all models, including Random Forest, ANN, KNN, and XGBoost. A user-friendly Streamlit interface was developed to enable real-time predictions and patient interaction. The models were evaluated using accuracy, precision, recall, F1-score, and the confusion matrix. The results support the potential of both traditional ML and advanced DL techniques to improve healthcare diagnostics, reduce clinical workload, and enable the early interventions that are crucial for patient recovery and survival.

Copyright

Copyright © 2025 Vyshnavi Posu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET72979

Publish Date : 2025-07-03

ISSN : 2321-9653

Publisher Name : IJRASET
