Cardiovascular diseases are a global health concern. This study uses a hybrid machine learning approach to predict heart disease, combining Random Forest with Linear Regression. Because the heart's workings are complex, the research aims to improve prediction accuracy, and it does not fix the model structure in advance. The framework is trained on historical patient data, including age, cholesterol, blood pressure, and lifestyle factors. Random Forest captures complex relationships, while Linear Regression helps explain them. The study also examines how many factors must be included and identifies which ones influence heart disease outcomes, providing useful knowledge for personalized healthcare programs. This hybrid methodology, which does not rely on a predetermined model, holds promise for improving rapid prediction strategies and supporting more effective interventions in cardiovascular health.
1. Introduction
The project aims to develop a smart and reliable heart disease prediction system using a hybrid machine learning model that combines Linear Regression (LR) and Random Forest (RF) within a ModelTree ensemble. The goal is to achieve high prediction accuracy while ensuring model interpretability and clinical relevance.
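The document does not specify how its ModelTree ensemble is built. In the usual sense of the term (as in Quinlan's M5), a model tree is a decision tree whose leaves hold linear regression models instead of constant predictions. That is one way to combine Linear Regression with tree-based splitting. The Python sketch below illustrates the idea under several assumptions:
- It uses scikit-learn.
- The class name SimpleModelTree and its parameters are hypothetical.
- It omits the pruning and smoothing steps of full model-tree algorithms.
- The data are synthetic, not patient records.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor


class SimpleModelTree:
    """Hypothetical minimal model tree: a shallow regression tree partitions
    the feature space, and each leaf gets its own LinearRegression."""

    def __init__(self, max_depth=2, min_samples_leaf=20):
        self.tree = DecisionTreeRegressor(
            max_depth=max_depth, min_samples_leaf=min_samples_leaf, random_state=0
        )
        self.leaf_models = {}

    def fit(self, X, y):
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)  # leaf index of every training row
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            self.leaf_models[leaf] = LinearRegression().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        leaves = self.tree.apply(X)  # route each row to its leaf
        out = np.empty(len(X), dtype=float)
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            out[mask] = self.leaf_models[leaf].predict(X[mask])
        return out


# Synthetic, piecewise-linear data (not patient data)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = np.where(X[:, 0] > 0, 2.0 * X[:, 0], -X[:, 0]) + 0.5 * X[:, 1]

tree_mse = mean_squared_error(y, SimpleModelTree().fit(X, y).predict(X))
linear_mse = mean_squared_error(y, LinearRegression().fit(X, y).predict(X))
print(f"model tree MSE={tree_mse:.3f}, single LR MSE={linear_mse:.3f}")
```

Each leaf's LinearRegression is a least-squares fit on that leaf's rows. As a result, the in-sample squared error of the model tree cannot exceed that of one global LinearRegression. Whether the advantage holds for unseen patients has to be checked on held-out data.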
2. Key Features
Model: A hybrid of Linear Regression + Random Forest, offering a balance of interpretability and robustness.
Framework: Built in Python using scikit-learn, with modules for data preprocessing, model training, evaluation, and prediction.
Data Handling: Involves missing value treatment (using SimpleImputer), feature scaling (StandardScaler), and feature selection for consistent input data.
Prediction Interface: Real-time client interface for user inputs such as age, cholesterol, and BP, showing one of two results: “The person is healthy” or “Has heart disease”.
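The data-handling steps listed above can be chained in a scikit-learn Pipeline. The sketch below is a minimal illustration, not the project's actual code. The following choices are all assumptions:
- median imputation in SimpleImputer;
- the ANOVA F-test (f_classif) as the SelectKBest scoring function, with k=3;
- the 0.5 decision threshold in the hypothetical helper to_message;
- the column names in FEATURES.

The data are synthetic.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical column order, for illustration only
FEATURES = ["age", "cholesterol", "bp", "lifestyle_score"]


def build_preprocessor(k=3):
    """Impute missing values, standardize, then keep the k top-scoring features."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),       # NaN -> column median
        ("scale", StandardScaler()),                         # zero mean, unit variance
        ("select", SelectKBest(score_func=f_classif, k=k)),  # ANOVA F-test ranking
    ])


def to_message(score, threshold=0.5):
    """Map a model score to the interface's two messages (threshold assumed)."""
    return "Has heart disease" if score >= threshold else "The person is healthy"


# Synthetic example: build labels first, then inject ~5% missing values
rng = np.random.default_rng(1)
X = rng.normal(size=(200, len(FEATURES)))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.05] = np.nan

prep = build_preprocessor(k=3)
X_ready = prep.fit_transform(X, y)
print(X_ready.shape)     # (200, 3)
print(to_message(0.72))  # Has heart disease
```

Fitting the imputer, scaler, and selector inside one Pipeline means the same transformations are reapplied unchanged to new patient inputs at prediction time.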
3. Methodology Overview
Data Split: 75% for training, 25% for testing.
Training: Uses LR to model linear relationships and RF for non-linear patterns.
Evaluation Metrics: Accuracy, Precision, Recall, F1-score, and AUC-ROC.
Model Saving: Trained models are serialized with joblib or pickle so they can be reused later.
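The methodology steps can be sketched end to end as follows. The document does not say how the two models are combined, so this sketch assumes one simple scheme that matches their stated roles:
- Linear Regression is fitted to the 0/1 label to capture the linear trend.
- A Random Forest regressor is fitted to the remaining residuals to capture non-linear patterns.

The helper names fit_hybrid and hybrid_score, the 0.5 threshold, the forest settings, and the file name are hypothetical. The data are synthetic.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split


def fit_hybrid(X, y, random_state=0):
    """Hypothetical hybrid: LR for the linear trend, RF on LR's residuals."""
    lr = LinearRegression().fit(X, y)
    rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=5,
                               random_state=random_state)
    rf.fit(X, y - lr.predict(X))  # RF learns what LR could not explain
    return {"lr": lr, "rf": rf}


def hybrid_score(models, X):
    """Continuous risk score: linear part plus non-linear correction."""
    return models["lr"].predict(X) + models["rf"].predict(X)


# Synthetic data with a linear term and a non-linear interaction
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 4))
y = ((X[:, 0] + X[:, 1] * X[:, 2]) > 0).astype(int)

# 75% training / 25% testing, stratified to preserve the class ratio
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = fit_hybrid(X_tr, y_tr)
scores = hybrid_score(models, X_te)
y_pred = (scores >= 0.5).astype(int)  # assumed decision threshold

metrics = {
    "accuracy": accuracy_score(y_te, y_pred),
    "precision": precision_score(y_te, y_pred, zero_division=0),
    "recall": recall_score(y_te, y_pred, zero_division=0),
    "f1": f1_score(y_te, y_pred, zero_division=0),
    "auc_roc": roc_auc_score(y_te, scores),  # uses the continuous score
}
print(metrics)

# Serialize the fitted estimators with joblib, then reload them
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hybrid_model.joblib")  # hypothetical file name
    joblib.dump(models, path)
    restored = joblib.load(path)
```

Accuracy, precision, recall, and F1 are computed from thresholded predictions. AUC-ROC is computed from the continuous score, since it summarizes performance across all thresholds. Saving the two fitted scikit-learn estimators in a plain dictionary means the file can be reloaded without the script that defined the helper functions.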
4. Comparative Evaluation
Model | AUC | Distance to Ideal Point
----------------------|------|------------------------
Linear Regression (LR) | 0.81 | 0.275
Random Forest (RF) | 0.84 | 0.25
Hybrid (RF + LR) | 0.78 | 0.34
Hybrid (LR + RF) | 0.98 | 0.02
Best Performer: The Hybrid (LR + RF) model, with:
AUC of 0.98, the highest among the compared models
Minimum distance to the ideal point on the ROC curve (0.02)
Worst Performer: The Hybrid (RF + LR) model, with the lowest AUC (0.78) and the largest distance to the ideal point (0.34).
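The text does not define the "distance to ideal point" in the table. This sketch assumes a common definition:
- The ideal point of ROC space is (FPR, TPR) = (0, 1).
- For every threshold t, the Euclidean distance is d(t) = sqrt(FPR(t)^2 + (1 - TPR(t))^2).
- The reported value is the minimum of d(t) over all thresholds.

The hypothetical helper roc_summary below returns both the AUC and that minimum distance, using scikit-learn's roc_curve and roc_auc_score.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def roc_summary(y_true, scores):
    """Return (AUC, minimum Euclidean distance from the ROC curve to (0, 1))."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    distances = np.sqrt(fpr ** 2 + (1.0 - tpr) ** 2)  # distance of each ROC point
    return float(roc_auc_score(y_true, scores)), float(distances.min())


# Toy example: one positive is ranked below one negative
print(roc_summary([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # (0.75, 0.5)
# A perfect ranking reaches the ideal point exactly
print(roc_summary([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))   # (1.0, 0.0)
```

Under this definition, a distance near 0 means some threshold gives almost no false positives and almost no missed cases. That is consistent with the Hybrid (LR + RF) row pairing the highest AUC with the smallest distance.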
5. Literature Review Highlights
Various ML models have been used for heart disease prediction:
Logistic Regression Bagging achieved an ROC AUC of 0.91.
Random Forest, Support Vector Machines, and a hybrid SVM tuned with quantum-behaved particle swarm optimization (SVM-QPSO) have shown strong predictive performance.
Recent studies also emphasize data preprocessing, feature selection, and cross-validation as crucial for improved performance.
6. Conclusion
Heart disease is a major challenge because it involves continuous variables and many risk factors whose interrelationships are nonlinear. Random Forest is well suited to this problem, although it can overfit, whereas Linear Regression may fail because it is too simple. The approach taken here is therefore to combine the two algorithms. The basic motivation of this methodology is to merge the algorithms in order to improve prediction accuracy, provide new insight into the intricate relationships underlying heart disease, and reduce the shortcomings of the individual models. In summary, this combined method is promising for advancing medical technology. By improving the diagnosis and treatment of heart disease, it is a step forward in addressing the multidimensional problems posed by cardiovascular illness.
References
[1] K. V. V. Reddy, I. Elamvazuthi, A. A. Aziz, S. Paramasivam, H. N. Chua, and S. Pranavanand, “Heart Disease Risk Prediction Using Machine Learning Classifiers with Attribute Evaluators,” Applied Sciences, vol. 11, art. 8352, 2021.
[2] M. S. Raja, M. Anurag, C. P. Reddy, and N. Sirisala, “Machine Learning Based Heart Disease Prediction System,” in Proc. Int. Conf. on Computer Communication and Informatics (ICCCI), pp. 12–28, 2021.
[3] P. Ghosh, S. Azam, M. Jonkman, A. Karim, F. M. J. Mehedi Shamrat, E. Ignatious, S. Shultana, A. R. Beeravolu, and F. De Boer, “Efficient Prediction of Cardiovascular Disease Using Machine Learning Algorithms with Relief and LASSO Feature Selection Techniques,” IEEE Access, vol. 9, pp. 19304–19326, 2021.
[4] E. I. Elsedimy, S. M. M. AboHashish, and F. Algarni, “New cardiovascular disease prediction approach using support vector machine and quantum-behaved particle swarm optimization,” Multimedia Tools and Applications, vol. 83, no. 8, pp. 23901–23928, 2023.
[5] N. Chandrasekhar and S. Peddakrishna, “Enhancing Heart Disease Prediction Accuracy through Machine Learning Techniques and Optimization,” Processes, vol. 11, art. 1210, 2023.
[6] J. O. R. Kim, Y.-S. Jeong, J. H. Kim, J. W. Lee, D. Park, and H.-S. Kim, “Machine Learning-Based Cardiovascular Disease Prediction Model: A Cohort Study on the Korean National Health Insurance Service Health Screening Database,” Diagnostics, vol. 11, art. 943, 2021.
[7] O. Taylan, A. S. Alkabaa, H. S. Alqabbaa, E. Pamukçu, and V. Leiva, “Early Prediction in Classification of Cardiovascular Diseases with Machine Learning, Neuro-Fuzzy and Statistical Methods,” Biology, vol. 12, art. 117, 2023.
[8] V. Sharma, S. Yadav, and M. Gupta, “Heart Disease Prediction using Machine Learning Techniques,” in Proc. 2020 2nd Int. Conf. on Advances in Computing, Communication, Control and Networking (ICACCCN), Greater Noida, India, pp. 177–181, Dec. 2020.
[9] C. M. Bhatt, P. Patel, T. Ghetia, and P. L. Mazzeo, “Effective Heart Disease Prediction Using Machine Learning Techniques,” Algorithms, vol. 16, no. 2, art. 88, 2023.
[10] D. E. Salhi, A. Tari, and M.-T. Kechadi, “Using Machine Learning for Heart Disease Prediction,” in Advances in Computing Systems and Applications (Proc. 4th Conference on Computing Systems and Applications), pp. 70–81, Feb. 2021.
[11] C. M. Bhatt, P. Patel, T. Ghetia, and P. L. Mazzeo, “Effective Heart Disease Prediction Using Machine Learning Techniques,” Algorithms, Special Issue: Artificial Intelligence Algorithms for Healthcare, vol. 16, no. 2, art. 88, 2023.
[12] D. E. Salhi, A. Tari, and M.-T. Kechadi, “Using Machine Learning for Heart Disease Prediction,” Procedia Computer Science, vol. 199, pp. 628–635, 2021.