Advanced Heart Attack Risk Prediction Using Stacked Hybrid Machine Learning

Authors: M. Vijaya Kumar, D. V. V. Manikanta Raju, A. D. M. Praveen, P. Baladithya, V. Sai Dinesh Kumar

DOI Link: https://doi.org/10.22214/ijraset.2026.83687

Abstract

Heart disease remains one of the leading causes of mortality worldwide, making early prediction essential for effective prevention and timely treatment. This paper presents an advanced machine learning-based system for predicting heart attack risk using a stacked hybrid ensemble approach. The proposed system integrates multiple machine learning algorithms, including Random Forest, Decision Tree, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression, Gradient Boosting, and XGBoost. These models are combined using a stacking classifier with Logistic Regression as the meta-learner to achieve higher accuracy and reliability. The system analyzes various patient health parameters such as age, cholesterol levels, blood pressure, heart rate, and other clinical factors from the Cleveland Heart Disease Dataset. Data preprocessing techniques including data cleaning, feature scaling using StandardScaler, feature selection, and stratified train-test splitting are applied to improve robustness. Experimental results demonstrate that the stacked hybrid model achieves a prediction accuracy of approximately 89–90%, outperforming individual base classifiers. The proposed solution offers a non-invasive, cost-effective, and accurate method for early detection of heart disease, thereby supporting healthcare professionals in making informed clinical decisions.

Introduction

This paper proposes an Advanced Heart Attack Risk Prediction System that uses a stacked hybrid machine learning approach to improve the early detection of heart disease, one of the leading causes of death worldwide. Early identification of heart attack risk can significantly reduce mortality and improve patient outcomes. Traditional diagnostic methods rely heavily on manual evaluation by healthcare professionals, which can be time-consuming and susceptible to human error. The integration of Artificial Intelligence (AI) and Machine Learning (ML) offers a faster, more accurate, and data-driven alternative for supporting medical decision-making.

Objectives and Motivation

The primary goal of the study is to develop an intelligent system capable of predicting heart attack risk using patient medical data. The system analyzes important health indicators such as:

Age
Blood pressure
Cholesterol levels
Heart rate
Chest pain type
Blood sugar levels
Electrocardiogram (ECG) results
Other cardiovascular risk factors

To improve prediction accuracy, the study combines multiple machine learning algorithms through a stacking ensemble technique.

Literature Review

Previous research has demonstrated the effectiveness of machine learning algorithms in heart disease prediction. Commonly used models include:

Logistic Regression – Simple and interpretable but less effective for complex data.
Decision Tree – Easy to understand but prone to overfitting.
Random Forest – High accuracy and reduced overfitting through ensemble learning.
K-Nearest Neighbors (KNN) – Effective for small datasets but computationally expensive for larger datasets.
Support Vector Machine (SVM) – Strong performance on high-dimensional data but requires careful tuning.
XGBoost and Gradient Boosting – Advanced boosting algorithms that capture complex patterns and improve predictive performance.

The literature also identifies challenges such as:

Missing and inconsistent medical data.
Imbalanced datasets.
Feature selection difficulties.
Overfitting issues.

Research shows that hybrid and ensemble models consistently outperform single classifiers by combining the strengths of multiple algorithms.

Proposed Methodology

The proposed system employs a stacked hybrid machine learning model that integrates several classifiers:

Random Forest
Decision Tree
K-Nearest Neighbors (KNN)
Support Vector Machine (SVM)
Logistic Regression
XGBoost
Gradient Boosting

The outputs of the base models are combined using a Logistic Regression meta-learner, which produces the final prediction.
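The stacking arrangement described above can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration under stated assumptions, not the authors' implementation: the data is synthetic (`make_classification` stands in for the 13 Cleveland features), all hyperparameters are placeholders, and XGBoost is left out so the sketch depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in data with 13 features, not the real dataset.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Base learners; scale-sensitive models are wrapped with StandardScaler.
base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("dt", DecisionTreeClassifier(max_depth=4, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7))),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("gb", GradientBoostingClassifier(random_state=0)),
    # An xgboost.XGBClassifier could be appended here if the package is installed.
]

# The meta-learner is trained on out-of-fold predictions of the base models.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
test_accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {test_accuracy:.3f}")
```

Using `cv=5` means the Logistic Regression meta-learner never sees base-model predictions made on rows those models were trained on, which limits leakage from the base layer into the final layer.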

System Workflow

The prediction process consists of five stages:

Data Collection and Preprocessing
- Uses the Cleveland Heart Disease Dataset.
- Handles missing values and inconsistencies.
- Applies standardization (zero mean, unit variance per feature) using StandardScaler.
Feature Selection
- Identifies the most relevant attributes such as age, cholesterol, blood pressure, and heart rate.
- Removes redundant features to improve efficiency.
Model Training
- Uses an 80:20 stratified train-test split.
- Applies cross-validation and hyperparameter tuning.
Stacking Ensemble
- Combines predictions from the base classifiers, such as Random Forest, XGBoost, SVM, and Gradient Boosting.
- Uses Logistic Regression as the final meta-model.
Risk Prediction
- Classifies patients as either high-risk or low-risk for heart disease.
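The splitting and scaling steps of the workflow above can be sketched in plain Python. The sketch assumes a small made-up table of (age, cholesterol) values; `stratified_split`, `fit_scaler`, and `transform` are hypothetical helper names that mimic what scikit-learn's `train_test_split(..., stratify=y)` and `StandardScaler` do. The point it illustrates is that the scaling statistics are computed on the training rows only and then reused for the test rows.

```python
import random
from statistics import mean, pstdev


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in the same ratio."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_scaler(rows):
    """Per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # constant column: divide by 1
    return means, stds


def transform(rows, means, stds):
    """Apply z = (x - mean) / std column by column."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Made-up toy data: [age, cholesterol] with a binary label.
X = [[63, 233], [37, 250], [41, 204], [56, 236], [57, 354],
     [57, 192], [56, 294], [44, 263], [52, 199], [54, 239]]
y = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]

train_idx, test_idx = stratified_split(y, test_fraction=0.2)
X_train = [X[i] for i in train_idx]
X_test = [X[i] for i in test_idx]

means, stds = fit_scaler(X_train)  # statistics from the training rows only
X_train_scaled = transform(X_train, means, stds)
X_test_scaled = transform(X_test, means, stds)
```

Fitting the scaler on the full table before splitting would let test-set statistics influence training, which tends to make reported accuracy slightly optimistic.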

Dataset

The system is trained and evaluated using the Cleveland Heart Disease Dataset, which contains 14 attributes (13 clinical features plus the diagnosis target):

Age
Sex
Chest pain type
Resting blood pressure
Cholesterol level
Fasting blood sugar
ECG results
Maximum heart rate
Exercise-induced angina
ST depression (Oldpeak)
Slope of ST segment
Number of major vessels
Thalassemia
Heart disease diagnosis (target variable)
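A loader for these 14 attributes might look like the sketch below. It assumes the layout of the commonly distributed UCI processed Cleveland file: comma-separated values, short column names (`age`, `sex`, `cp`, ...), `?` for missing entries, and a diagnosis code where 0 means no disease. The paper does not say how missing values or the target were handled, so dropping incomplete rows and binarizing the target are assumptions; `parse_rows` and `drop_incomplete` are hypothetical helpers.

```python
import csv
import io

# Conventional UCI short names for the 14 attributes listed above.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]


def parse_rows(text):
    """Parse comma-separated records; '?' becomes None, target becomes 0/1."""
    records = []
    for raw in csv.reader(io.StringIO(text)):
        if not raw:
            continue
        values = [None if v.strip() == "?" else float(v) for v in raw]
        record = dict(zip(COLUMNS, values))
        # Assumption: any non-zero diagnosis code counts as disease present,
        # and the diagnosis itself is never missing.
        record["target"] = int(record.pop("num") > 0)
        records.append(record)
    return records


def drop_incomplete(records):
    """Simplest cleaning strategy: discard records with any missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


# Three illustrative rows in the assumed layout; the second lacks 'ca'.
sample = """63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0
67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,?,3.0,2
37.0,1.0,3.0,130.0,250.0,0.0,0.0,187.0,0.0,3.5,3.0,0.0,3.0,0
"""
records = parse_rows(sample)
clean = drop_incomplete(records)
```

Imputation (for example, filling a missing value with the column median) is an alternative to dropping rows and preserves more of an already small dataset.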

System Design and Implementation

The architecture consists of five modules:

Data Acquisition Module
- Collects patient information.
Data Preprocessing Module
- Cleans and normalizes data.
Feature Selection Module
- Selects significant predictors.
Machine Learning and Stacking Module
- Trains multiple classifiers and combines their outputs.
Prediction and Output Module
- Generates final risk predictions.

The system is implemented using:

Python
Scikit-learn
XGBoost
Pandas and NumPy
Matplotlib and Seaborn
Streamlit for the web-based user interface

Results and Performance

The stacked hybrid model achieved an overall prediction accuracy of approximately 89–90%, outperforming individual machine learning models.

Key performance benefits include:

Higher accuracy.
Better generalization to unseen data.
Reduced prediction errors.
Faster prediction times suitable for real-time use.

Performance evaluation using precision, recall, and F1-score confirmed the model’s effectiveness in identifying both high-risk and low-risk patients.
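For reference, the metrics named above follow directly from the confusion-matrix counts. The sketch below uses ten made-up labels, not the paper's predictions, and `classification_metrics` is a hypothetical helper that treats class 1 (high-risk) as the positive class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up example: 4 high-risk and 6 low-risk patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

In a screening setting, recall on the high-risk class is often the figure to watch, since a false negative (a missed high-risk patient) is usually costlier than a false positive.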

Validation

The system was tested with multiple patient scenarios:

High-risk patient: Correctly classified with a confidence score of approximately 75.9%.
Low-risk patient: Correctly classified with a confidence score of approximately 62.6%.

These results demonstrate the model’s reliability and practical applicability in clinical environments.
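One plausible reading of the confidence scores above is the predicted probability of the chosen class. The sketch below works under that assumption, with a 0.5 decision threshold that the paper does not state; `risk_label` is a hypothetical helper, and the two input probabilities are chosen only so that the resulting confidences match the reported figures.

```python
def risk_label(p_disease, threshold=0.5):
    """Map a predicted disease probability to (label, confidence)."""
    if not 0.0 <= p_disease <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_disease >= threshold:
        return "high-risk", p_disease
    # Confidence in the low-risk label is the complementary probability.
    return "low-risk", 1.0 - p_disease


# Hypothetical probabilities reproducing the reported confidence scores.
label_a, conf_a = risk_label(0.759)
label_b, conf_b = risk_label(0.374)
print(f"{label_a}: {conf_a:.1%}")
print(f"{label_b}: {conf_b:.1%}")
```

Under this reading, a confidence near 60% sits fairly close to the decision boundary, so such cases may merit additional clinical review rather than being treated as settled.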

Conclusion

This paper presented an Advanced Heart Attack Risk Prediction System using a stacked hybrid machine learning approach. The system integrates Random Forest, Decision Tree, KNN, SVM, Logistic Regression, XGBoost, and Gradient Boosting through a stacking classifier with Logistic Regression as the meta-learner. By leveraging the complementary strengths of these algorithms, the stacked hybrid model achieved a prediction accuracy of approximately 89–90%, outperforming the individual base classifiers. The system demonstrated effective data preprocessing, feature selection, and model training using the Cleveland Heart Disease Dataset. A user-friendly web interface was developed using Streamlit, enabling users to input patient data and obtain rapid prediction results. The system provides a non-invasive, cost-effective method for early detection of heart disease, supporting healthcare professionals in making informed clinical decisions.

However, certain limitations exist. The accuracy of the system depends on the quality and size of the dataset used for training. Only a limited number of health parameters are considered, and the system may not generalize equally well to datasets from different populations. Additionally, the system is not currently integrated with real-time hospital information systems.


Copyright

Copyright © 2026 M. Vijaya Kumar, D. V. V. Manikanta Raju, A. D. M. Praveen, P. Baladithya, V. Sai Dinesh Kumar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET83687

Publish Date : 2026-06-14

ISSN : 2321-9653

Publisher Name : IJRASET
