Early prediction of myocardial infarction (MI) remains a clinical imperative, but relying strictly on standard electrocardiogram (ECG) infrastructure limits prolonged, uninterrupted patient observation because of logistical and financial constraints. Non-invasive sleep-derived physiological signals, such as heart rate and respiratory rate, offer a practical alternative for continuous risk stratification. Applying machine learning to such medical data, however, often encounters severe class imbalance. To address this, our pipeline uses the Synthetic Minority Oversampling Technique (SMOTE) to generate synthetic minority-class samples, balancing the classes while preserving all original subject records. We propose a novel stacked ensemble framework that integrates five diverse base classifiers: Multilayer Perceptron, XGBoost, LightGBM, Random Forest, and Support Vector Machine (SVM). Their outputs are combined by a Logistic Regression meta-learner to improve predictive robustness. Evaluated on a refined 924-patient dataset, the proposed architecture achieved an accuracy of 95.68% and a recall of 0.96 for MI cases, limiting critical false negatives. These findings suggest that combining synthetic sampling with a heterogeneous ensemble architecture provides an accurate, scalable, and cost-effective system for continuous early MI prediction using non-ECG sleep data.
Introduction
MI is a life-threatening condition caused by blocked blood flow to the heart, so early prediction is critical. Conventional ECG-based diagnosis, however, is hospital-based and intermittent, which makes it unsuitable for continuous monitoring. This study instead uses sleep-based signals such as heart rate and respiratory rate, which allow continuous, unobtrusive, and low-cost physiological monitoring.
To improve prediction accuracy, this work addresses two major challenges in medical datasets: class imbalance and the limitations of single machine learning models. It introduces a stacked ensemble framework combined with SMOTE (Synthetic Minority Oversampling Technique), which balances the dataset by adding synthetic minority-class samples rather than discarding any original records.
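The core idea of SMOTE can be illustrated with a minimal sketch in plain Python. This is not the study's implementation: the function name `smote`, the neighbour count `k`, and the sample values are all assumptions made for illustration. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours, so the original records are left untouched.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority
    samples and their k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical (heart rate, respiratory rate) pairs, NOT taken from the study.
majority = [[62.0, 14.0], [65.0, 15.0], [60.0, 13.5],
            [68.0, 15.5], [63.0, 14.5], [66.0, 14.0]]
minority = [[78.0, 19.0], [82.0, 20.5], [80.0, 18.5]]

# Add just enough synthetic minority samples to match the majority count.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

In practice a maintained library implementation would normally be preferred, and oversampling is usually restricted to the training split so that synthetic points do not leak into evaluation. Whether the study followed that order is not stated here.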
The proposed system uses five base models (Multilayer Perceptron (MLP), XGBoost, LightGBM, Random Forest, and SVM) and combines their outputs using a Logistic Regression meta-learner. The dataset is preprocessed, split into training and testing sets, and balanced before model training. Feature importance analysis is also used to improve interpretability.
The dataset includes 924 patient records with 33 physiological features, and the original data are imbalanced between healthy and MI cases. After applying SMOTE and ensemble learning, the model achieves an accuracy of 95.68% and a recall of 0.96 for MI cases, indicating that few MI cases are missed.
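For reference, the sketch below shows how accuracy, recall, precision, and F1-score follow from confusion-matrix counts. The function name `classification_metrics` and the counts are hypothetical and chosen only for illustration. They are not the study's confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary metrics from confusion-matrix counts,
    treating MI as the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Recall: share of true MI cases that were detected (few false negatives).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of predicted MI cases that were truly MI.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for a 100-sample test set (illustration only).
metrics = classification_metrics(tp=45, fp=10, fn=5, tn=40)
```

Recall is the metric most directly tied to missed MI cases, since each false negative lowers it, while accuracy can stay high on imbalanced data even when many positives are missed.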
Conclusion
In summary, this research delivers an interpretable framework designed to identify infarction risk early through continuous monitoring of nocturnal bio-signals. The framework was built to address limitations commonly encountered in medical machine learning applications. To correct the skewed class distribution, our pipeline used synthetic minority oversampling (SMOTE), producing a class-balanced dataset for learning. Model transparency was achieved through feature importance analysis, which quantifies the contribution of each physiological metric to a prediction. To identify the best-performing model, six distinct models were compared. The Stacked Ensemble framework achieved the highest performance, delivering the best accuracy (95.68%) alongside strong recall, precision, and F1-score. These findings indicate that addressing class imbalance while combining diverse algorithms improves predictive performance. Feature importance analysis further strengthens the system by showing clinicians how individual physiological attributes influence each prediction, thereby increasing trust and usability in real medical environments.
Overall, integrating synthetic oversampling with a heterogeneous predictive architecture improves the reliability of automated cardiovascular risk identification. Several directions exist for extending this work. Testing the framework on larger and more demographically diverse datasets would strengthen confidence in its generalizability. Incorporating complementary data modalities, such as daytime wearable tracking or continuous ECG alongside sleep features, could provide a richer diagnostic signal and further improve performance. Finally, translating this pipeline into a real-time clinical system would be a major step, allowing quick, low-cost, and non-invasive screening for MI directly in healthcare settings.