Predicting corporate bankruptcy is an important activity in financial risk management, with significant consequences for the economy and for the individuals connected to a company, including investors and employees. This study addresses the need for accurate bankruptcy prediction on strongly imbalanced datasets by training and evaluating a hybrid machine learning model designed to improve the precision and reliability of bankruptcy forecasting. Two publicly available real-world datasets, from Taiwan and Poland, were combined into a single dataset of approximately 49,000 samples. Whereas simple models such as Logistic Regression and Random Forest could not cope with the heavy class imbalance, gradient boosting models (XGBoost, LightGBM, and CatBoost) produced reasonable results. To further increase predictive power, the study proposes hybrid models: LightGBM+ANN, XGBoost+ANN, and CatBoost+ANN. A central contribution of this article is the use of Particle Swarm Optimization (PSO) not to tune hyperparameters but to determine the decision threshold that best counters the effects of class imbalance. The experimental results demonstrate the advantage of the proposed approach: the LightGBM+ANN model with the PSO-optimized threshold performed best, with an accuracy of 97.92% and an Area Under the Curve (AUC) of 0.9542. The model was then deployed as a Streamlit web application for practical use. These results support the hypothesis that the hybrid LightGBM+ANN architecture with PSO-based threshold optimization, which outperformed all other models tested, is a useful and robust solution to the problem of bankruptcy forecasting.
Introduction
This study addresses corporate bankruptcy prediction, a task of considerable importance in financial risk management because business failures affect employees, investors, creditors, and markets across the wider economy. Traditional bankruptcy prediction methods based on financial ratios and linear statistical models, such as the Altman Z-score, are increasingly inadequate in today's volatile and complex economic environment, particularly on highly imbalanced datasets in which bankrupt firms are rare.
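To illustrate why accuracy alone is misleading when bankrupt firms are rare, the following minimal Python sketch scores a trivial classifier that always predicts "not bankrupt". The labels are hypothetical, with a 3% bankruptcy rate chosen purely for illustration (it is not the rate in the merged dataset), and `summarize` is a hypothetical helper rather than part of the study's code.

```python
import numpy as np

def summarize(y_true, y_pred):
    """Accuracy, recall (sensitivity), specificity and F1, with label 1 = bankrupt."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn) if tp + fn else 0.0          # share of bankrupt firms caught
    specificity = tn / (tn + fp) if tn + fp else 0.0     # share of healthy firms cleared
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "f1": f1}

# Hypothetical labels: 970 healthy firms (0) and 30 bankrupt firms (1).
y_true = np.array([0] * 970 + [1] * 30)
# A "model" that ignores its inputs and always predicts the majority class.
y_majority = np.zeros_like(y_true)
print(summarize(y_true, y_majority))
```

The majority-class predictor reaches 97% accuracy while detecting none of the bankrupt firms (recall and F1-score of zero), which is why minority-class metrics are reported alongside accuracy.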
To address these limitations, the study proposes a hybrid machine learning framework that combines gradient boosting models (XGBoost, LightGBM, and CatBoost) with an Artificial Neural Network (ANN) acting as a meta-learner. Two large real-world datasets from Taiwan and Poland, merged into a standardized dataset of approximately 49,000 samples, were used to improve the model's generalizability across different economic contexts.
A key methodological contribution is the use of Particle Swarm Optimization (PSO) to select the decision threshold, rather than the model hyperparameters, in order to improve detection of bankrupt firms. Tuning the threshold substantially improves recall and F1-score, which are critical metrics for minority-class prediction; AUC, by contrast, does not depend on the threshold and reflects how well the underlying model ranks bankrupt firms above healthy ones. Experimental results show that the baseline models achieved high accuracy but performed poorly at identifying bankrupt firms. In contrast, the hybrid models with PSO-optimized thresholds, particularly LightGBM+ANN, delivered the best overall performance, achieving the highest AUC and substantially improved recall.
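Threshold selection with PSO can be illustrated with a basic global-best swarm searching a single variable, the threshold t in [0, 1]. This is a minimal sketch, not the study's implementation: it assumes F1-score of the bankrupt class as the fitness function, commonly used coefficients (inertia w = 0.7, c1 = c2 = 1.5), and synthetic validation scores. In practice the threshold should be fitted on held-out validation predictions and then frozen before evaluating on test data.

```python
import numpy as np

def f1_at_threshold(y_true, proba, t):
    """F1 of the positive (bankrupt) class when predicting 1 iff proba >= t."""
    pred = proba >= t
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def pso_threshold(y_true, proba, n_particles=20, n_iter=50,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO over a 1-D decision threshold in [0, 1]."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0.0, 1.0, n_particles)   # each particle is a candidate threshold
    vel = np.zeros(n_particles)
    pbest = pos.copy()                         # best threshold seen by each particle
    pbest_score = np.array([f1_at_threshold(y_true, proba, t) for t in pos])
    g = pbest[np.argmax(pbest_score)]          # best threshold seen by the swarm
    g_score = pbest_score.max()
    for _ in range(n_iter):
        r1 = rng.uniform(size=n_particles)
        r2 = rng.uniform(size=n_particles)
        # Velocity update: inertia + pull towards personal best + pull towards swarm best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        scores = np.array([f1_at_threshold(y_true, proba, t) for t in pos])
        improved = scores > pbest_score
        pbest[improved] = pos[improved]
        pbest_score[improved] = scores[improved]
        if pbest_score.max() > g_score:
            g_score = pbest_score.max()
            g = pbest[np.argmax(pbest_score)]
    return float(g), float(g_score)

# Synthetic validation scores: bankrupt firms score higher, but mostly below 0.5.
rng = np.random.default_rng(1)
y_val = (rng.uniform(size=2000) < 0.05).astype(int)
proba_val = np.clip(rng.normal(0.10 + 0.25 * y_val, 0.08), 0.0, 1.0)

t_star, f1_star = pso_threshold(y_val, proba_val)
print(f"PSO threshold={t_star:.3f}  F1={f1_star:.3f}  "
      f"F1 at 0.5={f1_at_threshold(y_val, proba_val, 0.5):.3f}")
```

Because the scores themselves are unchanged, AUC is identical at every threshold; only the threshold-dependent metrics (recall, precision, F1-score) move. For a one-dimensional search like this, a dense grid over [0, 1] would also work, and PSO becomes more attractive when the fitness surface is expensive or the threshold is optimized jointly with other parameters.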
The best-performing model was deployed as an interactive Streamlit web application, allowing users to input financial ratios and receive real-time bankruptcy risk predictions. Overall, the study demonstrates that combining boosting techniques, neural networks, and threshold optimization provides a robust and scalable early-warning system for bankruptcy prediction, offering practical value to investors, financial institutions, regulators, and corporate risk managers.
Conclusion
This study examined a number of machine learning models for bankruptcy prediction on a large, heterogeneous dataset formed by pooling financial records from Taiwan and Poland. Among all the models and optimization techniques tested, the LightGBM+ANN hybrid with an optimized decision threshold and tuned parameters performed best. It achieved the highest AUC (0.9542) and an accuracy of 0.9792, outperforming the other hybrid models (XGBoost+ANN and CatBoost+ANN) as well as default classifiers and ensembles such as SVM and Random Forest. Its consistently high scores across multiple measures, namely sensitivity, F1-score, and specificity, show that LightGBM+ANN is effective at distinguishing bankrupt from non-bankrupt companies. Applying Particle Swarm Optimization (PSO) to the decision threshold further improved the model's reliability and predictive performance on this complex, imbalanced dataset. These results support the idea that, for complex real-world financial data, combining strong gradient boosting methods with neural networks can offer both robustness and flexibility. Merging the two datasets also enhanced generalizability, suggesting that the model can be applied across countries rather than in a single market.