Credit risk assessment is a critical component of financial decision-making systems. While complex machine learning models often achieve superior predictive performance, they frequently lack the interpretability that regulatory compliance and stakeholder trust require in financial institutions. This study investigates the performance and explainability trade-offs among lightweight machine learning models, namely Logistic Regression, Decision Trees, Random Forest, and XGBoost, for credit risk prediction. The models are evaluated using standard classification metrics: Accuracy, Precision, Recall, F1-Score, and ROC-AUC. Model interpretability is examined using feature importance analysis and SHAP (SHapley Additive exPlanations). Experimental results show that XGBoost achieves the highest predictive accuracy, Logistic Regression is the most interpretable, and Random Forest offers a balanced trade-off between performance and transparency.
This study highlights the importance of explainable AI in financial risk modeling and provides practical insights for deploying transparent and efficient machine learning systems in regulated environments.
Introduction
Credit risk prediction involves assessing the likelihood that a borrower will default on a loan. Financial institutions use predictive models to minimize losses, and logistic regression has traditionally been favored for its simplicity, interpretability, and suitability for regulatory compliance.
With advances in machine learning, more complex models such as Decision Trees, Random Forests, and XGBoost have improved predictive accuracy by capturing nonlinear patterns. The ensemble methods in particular often behave as “black boxes,” making them less interpretable and harder to justify in regulated financial contexts.
To address this, the study explores three key questions:
How do lightweight machine learning models compare in predictive performance?
What is the trade-off between accuracy and interpretability?
What role do explainability techniques such as SHAP play in improving trust in ensemble models?
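Logistic Regression: Linear model with directly interpretable coefficients, but comparatively lower predictive performance.
Decision Tree: Rule-based, interpretable but prone to overfitting.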
Random Forest: Ensemble of trees, reduces variance and improves accuracy.
XGBoost: Gradient boosting, excellent predictive performance but less interpretable.
Evaluation Metrics: Accuracy, Precision, Recall, F1 Score, and ROC-AUC.
Explainability: SHAP values quantify individual feature contributions to predictions.
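The evaluation metrics listed above can be computed directly from labels and predicted scores. The following is a minimal pure-Python sketch, not the study's evaluation code: the labels, scores, and the 0.5 decision threshold are illustrative assumptions, and label 1 is assumed to mean "default".

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = default)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative data only (not taken from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(metrics, auc)
```

Accuracy, Precision, Recall, and F1 depend on the chosen threshold, whereas ROC-AUC is computed from the ranking of scores alone, which is why the two kinds of metric are usually reported together.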
Novel Contribution:
The study proposes an Accuracy–Interpretability Trade-off Score (AITS) to quantify the balance between model performance and interpretability. AITS combines a normalized interpretability score with accuracy through a weighting parameter α, allowing regulators and practitioners to select models that achieve a practical compromise between transparency and predictive power. In this study, α = 0.7, which weights accuracy more heavily while still accounting for interpretability.
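The text does not state the AITS formula explicitly. One reading consistent with the description is a convex combination, AITS = α · Accuracy + (1 − α) · I_norm, where I_norm is an interpretability score rescaled to [0, 1]. The Python sketch below assumes that form and assumes min-max normalization. The model names, accuracies, and raw interpretability ratings are hypothetical placeholders, not results from this study.

```python
from typing import Dict

ALPHA = 0.7  # weight on accuracy, as stated in the text


def min_max_normalize(raw: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw interpretability ratings to [0, 1] (assumed normalization)."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        # All models rated equally: treat them as equally interpretable.
        return {name: 1.0 for name in raw}
    return {name: (value - lo) / (hi - lo) for name, value in raw.items()}


def aits(accuracy: float, interpretability_norm: float, alpha: float = ALPHA) -> float:
    """Assumed form: alpha * accuracy + (1 - alpha) * normalized interpretability."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * accuracy + (1.0 - alpha) * interpretability_norm


# Hypothetical placeholder inputs, for illustration only.
accuracy = {"interpretable_model": 0.80, "middle_model": 0.85, "accurate_model": 0.90}
raw_interpretability = {"interpretable_model": 5, "middle_model": 3, "accurate_model": 1}

interpretability = min_max_normalize(raw_interpretability)
scores = {name: aits(accuracy[name], interpretability[name]) for name in accuracy}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: AITS = {score:.3f}")
```

Under this assumed form, lowering α shifts the ranking toward more transparent models, so the choice of α should reflect how strictly explainability is required in the deployment setting.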
Conclusion
This study presents a comprehensive analysis of lightweight machine learning models for credit risk prediction, with a primary focus on understanding the trade-off between predictive accuracy and model interpretability. In high-stakes domains such as financial decision-making, achieving a balance between these two aspects is essential for both operational effectiveness and regulatory compliance.
Through systematic experimentation, it was observed that advanced ensemble methods, particularly XGBoost, deliver superior predictive performance across all evaluation metrics, including Accuracy, Precision, Recall, F1-score, and ROC-AUC. However, this improved performance comes at the cost of reduced transparency, making such models less suitable in environments where explainability is a strict requirement.
In contrast, Logistic Regression demonstrates strong interpretability due to its linear structure and easily understandable coefficients, although it shows comparatively lower predictive performance. Decision Trees provide intuitive rule-based explanations but suffer from instability and a tendency to overfit. Random Forest emerges as a balanced alternative, offering improved predictive capability over individual trees while maintaining a moderate level of interpretability.
A key aspect of this study is the integration of SHAP-based explainability techniques, which enable both global and local interpretation of model predictions. By quantifying feature contributions, SHAP enhances the usability of complex models, allowing stakeholders to better understand decision outcomes and increasing trust in machine learning systems deployed in financial contexts.
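To illustrate what "quantifying feature contributions" means, the sketch below computes exact Shapley values for a single prediction by enumerating all feature coalitions. It is a toy illustration of the idea underlying SHAP, not the SHAP library and not the study's pipeline: the scoring function, the feature meanings, and the all-zero baseline are assumptions, and absent features are simply replaced by their baseline values.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values for one instance (exponential cost, small n only)."""
    n = len(x)

    def value(present: frozenset) -> float:
        # Features in `present` take their real values; the rest use the baseline.
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk_score(z: Sequence[float]) -> float:
    """Hypothetical score with one interaction term (features 0 and 2)."""
    return 0.5 * z[0] - 0.3 * z[1] + 0.2 * z[0] * z[2]


x = [1.0, 2.0, 3.0]         # hypothetical applicant features
baseline = [0.0, 0.0, 0.0]  # assumed reference point

phi = shapley_values(toy_risk_score, x, baseline)
print(phi)
# The contributions add up to the gap between the prediction and the baseline.
print(sum(phi), toy_risk_score(x) - toy_risk_score(baseline))
```

The interaction term is split equally between the two features involved, and the contributions sum to the difference between the prediction and the baseline prediction. That additivity is what allows a local explanation to be read as a per-feature breakdown of one decision.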