Early detection of liver disease is important because it can help reduce the risk of serious complications. Supervised machine learning (ML) models can help predict liver disease, but their usefulness depends on how they are evaluated and improved. Many existing studies rely heavily on a single metric, such as accuracy. In this study, we examine several supervised learning models to determine how accurately they predict liver disease, and we evaluate them with multiple methods, including precision-recall curves and K-fold cross-validation, which test the models more effectively when the dataset is imbalanced. Logistic Regression, Random Forest, Support Vector Machine, and Decision Tree models are trained on real medical data. Our results show that using several evaluation metrics gives a better understanding of each model's performance and helps improve prediction accuracy. Overall, the study shows that combining evaluation techniques can lead to more reliable machine learning models for liver disease diagnosis.
Introduction
Liver disease is a major global health concern where early detection is crucial, yet traditional diagnostic methods are often expensive and time-consuming. With the availability of structured healthcare data, machine learning (ML) offers an effective alternative for automated liver disease prediction. However, prior studies are limited by small or outdated datasets, weak validation strategies, and reliance on single models, reducing their reliability in clinical contexts.
This study proposes a more robust evaluation framework for liver disease prediction using supervised machine learning. It emphasizes the use of multiple performance metrics—accuracy, precision, recall, F1-score, and ROC-AUC—along with proper cross-validation to better assess model sensitivity, generalization, and clinical usefulness. The work focuses on predicting common liver conditions such as fatty liver, cirrhosis, and hepatitis.
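As a minimal sketch of why several metrics are reported instead of accuracy alone, the Python below computes accuracy, precision, recall, and F1-score from binary labels, and ROC-AUC from prediction scores. The function names and the toy labels are illustrative assumptions and are not taken from the study's code or data.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = disease, 0 = no disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Fall back to 0.0 when a denominator is zero (e.g. no positive predictions).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up labels: 2 of 10 patients have the disease.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10  # a useless model that never flags disease

print(classification_metrics(y_true, always_negative))
# {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

In this toy case the model reaches 80% accuracy while missing every diseased patient, which is the kind of failure that recall and F1-score expose and accuracy hides.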
Using the Indian Liver Patient Dataset (ILPD) from the UCI Machine Learning Repository, the methodology includes data collection, preprocessing (median imputation of missing values), feature selection, model development, and comprehensive evaluation. Several supervised algorithms—Logistic Regression, Support Vector Machine (SVM), Decision Tree, and Random Forest—are trained and compared.
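Two of the steps above, median imputation and class-aware fold assignment for cross-validation, can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the study's implementation: the helper names `impute_median` and `stratified_folds`, the sample values, and the choice of stratified folds are all hypothetical, and in practice a library such as scikit-learn would normally handle both steps.

```python
import random
from collections import defaultdict
from statistics import median
from typing import Dict, List, Optional, Sequence


def impute_median(column: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in column]


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Assign row indices to k folds so each fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the rows of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Made-up values for one numeric feature with a missing entry.
feature = [0.9, None, 1.1, 0.7]
print(impute_median(feature))  # [0.9, 0.9, 1.1, 0.7]

# Made-up labels: 6 positive and 3 negative rows split into 3 folds.
labels = [1] * 6 + [0] * 3
for fold in stratified_folds(labels, k=3):
    print(sorted(fold), [labels[i] for i in sorted(fold)])
```

One design point worth noting: when imputation is combined with cross-validation, computing the median on the training folds only and applying it to the held-out fold avoids leaking information from the test data into preprocessing.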
Results show that the Random Forest model achieves the best overall performance, with the highest accuracy, precision, and F1-score, offering a strong balance between sensitivity and specificity. While SVM yields the highest recall and is most effective at identifying true disease cases, Random Forest provides the most reliable and balanced classification. Logistic Regression performs weakest, mainly due to higher false negatives.
Overall, the study demonstrates the importance of multi-metric evaluation and robust validation in medical ML applications and identifies Random Forest as the most effective model for liver disease prediction on the ILPD.
Conclusion
The main conclusion of this paper is that successful use of machine learning models in healthcare, primarily Random Forest, must take into account dataset quality, interpretability, and clinical integration. For diagnosing liver disease, Random Forest demonstrated high accuracy. Because they performed less well on this dataset, SVM and Decision Tree may serve only as backup options.
Future research can explore Artificial Neural Networks (ANNs) and Convolutional Neural Networks (CNNs). ANNs can capture deeper nonlinear relationships within structured medical data, while CNNs, which are used for image-based analysis, can be applied to CT scans. Such advanced deep learning models may improve predictive performance and support decision-making in the healthcare domain.