Anemia Prediction Using Machine Learning and NLP

Authors: K. Appanna, K. Tulasi Krishna Kumar

DOI Link: https://doi.org/10.22214/ijraset.2026.82778

Abstract

Anemia is a common health disorder caused by a deficiency of hemoglobin or red blood cells, leading to fatigue, weakness, and serious complications if not detected early. Early diagnosis is essential for effective treatment and prevention. This project presents a machine learning-based system designed to predict anemia using important patient health parameters such as hemoglobin level, red blood cell (RBC) count, mean corpuscular volume (MCV), and mean corpuscular hemoglobin (MCH). The dataset is preprocessed to handle missing values and improve overall data quality, which helps improve the accuracy of the prediction models. Three machine learning algorithms, Random Forest, Logistic Regression, and Decision Tree, are applied to the clinical data to classify whether an individual is anemic or not. These models are evaluated using performance metrics such as accuracy and precision, with Random Forest giving the best results of the three due to its ability to handle complex relationships among features. The system is further integrated into a user-friendly web interface that allows users to input medical data and receive instant predictions. Overall, this project demonstrates the role of artificial intelligence in healthcare by enabling early detection of anemia, reducing manual effort, and supporting medical professionals in making accurate and timely decisions.

Introduction

Anemia is a medical condition caused by low hemoglobin levels or reduced red blood cells, which leads to insufficient oxygen delivery in the body and symptoms such as fatigue, weakness, and dizziness. Early detection is important to prevent serious complications, and this study proposes a machine learning-based system to predict anemia using clinical features like hemoglobin level, RBC count, MCV, and MCH.

The system follows a standard data science pipeline involving data collection, preprocessing (handling missing values and feature scaling or normalization), model training, and evaluation. Three algorithms (Logistic Regression, Decision Tree, and Random Forest) are applied and compared, with Random Forest performing best due to its ability to handle complex, non-linear relationships and reduce overfitting.
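The pipeline described above can be sketched roughly as follows. This is a minimal illustration, assuming scikit-learn and NumPy, and not the authors' implementation: the data is synthetic, the column names in `FEATURES` are hypothetical, the labelling rule (hemoglobin below 12 g/dL) is an assumption made only to produce labels, and median imputation with standardization stands in for whichever preprocessing the study actually used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical column order; the paper does not give field names.
FEATURES = ["hemoglobin", "rbc_count", "mcv", "mch"]


def make_synthetic_data(n=600, seed=0):
    """Generate illustrative data only; this is NOT the paper's dataset."""
    rng = np.random.default_rng(seed)
    hemoglobin = rng.normal(13.0, 2.0, n)  # g/dL
    rbc_count = rng.normal(4.7, 0.6, n)    # million cells per microliter
    mcv = rng.normal(88.0, 8.0, n)         # fL
    mch = rng.normal(29.0, 3.0, n)         # pg
    X = np.column_stack([hemoglobin, rbc_count, mcv, mch])
    # Assumed labelling rule for this sketch: anemic if hemoglobin < 12 g/dL.
    y = (hemoglobin < 12.0).astype(int)
    # Blank out roughly 5% of entries to mimic missing values.
    X[rng.random(X.shape) < 0.05] = np.nan
    return X, y


def compare_models(X, y, seed=0):
    """Train the three classifiers on one split; return accuracy and precision."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        # Impute missing values, scale features, then fit the classifier.
        pipeline = make_pipeline(
            SimpleImputer(strategy="median"), StandardScaler(), model
        )
        pipeline.fit(X_train, y_train)
        y_pred = pipeline.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, zero_division=0),
        }
    return results


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for name, scores in compare_models(X, y).items():
        print(f"{name}: accuracy={scores['accuracy']:.3f}, "
              f"precision={scores['precision']:.3f}")
```

Wrapping the imputer and scaler in the same pipeline as the classifier means they are fitted on the training split only, which avoids leaking test-set statistics into preprocessing. Scores obtained on this synthetic data say nothing about the results reported in the study.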

The best model is deployed in a Flask-based web application, allowing users to input patient data and receive real-time predictions.
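A deployment of this kind might look like the sketch below. It is an assumption-laden illustration rather than the authors' application: the `/predict` route, the JSON field names in `FEATURES`, and the `train_placeholder_model` helper are all hypothetical, and the model is a stand-in fitted on synthetic data where a real application would load the trained model from disk. It also exposes a JSON endpoint instead of an HTML form to keep the example short.

```python
import numpy as np
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestClassifier

# Hypothetical JSON field names, in the order the model expects them.
FEATURES = ["hemoglobin", "rbc_count", "mcv", "mch"]


def train_placeholder_model(seed=0):
    """Fit a stand-in model on synthetic data (illustration only)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.normal(13.0, 2.0, 400),  # hemoglobin, g/dL
        rng.normal(4.7, 0.6, 400),   # RBC count, million cells per microliter
        rng.normal(88.0, 8.0, 400),  # MCV, fL
        rng.normal(29.0, 3.0, 400),  # MCH, pg
    ])
    y = (X[:, 0] < 12.0).astype(int)  # assumed labelling rule for the sketch
    return RandomForestClassifier(n_estimators=50, random_state=seed).fit(X, y)


app = Flask(__name__)
model = train_placeholder_model()


@app.route("/predict", methods=["POST"])
def predict():
    """Validate the posted values and return the model's prediction as JSON."""
    payload = request.get_json(silent=True) or {}
    try:
        values = [float(payload[name]) for name in FEATURES]
    except (KeyError, TypeError, ValueError):
        return jsonify(error=f"Expected numeric fields: {FEATURES}"), 400
    if not all(np.isfinite(values)):
        return jsonify(error="All values must be finite numbers"), 400
    label = int(model.predict([values])[0])
    return jsonify(anemic=bool(label))


if __name__ == "__main__":
    app.run(debug=True)
```

With the server running, a client could post a JSON body such as `{"hemoglobin": 9.0, "rbc_count": 3.8, "mcv": 75.0, "mch": 24.0}` to `/predict` and receive a response of the form `{"anemic": ...}`. Rejecting missing or non-numeric fields before calling the model keeps malformed input from reaching the classifier.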

Conclusion

The anemia prediction system demonstrates the practical application of machine learning in the healthcare domain for early detection and decision support. It uses key clinical parameters such as hemoglobin, RBC count, MCV, and MCH to analyze patient data and predict whether a person is anemic or not.

In this project, multiple machine learning algorithms such as Random Forest, Logistic Regression, and Decision Tree are implemented and evaluated. Among these, Random Forest provides the highest accuracy due to its ensemble learning technique, while Logistic Regression and Decision Tree help in comparison and interpretability. This combination ensures both accuracy and better understanding of the prediction process.

The system is deployed using the Flask web framework, which connects the machine learning model with a user-friendly web interface. This allows users to enter medical values easily and receive instant prediction results in real time. The simplicity of the interface makes the system accessible to both medical professionals and general users.

This project helps in reducing manual workload, saving time, and improving the efficiency of medical diagnosis. It also ensures faster and more consistent results compared to traditional methods. However, the system’s performance depends on the quality, size, and balance of the dataset, and it may require further improvement for real-world clinical deployment.

Overall, the project highlights the importance of artificial intelligence in healthcare and shows how machine learning can be used to build fast, scalable, and cost-effective solutions for early disease prediction and medical decision support.


Copyright

Copyright © 2026 K. Appanna, K. Tulasi Krishna Kumar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET82778

Publish Date : 2026-05-19

ISSN : 2321-9653

Publisher Name : IJRASET
