Research Work on Predictive Premium Model of Medical Cost

Authors: Anurag Shrivastava, Kripa Shankar Pathak, Ayush Narayan, Muskan Chaudhary, Gaurav Ghildiyal

DOI Link: https://doi.org/10.22214/ijraset.2025.71763

Abstract

Insurance is a policy that reduces or eliminates the financial losses caused by various risks. A number of factors determine the cost of risk, and these factors dictate how insurance packages are designed. Machine learning (ML) can improve the effectiveness of certain clauses within an insurance policy. In this study, we use individual local health data to forecast insurance amounts tailored to specific groups of people. To evaluate performance, nine regression models were compared: Linear Regression, XGBoost Regression, Lasso Regression, Random Forest Regression, Ridge Regression, Decision Tree Regression, K-Nearest Neighbors (KNN) Regression, Support Vector Regression, and Gradient Boosting Regression. Each model was trained on a portion of the provided dataset and then tested against real data. The models were validated by comparing their predictions with the actual values, after which the accuracy of the models was compared. By exploiting machine learning methodologies, we aim to provide valuable insights that help researchers, practitioners, and policymakers make effective decisions in healthcare contexts.

Introduction

The digital health industry has grown rapidly worldwide, with the number of companies doubling in the past five years. Developed countries face challenges in health insurance because of rising healthcare costs and a growing uninsured population, and governments have invested heavily in digital health to address these issues. Private health insurance plays a crucial role, especially for patients with rare diseases, because it helps reduce treatment costs.

Predicting medical expenditures is complex because many expenses come from patients with uncommon conditions. Machine learning (ML) and deep learning techniques are widely used for cost prediction, with accuracy and training time as key considerations. However, ML models often produce unreliable predictions, while deep learning’s potential is limited by lengthy training times in real-time applications.

Literature Review:
Past studies have applied various ML methods, such as logistic regression and XGBoost, for predicting insurance claims, with logistic regression favored for its interpretability. However, these studies often ignore claim costs and complexities in healthcare expenses. Clinical decision support systems struggle with incomplete or missing data and lack integration with the latest medical knowledge.

Methodology:

Dataset: A Kaggle dataset with 1338 records and 7 attributes (age, sex, children, BMI, region, smoking status, charges) was used.
Data Split: 80% training and 20% testing data.
Objective: Predict medical insurance costs based on demographic and health-related factors.
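
The data preparation described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the records are synthetic placeholders with the same seven attributes, and the value ranges, the region names, the encoding scheme, and the helper names (make_synthetic_records, encode, train_test_split) are all assumptions made for the example.

```python
import random

# Assumed category values for illustration; the source only names the attribute.
REGIONS = ["northeast", "northwest", "southeast", "southwest"]


def make_synthetic_records(n, seed=0):
    """Generate placeholder rows with the seven attributes (NOT the Kaggle data)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        records.append({
            "age": rng.randint(18, 64),
            "sex": rng.choice(["female", "male"]),
            "bmi": round(rng.uniform(16.0, 50.0), 1),
            "children": rng.randint(0, 5),
            "smoker": rng.choice(["yes", "no"]),
            "region": rng.choice(REGIONS),
            "charges": round(rng.uniform(1000.0, 60000.0), 2),
        })
    return records


def encode(record):
    """Map one record to a numeric feature vector (binary flags + one-hot region)."""
    features = [
        float(record["age"]),
        1.0 if record["sex"] == "male" else 0.0,
        float(record["bmi"]),
        float(record["children"]),
        1.0 if record["smoker"] == "yes" else 0.0,
    ]
    features += [1.0 if record["region"] == r else 0.0 for r in REGIONS]
    return features


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it at the training fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = make_synthetic_records(1338)      # same row count as the dataset
train, test = train_test_split(records)     # 80% training, 20% testing

X_train = [encode(r) for r in train]
y_train = [r["charges"] for r in train]     # prediction target
X_test = [encode(r) for r in test]
y_test = [r["charges"] for r in test]

print(len(train), len(test))
```

With 1338 records, an 80/20 cut leaves 1070 rows for training and 268 for testing. Shuffling with a fixed seed before cutting keeps the split reproducible.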

Technologies, Tools, and Techniques:

Technologies: Electronic Health Records (EHRs), Big Data platforms (Hadoop, Spark), and Cloud Computing (AWS, Azure) support data handling and ML model deployment.
Tools: Python, R, and ML frameworks like TensorFlow, PyTorch, and Scikit-learn aid in model building and analysis.
Techniques: Regression analysis, decision trees, ensemble methods, data preprocessing, model evaluation (accuracy, precision, recall), model selection, and deployment are employed to build effective predictive models.
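
As a rough illustration of regression analysis and model evaluation, the sketch below fits a one-feature ordinary least squares line and scores it on held-out points. It is an assumption-laden example, not the study's pipeline: the toy age and charge values are invented, the function names are hypothetical, and the error-based metrics (MAE, RMSE, R²) are chosen here because they are a common way to evaluate a continuous target such as charges; the source itself does not specify them.

```python
import math


def fit_simple_linear(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def regression_metrics(y_true, y_pred):
    """Return mean absolute error, root mean squared error, and R^2."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Invented toy training data: charges lie exactly on 200 * age - 1000.
train_ages = [20, 30, 40, 50, 60]
train_charges = [3000.0, 5000.0, 7000.0, 9000.0, 11000.0]
intercept, slope = fit_simple_linear(train_ages, train_charges)

# Invented held-out points that deviate slightly from the fitted line.
test_ages = [25, 45]
test_charges = [4200.0, 7800.0]
predictions = [intercept + slope * a for a in test_ages]

metrics = regression_metrics(test_charges, predictions)
print(intercept, slope, metrics)
```

The same regression_metrics function could score the predictions of any of the nine models, which is what makes a side-by-side comparison possible.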

Conclusion

To forecast health insurance prices from the factors provided in the Kaggle Medical Cost Personal dataset, the study combines several ML regression models. Table IV lists the outcomes. By predicting insurance rates from a variety of factors, insurance firms may attract consumers and save time. Machine learning may significantly reduce the manual effort involved in price analysis, since ML models can compute costs quickly where a person would take a long time. Machine learning techniques can also handle large volumes of data. Future work could improve on this study by building a web application based on the XGBoost or Gradient Boosting algorithm and by using a larger dataset.

References

[1] "Digital Health 150: The Digital Health Startups Transforming the Future of Healthcare | CB Insights Research," CB Insights Research, 2022. [Online]. Available: https://www.cbinsights.com/research/report/digital-health-startups-redefininghealthcare
[2] J. H. Lee, "Pricing and reimbursement pathways of new orphan drugs in South Korea: A longitudinal comparison," Healthcare, Multidisciplinary Digital Publishing Institute, vol. 9, no. 3, p. 296, 2021.
[3] S. Gupta and P. Tripathi, "An emerging trend of big data analytics with health insurance in India," in 2016 International Conference on Innovation and Challenges in Cyber Security (ICICCS-INBUSH), IEEE, Feb. 2016, pp. 64-69.
[4] N. Shakhovska, S. Fedushko, I. Shvorob, and Y. Syerov, "Development of mobile system for medical recommendations," Procedia Computer Science, vol. 155, pp. 43-50, 2019.
[5] Medical Cost Personal Datasets. [Online]. Available: https://www.kaggle.com/datasets/mirichoi0218/insurance
[6] J. Pesantez-Narvaez, M. Guillen, and M. Alcañiz, "Predicting Motor Insurance Claims Using Telematics Data—XGBoost versus Logistic Regression," Risks, vol. 7, no. 2, p. 70, Jun. 2019, doi: 10.3390/risks7020070.
[7] M. Hanafy and O. Mahmoud, "Predict Health Insurance Cost by using Machine Learning and DNN Regression Models," International Journal of Innovative Technology and Exploring Engineering, vol. 10, no. 3, pp. 137-143, 2021, doi: 10.35940/ijitee.c8364.0110321.

Copyright

Copyright © 2025 Anurag Shrivastava, Kripa Shankar Pathak, Ayush Narayan, Muskan Chaudhary, Gaurav Ghildiyal. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET71763

Publish Date : 2025-05-28

ISSN : 2321-9653

Publisher Name : IJRASET
