Lung cancer is one of the most common and deadly cancers worldwide. One of the most effective ways to fight cancer is to discover it early enough to improve the patient’s chances of survival; detecting lung cancer at an early stage helps reduce its risk. Various technologies such as MRI, isotopes, X-rays, and CT scans are used to diagnose lung cancer. Studying lung nodules helps a doctor determine whether the patient's nodules are malignant. These nodules can sometimes grow undetected by the naked eye. In this project, the lung cancer stage is detected from patient details, symptoms, and CT scans using machine learning and deep learning algorithms with open-source datasets. The proposed approach uses machine learning algorithms to study past medical records and determine whether the patient has lung cancer. Deep learning models analyze the CT scans to determine the stage of lung cancer. The major goal of this project is to find nodules as small as 3 mm so that the cancer stage can be detected accurately. Finally, a machine learning model estimates the patient’s medical insurance costs. This project is useful for the early detection of lung cancer in individuals and can help them overcome these health conditions. An effective cancer prediction system helps people learn their cancer risk at low cost and make appropriate decisions based on that risk.
Introduction
Lung cancer is one of the most deadly and widespread cancers worldwide, causing well over a million deaths each year. Early detection is critical, as symptoms often appear only in advanced stages, making timely treatment difficult. This study focuses on the early detection, classification, and prediction of lung cancer and its stages using machine learning and deep learning techniques, along with medical insurance cost estimation.
The literature survey highlights that AI-based approaches such as Random Forest, CNNs, UNet, SVM, and ANN have shown promising accuracy in lung cancer detection using symptoms, CT scans, and radiomic features, though overall performance in healthcare applications is still moderate and requires improvement.
The proposed system is a comprehensive web-based application divided into six stages:
1. Lung cancer detection based on symptoms using a Random Forest Classifier.
2. Lung cancer classification from CT scans (normal, benign, malignant) using a CNN.
3. Lung nodule detection using a UNet deep learning model.
4. Medical insurance cost prediction using a Random Forest Regressor.
5. Data analysis and visualization using Plotly.
6. Web application integration using the Flask framework for real-time user interaction.
The methodology includes detailed data analysis, preprocessing, balancing (SMOTE), feature exploration, and model training. Multiple models were evaluated, and the best-performing ones were selected for deployment. CNN achieved about 92% accuracy in CT scan classification, UNet achieved around 98% accuracy in nodule detection, Random Forest Classifier performed best for symptom-based detection, and Random Forest Regressor provided the most accurate insurance cost predictions.
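The balancing step can be illustrated with a small sketch. The text names SMOTE but does not give its configuration, so the following is a simplified, pure-Python version of the core idea (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' implementation. The function name `smote_oversample`, the parameter `k`, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 6 "no cancer" rows vs 3 "cancer" rows (age, symptom score)
majority = [(45, 1), (50, 2), (38, 0), (60, 1), (52, 2), (41, 0)]
minority = [(63, 7), (70, 8), (66, 9)]

new_points = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice a library implementation such as the one in imbalanced-learn would typically be used, and oversampling would be applied to the training split only, so that synthetic rows do not leak into the evaluation data.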
Experimental results confirm that the proposed models outperform several existing approaches. The integrated Flask web application delivers accurate results efficiently, supporting real-time lung cancer detection, classification, nodule localization, and cost estimation. Overall, the system demonstrates that AI-driven, multi-stage diagnostic platforms can significantly improve early lung cancer detection, decision support, and patient care.
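Nodule localization produces, for each CT slice, a binary mask marking nodule pixels, and the project targets nodules as small as 3 mm. One way to relate a mask to that physical size is the equivalent-circle diameter, sketched below. This is an illustrative assumption rather than the procedure used in the project: the function name `equivalent_diameter_mm`, the 0.7 mm pixel spacing, and the toy mask are all hypothetical, and the estimate is two-dimensional (a single slice).

```python
import math


def equivalent_diameter_mm(mask, pixel_spacing_mm):
    """Diameter of the circle whose area equals the masked region, in mm."""
    row_mm, col_mm = pixel_spacing_mm
    n_pixels = sum(sum(1 for v in row if v) for row in mask)
    area_mm2 = n_pixels * row_mm * col_mm
    return 2.0 * math.sqrt(area_mm2 / math.pi)


# Hypothetical 6x6 patch of a predicted mask: 1 = nodule pixel, 0 = background
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
spacing = (0.7, 0.7)  # assumed mm per pixel (rows, columns)

diameter = equivalent_diameter_mm(mask, spacing)
meets_target = diameter >= 3.0
print(f"{diameter:.2f} mm, at or above 3 mm: {meets_target}")
```

Real CT data carries its own pixel spacing in the scan metadata, so the assumed value above would be replaced by the spacing of each scan.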
Conclusion
The goal of this project was to create a method for overcoming the challenges of lung cancer by using machine learning and deep learning techniques to predict the presence of cancer in the lungs from medical records, and to interpret CT images to accurately identify nodules with diameters as small as 3 mm at low cost and in less time. A further goal was to make this technology available to everyone in the form of a web application. All of the objectives were met, as evidenced by the following outcomes. The Random Forest Classifier reached 96.9% accuracy for symptom-based recognition, and the Random Forest Regressor reached 86.3% accuracy for predicting medical insurance costs. The CNN model used to analyze CT images achieved 92.42% accuracy. Finally, the UNet model designed to detect nodules on CT scans performed excellently, with 98% accuracy. Overall, the developed approach is highly dependable for users.
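The figures above mix two kinds of model. For the classifiers, accuracy is the fraction of correct predictions. For the regressor, the text does not say how "accuracy" was computed; a common convention is to report the coefficient of determination (R²) as a percentage, and the sketch below assumes that reading purely for illustration. All labels and cost values are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical labels for the three CT classes
labels_true = ["normal", "benign", "malignant", "malignant", "normal"]
labels_pred = ["normal", "benign", "malignant", "benign", "normal"]

# Hypothetical insurance costs (arbitrary currency units)
cost_true = [1000.0, 2000.0, 3000.0, 4000.0]
cost_pred = [1100.0, 1900.0, 3200.0, 3800.0]

print(accuracy(labels_true, labels_pred))
print(r2_score(cost_true, cost_pred))
```

If the regressor was instead scored with an error-based measure such as mean absolute percentage error, the 86.3% figure would have a different interpretation, so the metric should be stated alongside the number.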