Authors: Akhil Kumar S, Sai Krishna Gangyada, Gifty Joyce Immadi, Venkateswara Reddy H
DOI Link: https://doi.org/10.22214/ijraset.2023.49552
Lung cancer is one of the deadliest cancers in the world, and in recent years it has been among the leading causes of death. Early cancer detection can lead to better treatment options for patients. Conventional feature extraction methods focus on either low-level or high-level features, with some manually created features used to fill in the gaps. A feature extraction framework that does not require handcrafted features can close this gap by encoding and combining low-level and high-level characteristics. Deep learning is highly effective for feature representation because it captures both low-level and high-level information and integrates the feature extraction stage into the self-learning process. Consequently, in this study, we use deep learning algorithms to detect the presence of lung cancer without the need for several doctor consultations. A web application is created as a healthcare application in which lung cancer is detected from an input X-ray image. The implementation uses the VGG-16 classification algorithm to identify cancer, so the presence of the disease can be predicted early. Quick action can then be taken to prevent further complications, which saves time and money and reduces human error.
Lung cancer is one of the factors contributing to a rise in the death rate. Therefore, appropriate measures should be taken to detect and identify this illness in its early stages in order to save the lives of many lung cancer patients. The survival rate can be increased if the disease is discovered and treated early, and once the condition has been identified, a correct diagnosis can help patients live longer.
To achieve accurate and quick results, it is crucial to apply contemporary machine learning techniques to medical image processing and to enlarge the training data by duplicating (augmenting) images. Medical imaging modalities such as chest X-rays, MRI scans, and computed tomography (CT) can be used to detect lung cancer, and machine learning algorithms can identify the main characteristics of complex lung cancer datasets. Computer-Aided Diagnosis (CAD) was introduced in the early 1980s to increase efficiency and to support clinicians in evaluating medical images. Decision trees, linear regression, random forests, SVM, naive Bayes, K-nearest neighbours, and other machine learning algorithms have had a significant impact on the healthcare industry.
We also cover deep learning approaches, methodologies, and algorithms that can be used for cancer diagnosis, detection, and prediction.
The major goal of this research project is to give a clear overview of current research on lung cancer prediction using machine learning models and, in particular, deep learning models.
Symptoms are categorized by the location and size of the tumour. Lung cancer can be difficult to study and identify in its early stages because, in some cases, there is no pain or other symptom. Patients with lung cancer may experience symptoms such as coughing up blood, chest pain, shortness of breath, wheezing, Pancoast syndrome (shoulder pain), hoarseness (vocal cord paralysis), weight loss, weakness, and fatigue.
Smoking causes 90% of all lung cancers. Passive smoking (inhaling other people's tobacco smoke) also increases the risk of lung cancer. Heredity is another risk factor. The second-leading contributor to lung cancer deaths is exposure to air pollution from vehicles and factories and the inhalation of hazardous substances such as radon.
II. LITERATURE SURVEY
Multilayer feed-forward neural networks have been used to identify cancer from microarray data and UCI machine learning data, with the model trained using the back-propagation rule. Another study evaluated the accuracy of three classifiers for identifying early-stage lung cancer: K-Nearest Neighbour (KNN), Support Vector Machine (SVM), and Convolutional Neural Network (CNN); the results show that SVM provides the highest accuracy. A further study discusses Convolutional Neural Networks (CNN), Support Vector Machines (SVM), Artificial Neural Networks (ANN), Multi-Layer Perceptrons (MLP), K-Nearest Neighbour (KNN), and the Entropy Degradation Method (EDM) in detail, along with their accuracy, sensitivity, and specificity; in that study, the CNN method, applied to a small dataset, produced the best results. Another work analyses three different kernel functions for estimating the likelihood that a patient with lung cancer will survive, using an effective normalization technique, with experiments on The Cancer Imaging Archive (TCIA) dataset. Together with the SVM kernel functions, five more machine learning techniques were used to calculate the lung cancer survival rate, and the RBF-kernel SVM on normalized data produced high accuracy. Finally, one study highlights the significance of the pre-malignant stage in the early identification of lung cancer: the real, inconsistent data is cleaned and processed, and the SVM classifier stands out for its high prediction accuracy and distinguishing signature.
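The survival study above compares SVM kernel functions on normalized data but does not name all three kernels. As a sketch under that gap, the NumPy code below assumes the usual trio (linear, polynomial, RBF) and z-score normalization; the kernel parameters (degree 3, coef0 1, gamma 0.5) are arbitrary illustrative choices, not values from the cited work.

```python
import numpy as np

def standardize(X):
    """Z-score each feature column; constant columns are only centred (divided by 1)."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)

def linear_kernel(A, B):
    # k(a, b) = a . b
    return A @ B.T

def polynomial_kernel(A, B, degree=3, coef0=1.0):
    # k(a, b) = (a . b + coef0) ** degree
    return (A @ B.T + coef0) ** degree

def rbf_kernel(A, B, gamma=0.5):
    # k(a, b) = exp(-gamma * ||a - b||^2), with ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a . b
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * (A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative rounding errors

# Usage: features on very different scales are standardized before computing the Gram matrix
X = standardize(np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 300.0]]))
K = rbf_kernel(X, X)
```

Normalizing first matters most for the RBF kernel, because an unscaled feature with a large range (the second column here) would otherwise dominate the distance term.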
III. PROPOSED WORK
A Convolutional Neural Network (CNN) is used to classify the data, and the CNN models VGG16 and VGG19 are proposed for the dataset's classification. The dataset is separated into three parts: training data, validation data, and test data. The model is trained on the training images drawn from the original data. Validation data from the original data is used to validate the training process and to determine the validation accuracy. Test data, also derived from the original data, is used to evaluate the correctness of the model on unseen data. Once the convolutional neural network model has been created, the dataset is imported. In the first step, the lung X-ray images are pre-processed with filters to reduce degradation introduced during acquisition. The lung regions are then extracted from the X-ray images, and each image is segmented to locate abnormalities. Finally, the classifier takes the segmented tumours as input and determines whether the tumour present in a patient's lung is malignant.
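The paper does not say which filter is used to reduce acquisition degradation. One common possibility is a median filter, which removes isolated noisy pixels while keeping edges sharp; the minimal NumPy sketch below assumes a 3x3 window and reflect-padding at the image borders, both of which are illustrative choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighbourhood.

    Borders are handled by reflecting the image (an assumed choice), so the
    output has the same shape as the input.
    """
    pad = size // 2
    padded = np.pad(np.asarray(image, dtype=float), pad, mode="reflect")
    windows = sliding_window_view(padded, (size, size))  # shape (H, W, size, size)
    return np.median(windows, axis=(-2, -1))

# Usage: a single bright "salt" pixel in a dark image is removed
img = np.zeros((5, 5))
img[2, 2] = 255.0
clean = median_filter(img)
```

A mean (box) filter would instead spread the bright pixel over its neighbours, which is why median filtering is often preferred for impulse noise.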
A. Data Collection
The data is divided into three folders (train, test, and val), each with subfolders for the Pneumonia and Normal image categories. There are 5,863 X-ray images in JPG format across the two categories. The anterior-posterior chest X-ray images were selected from retrospective cohorts of paediatric patients aged one to five years at the Guangzhou Women and Children's Medical Center, and all chest X-ray imaging was performed as part of the patients' routine clinical care. Each chest radiograph was first inspected for quality, and any blurry or unreadable scans were removed. Two experts then graded the diagnoses of the images before they were used to train the AI system. The images are stored in JPG or PNG format rather than DICOM (.dcm) format to suit the model. The data includes a folder of pneumonia images and a folder of normal images. The data folder is the primary folder that contains all the phase folders. The phase folders are train, test, and valid: train holds the training set (70 percent), test holds the testing set (20 percent), and valid holds the validation set (10 percent).
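A quick way to check a phase/class folder layout like the one described is to count the images in each subfolder. The sketch below assumes a `data/<phase>/<class>/` structure; the class folder names (e.g. `NORMAL`, `PNEUMONIA`) and the accepted file extensions are hypothetical, and the function simply reports whatever folders it finds.

```python
from pathlib import Path

# Assumed image extensions; the paper mentions JPG and PNG files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def count_images(data_dir):
    """Return {phase: {class_name: n_images}} for a data/<phase>/<class>/ layout."""
    counts = {}
    for phase_dir in sorted(p for p in Path(data_dir).iterdir() if p.is_dir()):
        counts[phase_dir.name] = {
            class_dir.name: sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
            )
            for class_dir in sorted(c for c in phase_dir.iterdir() if c.is_dir())
        }
    return counts

# Usage (hypothetical path):
# print(count_images("data"))
```

Comparing these counts with the intended 70/20/10 split is a cheap sanity check before training.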
B. Transfer Learning
Transfer learning is a machine learning method in which a pre-trained model is used as the starting point for a new, related problem: a model learns by solving one task and then applies that knowledge to a similar task, which enables quicker development or better performance on the second task. Because pre-trained models already contain learned representations that can serve as a foundation for new problems, they save a significant amount of time and computing resources compared with training new models from scratch. Transfer learning has proved effective in computer vision (image classification), natural language processing, and speech recognition, and in applications such as medical image analysis, fraud detection, and recommendation systems. It improves model performance on new tasks while reducing the demand for large quantities of labelled data and processing resources, making it possible to train models more quickly and effectively with less data.
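The paper builds on VGG16 in Keras. A minimal transfer-learning sketch, assuming TensorFlow 2.x with `tf.keras`, is shown below: the ImageNet-pretrained convolutional base is frozen and only a small new head is trained. The head layout (global average pooling, a 128-unit dense layer, one sigmoid output), the 224x224x3 input size, and the learning rate are assumptions, not the authors' reported configuration.

```python
import tensorflow as tf

def build_transfer_model(input_shape=(224, 224, 3), weights="imagenet"):
    """Frozen VGG16 convolutional base plus a small binary classification head."""
    base = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # keep the pre-trained features fixed

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)                 # reuse learned features
    x = tf.keras.layers.GlobalAveragePooling2D()(x)  # feature maps -> vector
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # P(positive class)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Images fed to this model should first be passed through `tf.keras.applications.vgg16.preprocess_input`, which applies the same input preprocessing VGG16 was trained with. Passing `weights=None` builds the same architecture without downloading the ImageNet weights, which is handy for quick shape checks.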
C. Convolutional Neural Network
The convolutional neural network (CNN) is a common type of artificial neural network used for image classification, object recognition, and other computer vision applications. A CNN is made up of many layers of linked nodes, each of which performs a particular operation on its input. The first layer is a convolutional layer, which typically applies a set of filters to the input image to extract features. One or more pooling layers are then applied to the output of the convolutional layer to downsample the feature maps, making them smaller while keeping the crucial details. After the convolutional and pooling layers, the output is generally flattened into a 1D vector and fed through one or more fully connected layers, which perform a classification or regression task on the features extracted by the earlier layers. CNNs remove the need for manual feature extraction by automatically learning and extracting features from the raw input data. They are therefore well suited to tasks such as object detection and image recognition, where the characteristics required for classification or localization may be complex and hard to describe manually. CNNs are widely used in many applications, including facial recognition, medical imaging, and self-driving cars.
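The convolution, pooling, flattening, and fully connected stages described above can be traced in a tiny NumPy forward pass. This is an illustration only: the filters and weights are random rather than learned, there is a single convolution and pooling stage, and the sizes (an 8x8 image, four 3x3 filters, 2x2 pooling) are arbitrary assumptions.

```python
import numpy as np

def conv2d(image, kernels):
    """Valid 2-D convolution (cross-correlation, as in CNN libraries).

    image: (H, W); kernels: (K, kh, kw) -> feature maps (K, H-kh+1, W-kw+1).
    """
    K, kh, kw = kernels.shape
    H, W = image.shape
    out = np.empty((K, H - kh + 1, W - kw + 1))
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            patch = image[i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out

def max_pool(maps, size=2):
    """Non-overlapping max pooling; rows/cols that do not fill a window are dropped."""
    K, H, W = maps.shape
    H2, W2 = H // size, W // size
    trimmed = maps[:, :H2 * size, :W2 * size]
    return trimmed.reshape(K, H2, size, W2, size).max(axis=(2, 4))

def forward(image, kernels, weights, bias):
    features = np.maximum(conv2d(image, kernels), 0.0)  # convolution + ReLU
    pooled = max_pool(features)                         # downsampling
    flat = pooled.reshape(-1)                           # flatten to a 1-D vector
    logit = flat @ weights + bias                       # fully connected layer
    return 1.0 / (1.0 + np.exp(-logit))                 # sigmoid -> probability

# Usage: 8x8 image -> (4, 6, 6) feature maps -> (4, 3, 3) pooled -> 36 features
rng = np.random.default_rng(0)
p = forward(rng.random((8, 8)), rng.standard_normal((4, 3, 3)),
            rng.standard_normal(36) * 0.1, 0.0)
```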
D. Image Pre-processing
Pre-processing is done to raise the quality of the raw X-ray images and transform them into a form suitable for processing by humans or machines. This step also helps remove undesired noise and enhance the overall appearance of the X-ray images. Image pre-processing involves steps such as creating functions to load image datasets into arrays, resizing raw images to a fixed base size before feeding them to the neural network, applying normalization to rescale the pixel values into a fixed range, and using data augmentation to enlarge the dataset if too few images are available. These pre-processing tasks help improve classification accuracy and speed up the training process.
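The pre-processing steps listed above (resize to a fixed size, rescale to a fixed range, augment a small dataset) can be sketched in NumPy. Assumptions: the images are 8-bit grayscale arrays, resizing uses nearest-neighbour sampling, rescaling divides by 255, the default 224x224 size matches the usual VGG16 input, and augmentation appends copies shifted a few pixels left and right. None of these specifics come from the paper.

```python
import numpy as np

def resize_nearest(image, size):
    """Nearest-neighbour resize of a 2-D image to size = (height, width)."""
    h, w = image.shape
    new_h, new_w = size
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return image[rows[:, None], cols[None, :]]

def preprocess(images, size=(224, 224)):
    """Resize every image to one base size and rescale uint8 pixels to [0, 1]."""
    batch = np.stack([resize_nearest(img, size) for img in images])
    return batch.astype(np.float32) / 255.0

def augment_with_shifts(batch, shift=4):
    """Append copies shifted left and right by `shift` (>= 1) pixels.

    Edge columns are replicated to fill the gap, so image size is unchanged.
    """
    padded = np.pad(batch, ((0, 0), (0, 0), (shift, shift)), mode="edge")
    left = padded[:, :, 2 * shift:]    # content moves left
    right = padded[:, :, :-2 * shift]  # content moves right
    return np.concatenate([batch, left, right], axis=0)

# Usage: two images of different sizes become one (2, 224, 224) float batch,
# and augmentation triples it to (6, 224, 224).
```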
E. Data Processing
We implemented our model using the VGG16 convolutional neural network in Keras on the chest X-ray dataset. Deep learning is a type of machine learning inspired by how neurons and synapses in the brain transmit information. The basic unit of a neural network is the perceptron, conceived by Frank Rosenblatt in 1957, which can be seen as a simple linear classifier. Biologically, a neuron transmits nerve impulses to neighbouring cells depending on its action potential. A neuron consists of the nucleus (cell body), dendrites, axon, and axon terminals. Dendrites are analogous to signal receivers, the nucleus to a signal processor, and the axon and axon terminals to signal transmitters to surrounding neurons. A synapse is the junction between one neuron's axon and another neuron's dendrites.
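Rosenblatt's perceptron learning rule fits in a few lines. The sketch below assumes 0/1 labels, a step activation, and a fixed learning rate; on linearly separable data, such as the logical AND function used in the usage line, it stops as soon as an epoch makes no mistakes.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Perceptron learning rule. X: (n, d) features; y: labels in {0, 1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0  # step activation
            update = lr * (target - pred)      # zero when the prediction is correct
            w += update * xi
            b += update
            errors += int(update != 0)
        if errors == 0:                        # every sample classified correctly
            break
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)

# Usage: learn logical AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
w, b = train_perceptron(X, np.array([0, 0, 0, 1]))
```

Because the decision is a single weighted sum followed by a threshold, the perceptron can only separate classes with a straight line (a hyperplane); stacking such units into layers, as in CNNs, removes that limitation.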
F. Data Annotation
Data annotation is the process of preparing data so that a machine learning algorithm can use it directly and be adequately trained to make correct predictions. In deep learning, the more annotated data is available, the more accurate the model tends to be. Because we need to identify the segmented region of the lungs, we manually annotated the images based on the predicted position of the cancer.
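The paper does not describe its annotation format. If the manual annotations are stored as rectangular regions in pixel coordinates (a hypothetical format), they can be turned into a binary mask for segmentation and into an image-level label for classification, as sketched below.

```python
import numpy as np

def box_to_mask(image_shape, boxes):
    """Turn rectangular annotations into a binary mask.

    boxes: iterable of (row_min, col_min, row_max, col_max), max exclusive
    (a hypothetical annotation format).
    """
    mask = np.zeros(image_shape, dtype=np.uint8)
    for r0, c0, r1, c1 in boxes:
        mask[r0:r1, c0:c1] = 1
    return mask

def image_label(boxes):
    """Image-level label: 1 if any suspicious region was annotated, else 0."""
    return int(len(boxes) > 0)

# Usage: one annotated region covering rows 1-2 and columns 1-3 of a 6x6 image
mask = box_to_mask((6, 6), [(1, 1, 3, 4)])
```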
G. Data Normalization
We normalized the pixel values of the images to the range 0 to 1 so that the neural network converges faster during training. The normalization function is Z = (x − min(x)) / (max(x) − min(x)).
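The min-max formula above translates directly to NumPy. Two details are worth handling explicitly and are additions, not stated in the paper: 8-bit images are cast to floating point first (subtracting in `uint8` would wrap around), and a constant image, where max(x) = min(x), is mapped to zeros instead of dividing by zero.

```python
import numpy as np

def min_max_normalize(x):
    """Z = (x - min(x)) / (max(x) - min(x)), mapping pixel values into [0, 1]."""
    x = np.asarray(x, dtype=np.float32)  # avoid uint8 wrap-around on subtraction
    lo, hi = x.min(), x.max()
    if hi == lo:                         # constant image: avoid division by zero
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)

# Usage: pixel values 50..250 become 0..1, so 100 maps to (100 - 50) / 200 = 0.25
z = min_max_normalize(np.array([[50, 100], [150, 250]], dtype=np.uint8))
```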
H. Training And Testing Set
A CNN is a type of deep neural network that learns directly from the collected data without handcrafted features, and its most common use is the analysis of visual imagery. A CNN needs much less pre-processing than other image classification methods, yet obtaining precise results is still difficult, because many parts of the images are irrelevant for identifying the lungs. Even when the best input data has been selected, pre-processing is still necessary for the neural network to give accurate outputs; by reducing the amount of input, it makes learning easier for the network. Pre-processing removes unwanted noise from the X-ray images, and colour images are converted to grey-level coding. A DenseNet architecture is suggested here for classifying lung cancer images. DenseNet is a densely connected convolutional neural network made up of several dense blocks joined by transition layers. Whereas a conventional network with L layers has L connections, one between each layer and the next, a dense block of L layers introduces L(L + 1)/2 direct connections, because every layer is connected directly to every later layer. The input to each layer is the concatenation of the feature maps produced by all preceding layers, and the feature maps learned by that layer are passed as input to all subsequent layers. Within a dense block, feature maps are therefore concatenated: the l-th layer has l inputs, consisting of the feature maps of all preceding layers.
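The dense connectivity described above can be illustrated with a toy NumPy block. Assumptions: each "layer" is reduced to a 1x1 convolution (a channel-mixing matrix) followed by ReLU, with random untrained weights, whereas real DenseNet layers use batch normalization, ReLU, and 3x3 convolutions; the growth rate and sizes are arbitrary. The point is the bookkeeping: each layer sees the concatenation of everything before it, the channel count grows by the growth rate per layer, and L layers give L(L + 1)/2 direct connections.

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, rng):
    """Toy DenseNet dense block on feature maps x of shape (C, H, W).

    Returns the concatenated output and the number of direct connections used.
    """
    features = [x]
    connections = 0
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=0)   # concatenation, not summation
        connections += len(features)             # one link from each earlier output
        w = rng.standard_normal((growth_rate, inp.shape[0])) * 0.1
        new = np.maximum(np.einsum("oc,chw->ohw", w, inp), 0.0)  # 1x1 conv + ReLU
        features.append(new)
    return np.concatenate(features, axis=0), connections

def dense_connections(num_layers):
    """Direct connections in a dense block of L layers: L(L + 1) / 2."""
    return num_layers * (num_layers + 1) // 2

# Usage: 3 input channels, 4 layers, growth rate 2 -> 3 + 4 * 2 = 11 output channels
out, conns = dense_block(np.ones((3, 4, 4)), 4, 2, np.random.default_rng(0))
```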
IV. IMPLEMENTATION & RESULTS
The original dataset is divided into two parts: a test set with 20% of the data and a training set with the remaining 80%. The training set is then split again, with 20% of it serving as a validation set and the rest used for training. As a result, the training set is 64 percent of the whole dataset, the validation set is 16 percent, and the test set is 20 percent. The training data is fed to a three-layer neural network in which the first two layers each have four nodes and the output layer has a single node. The history object stores the model's loss and accuracy for each epoch. To improve the model's accuracy, we tried four different activation functions and compared the resulting accuracies; of the four, the binary sigmoid was the best-fitting activation function for our model.
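The two-stage split (0.8 × 0.8 = 0.64 train, 0.8 × 0.2 = 0.16 validation, 0.2 test) and a 4-4-1 network with a per-epoch history can be sketched in NumPy. Assumptions: sigmoid activations in every layer, binary cross-entropy loss, full-batch gradient descent, and the learning rate, epoch count, and initialization are illustrative; the `history` dict only mimics the key names of a Keras `History` object.

```python
import numpy as np

def split_64_16_20(n, rng):
    """80/20 train/test split, then 80/20 train/validation split of the training part."""
    idx = rng.permutation(n)
    n_test = int(round(0.2 * n))
    test, rest = idx[:n_test], idx[n_test:]
    n_val = int(round(0.2 * len(rest)))
    return rest[n_val:], rest[:n_val], test  # train, validation, test

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_4_4_1(X, y, X_val, y_val, epochs=200, lr=0.5, seed=0):
    """Two hidden layers of 4 sigmoid units and one sigmoid output unit,
    trained with full-batch gradient descent on binary cross-entropy."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], 4, 4, 1]
    W = [rng.standard_normal((a, b)) * 0.5 for a, b in zip(sizes[:-1], sizes[1:])]
    B = [np.zeros(b) for b in sizes[1:]]
    history = {"loss": [], "accuracy": [], "val_loss": [], "val_accuracy": []}

    def forward(inp):
        acts = [inp]
        for w, b in zip(W, B):
            acts.append(sigmoid(acts[-1] @ w + b))
        return acts

    def metrics(inp, target):
        p = np.clip(forward(inp)[-1][:, 0], 1e-7, 1 - 1e-7)
        loss = -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))
        return loss, np.mean((p > 0.5) == target)

    for _ in range(epochs):
        acts = forward(X)
        delta = (acts[-1][:, 0] - y)[:, None] / len(y)  # dLoss/dz at the output
        for layer in reversed(range(len(W))):
            grad_w = acts[layer].T @ delta
            grad_b = delta.sum(axis=0)
            if layer > 0:  # backpropagate through the previous sigmoid layer
                delta = (delta @ W[layer].T) * acts[layer] * (1 - acts[layer])
            W[layer] -= lr * grad_w
            B[layer] -= lr * grad_b
        for prefix, (inp, target) in (("", (X, y)), ("val_", (X_val, y_val))):
            loss, acc = metrics(inp, target)
            history[prefix + "loss"].append(loss)
            history[prefix + "accuracy"].append(acc)
    return W, B, history
```

The test indices returned by `split_64_16_20` are deliberately never passed to `train_4_4_1`; they are kept for a single final evaluation.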
The software creates a deep learning binary classification model.
The data is divided into three sets: training, validation, and test.
The training data is used to train the model, and the validation data is used to evaluate its fit. After each run, users may change hyperparameters such as the number of network layers, the number of nodes per layer, and the number of epochs. These changes are mainly made by trial and error, although visualization tools such as Matplotlib plots can help in reaching good results. The test set must not be used during training, either for parameters or for hyperparameters. The proposed system uses a deep learning-based CNN architecture that learns from the training data to improve its accuracy, specificity, and sensitivity.
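The text evaluates the system in terms of accuracy, specificity, and sensitivity without giving the formulas. The standard confusion-matrix definitions are sketched below in plain Python, assuming label 1 means cancer present and 0 means absent.

```python
def confusion_counts(y_true, y_pred):
    """Counts for a binary task where 1 = disease present, 0 = absent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity_specificity(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else float("nan")  # true negative rate
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy

# Usage
sens, spec, acc = sensitivity_specificity([1, 1, 1, 0, 0, 0, 0, 1],
                                          [1, 0, 1, 0, 0, 1, 1, 1])
```

In a screening setting, sensitivity (how many cancers are caught) and specificity (how many healthy patients are correctly cleared) can diverge even when accuracy looks high, which is why both are reported alongside accuracy.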
Using the DenseNet network, we created an automated lung cancer X-ray image classification model, and the DenseNet algorithm is used to analyze and classify the lung cancer datasets. According to the experimental results, our model delivers improved classification of cancers in X-ray images, with a test accuracy of 91.99 percent. Our approach to lung cancer image classification will aid radiologists in the future by simplifying the steps of lung cancer diagnosis, improving its accuracy, and lowering the error rate. In future work, we will classify lung cancer using more high-quality lung cancer X-ray scans, which should significantly boost the network's accuracy.
TABLE I
LUNG CANCER PREDICTION USING ACTIVATION FUNCTIONS
Activation Function | Accuracy (%) | Loss Value (%)
Tanh | 36.9 | 41.6
ReLU | 37.5 | 36.08
Sigmoid | 91.99 | 3.56
Softmax | 85.9 | 12.89
Accuracy could be increased further by enlarging the dataset or by applying new deep learning methods; the technology is advancing quickly, and new research that improves model performance appears every month. The model could also be deployed in an Android or web application that predicts whether cancer is present, and there is considerable scope for further improvement in this domain.
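For reference, the four activation functions compared in Table I can be written in NumPy as below. This is a sketch of the standard definitions only; the paper does not show how each function was wired into its network.

```python
import numpy as np

def tanh(z):
    return np.tanh(z)                                # output in (-1, 1)

def relu(z):
    return np.maximum(z, 0.0)                        # output in [0, inf)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))                  # output in (0, 1)

def softmax(z, axis=-1):
    shifted = z - np.max(z, axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)       # outputs sum to 1 along `axis`

# Usage: a single output unit vs. two output units
z = np.array([[-2.0], [0.0], [3.0]])
single = softmax(z)                                  # every row is [1.0]
pair = softmax(np.array([[0.0, 1.5]]))               # pair[0, 1] equals sigmoid(1.5)
```

Sigmoid maps one output unit to (0, 1), which reads directly as the probability of the positive class. Softmax normalizes across output units, so over a single unit it always returns 1.0; for a two-class softmax head, the probability of the second class given logits [0, z] is exactly sigmoid(z).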
Silva, F., Pereira, T., Morgado, J., Frade, J., Mendes, J., Freitas, C., Negrao, E., De Lima, B.F., Da Silva, M.C., Madureira, A.J. and Ramos, I., 2021. EGFR assessment in lung cancer CT images: analysis of local and holistic regions of interest using deep unsupervised transfer learning. IEEE Access, 9, pp.58667-58676.
Das, S. and Majumder, S., 2020, November. Lung cancer detection using deep learning network: A comparative analysis. In 2020 Fifth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN) (pp. 30-35). IEEE.
Loddenkemper, R., Gibson, G.J. and Sibille, Y., 2003. The burden of lung disease in Europe: why a European White Book on lung disease? European Respiratory Journal, 22(6), pp.869-69.
Gibson, G.J., Loddenkemper, R., Lundbäck, B. and Sibille, Y., 2013. Respiratory health and disease in Europe: the new European Lung White Book. European Respiratory Journal, 42(3), pp.559-563.
https://docs.python.org/3/library/tk.html
Paliwal, G. and Kurmi, U., 2021, December. A Comprehensive Analysis of Identifying Lung Cancer via Different Machine Learning Approach. In 2021 10th International Conference on System Modeling & Advancement in Research Trends (SMART) (pp. 691-696). IEEE.
Warren, G.W. and Cummings, K.M., 2013. Tobacco and lung cancer: risks, trends, and outcomes in patients with cancer. American Society of Clinical Oncology Educational Book, 33(1), pp.359-364.
https://www.tensorflow.org/api_docs
Wang, S., Chen, A., Yang, L., Cai, L., Xie, Y., Fujimoto, J., Gazdar, A. and Xiao, G., 2018. Comprehensive analysis of lung cancer pathology images to discover tumor shape and boundary features that predict survival outcome. Scientific reports, 8(1), p.10393.
Xu, Y., Hosny, A., Zeleznik, R., Parmar, C., Coroller, T., Franco, I., Mak, R.H. and Aerts, H.J., 2019. Deep learning predicts lung cancer treatment response from serial medical imaging: longitudinal deep learning to track treatment response. Clinical Cancer Research, 25(11), pp.3266-3275.
Zhang, Z., Zohren, S. and Roberts, S., 2019. Deeplob: Deep convolutional neural networks for limit order books. IEEE Transactions on Signal Processing, 67(11), pp.3001-3012.
Niepert, M., Ahmed, M. and Kutzkov, K., 2016, June. Learning convolutional neural networks for graphs. In International conference on machine learning (pp. 2014-2023). PMLR.
Lakshmanaprabu, S.K., Mohanty, S.N., Shankar, K., Arunkumar, N. and Ramirez, G., 2019. Optimal deep learning model for classification of lung cancer on CT images. Future Generation Computer Systems, 92, pp.374-382.
Zhao, H., Chu, H., Zhang, Y. and Jia, Y., 2020. Improvement of Ancient Shui character recognition model based on convolutional neural network. IEEE Access, 8, pp.33080-33087.
https://www.kaggle.com/datasets
Gysel, P., Pimentel, J., Motamedi, M. and Ghiasi, S., 2018. Ristretto: A framework for empirical study of resource-efficient inference in convolutional neural networks. IEEE transactions on neural networks and learning systems, 29(11), pp.5784-5789.
Tang, S., Yuan, S. and Zhu, Y., 2020. Data preprocessing techniques in convolutional neural network based on fault diagnosis towards rotating machinery. IEEE Access, 8, pp.149487-149496.
Molchanov, P., Tyree, S., Karras, T., Aila, T. and Kautz, J., 2016. Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440.
Atzori, M., Cognolato, M. and Müller, H., 2016. Deep learning with convolutional neural networks applied to electromyography data: A resource for the classification of movements for prosthetic hands. Frontiers in neurorobotics, 10, p.9.
Hu, X., Zeng, Y., Li, Z., Zheng, X., Cai, S. and Xiong, X., 2019. A resources-efficient configurable accelerator for deep convolutional neural networks. IEEE Access, 7, pp.72113-72124.
Raoof, S.S., Jabbar, M.A. and Fathima, S.A., 2020, March. Lung Cancer prediction using machine learning: A comprehensive approach. In 2020 2nd International conference on innovative mechanisms for industry applications (ICIMIA) (pp. 108-115). IEEE.
Wang, X., Chen, H., Gan, C., Lin, H., Dou, Q., Tsougenis, E., Huang, Q., Cai, M. and Heng, P.A., 2019. Weakly supervised deep learning for whole slide lung cancer image analysis. IEEE transactions on cybernetics, 50(9), pp.3950-3962.
Gerber, D.E., Gandhi, L. and Costa, D.B., 2014. Management and future directions in non-small cell lung cancer with known activating mutations. American Society of Clinical Oncology Educational Book, 34(1), pp.e353-e365.
Copyright © 2023 Akhil Kumar S, Sai Krishna Gangyada, Gifty Joyce Immadi, Venkateswara Reddy H. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET49552
Publish Date : 2023-03-14
ISSN : 2321-9653
Publisher Name : IJRASET