Disease Prediction using Machine Learning

Authors: Magilan G, Swaitha K, Sarath Vijay R, Jona J. B

DOI Link: https://doi.org/10.22214/ijraset.2022.41230

Abstract

It is a system which provides the user the information and tricks to take care of the health system of the user and it provides how to search out the disease using this prediction. Now a day’s health industry plays major role in curing the diseases of the patients so this is often also some quite help for the health industry to inform the user and also it\'s useful for the user just in case he/she doesn’t want to travel to the hospital or the other clinics, so just by entering the symptoms and every one other useful information the user can get to grasp the disease he/she is affected by and also the health industry may also get enjoy this method by just asking the symptoms from the stoner and entering within the system and in only many seconds they\'ll tell the precise and over to some extent the accurate conditions. This Disease Prediction Using Machine Learning is totally through with the assistance of Machine Learning and Python programming language and also using the dataset that\'s available previously by the hospitals using that we are going to predict the diseases.

Introduction

I. INTRODUCTION

The purpose of constructing this project called “Disease Prediction Using Machine Learning” is to predict the accurate disease of the patient using all their general information’s and also the symptoms. If this Prediction is completed at the first stages of the disease with the assistance of this project and every one other necessary measure disease is cured and generally this prediction system can even be very useful in health industry. The final purpose of this Disease prediction is to supply prediction for the assorted and customarily occurring diseases that when unchecked and sometimes ignored can turns into fatal disease and cause lot of problem to the patient and moreover as their members of the family. this method will predict the foremost possible disease supported the symptoms. The health industry in information yet and knowledge poor and this industry is incredibly vast industry which has lot of labor to be done. So, with the assistance of all those algorithms, techniques and methodologies we've done this project which is able to help the peoples who are within the need.

II. LITERATURE REVIEW

Diabetes is one of lethal diseases in the world. It is additional an inventor of various varieties of disorders for example: coronary failure, blindness, urinary organ diseases etc. In such a case the patient is required to visit a diagnostic centre, to get their reports after consultation. Due to every time they must invest their time and currency. But with the growth of Machine Learning methods we have got the flexibility to search out an answer to the current issue, we have got advanced system mistreatment information processing that has the ability to forecast whether the patient has polygenic illness or not. Furthermore, forecasting the sickness initially ends up in providing the patients before it begins vital. Information withdrawal has from a large quantity of diabetes associated information. The aim of this analysis is to develop a system which might predict the diabetic risk level of a patient with a better accuracy. Model development is based on categorization methods as Decision Tree, ANN, I Bayes and SVM algorithms. For Decision Tree, the models give precisions of 85%, for I Bayes 77% and 77.3% for Support Vector Machine. Outcomes show a significant accuracy of the methods.
The advances in data technology have witnessed nice progress on aid technologies in varied domains these days. However, these new technologies have conjointly created aid information not solely abundant larger however conjointly way more tough to handle and method. Moreover, as a result of the info created from a range of devices among a brief time span, the characteristics of those information that they're hold on in numerous formats and created quickly, which can, to an oversized extent, be considered an enormous information downside. To provide a a lot of convenient service and setting of aid, this paper proposes a cyber-physical system for case centric aid operations and service, appertained to as Health-CPS, finagled on pall and large information analytics the nodes of a DT tree technologies. this technique consists of a knowledge assortment layer with a unified commonplace, a knowledge management layer for distributed storage and parallel computing, and a data-oriented service layer. The results of this study show that the technologies of cloud and large information is accustomed enhance the performance of the aid system so humans will then get pleasure from varied good aid applications and service.
SVM is very good when we have no idea on the data. Even with unstructured and semi structured data like text, images, and trees SVM algorithm works well. The drawback of the SVM algorithm is that to achieve the best classification results for any given problem, several key parameters are needed to be set correctly. Decision tree: It is easy to understand and rule decision tree. Instability is there in decision tree, that is bulky change can be seen by minor modification in the data structure of the optimal decision tree. They are often relatively inaccurate. I Bayes: It is robust, handles the missing values by ignoring probability estimation calculation. Sensitive to how inputs are prepared. Prone bias when increase the number of training dataset. ANN: Gives good prediction and easy to implement. Difficult with dealing with big data with complex models. Require huge processing time.
Diabetes is caused due to the excessive amount of sugar condensed into the blood. Currently, it is considered as one of the lethal diseases in the world. People all around the globe are affected by this severe disease knowingly or unknowingly. Other diseases like heart attack, paralyzed, kidney disease, blindness etc. are also caused by diabetes. Numerous computer-based detection systems were designed and outlined for anticipating and analysing diabetes. Usual identifying process for diabetic patients needs more time and money. But with the rise of machine learning, we have that ability to develop a solution to this intense issue. Therefore, we have developed an architecture which has the capability to predict where the patient has diabetes or not. Our main aim of this exploration is to build a web application based on the higher prediction accuracy of some powerful machine learning algorithm. We have used a benchmark dataset namely Pima Indian which can predict the onset of diabetes based on diagnostics manner. With an accuracy of 82.35% prediction rate Artificial Neural Network (ANN) shows a significant improvement of accuracy which drives us to develop an Interactive Web Application for Diabetes Prediction.

III. ALGORITHM

A. Decision Tree Algorithm

Decision tree builds regression or classification models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), each representing values for the attribute tested. Leaf node (e.g., Hours Played) represents a decision on the numerical target. The topmost decision node in a tree which corresponds to the best predictor called root node. Decision trees can handle both categorical and numerical data. Depending on the take a look at outcome, the classification algorithmic rule branches towards the suitable kid node wherever the method of take a look at and branching repeats till it reaches the leaf node . The leaf or terminal nodes correspond to the choice outcomes. DTs are found straightforward to interpret and fast to be told, and area unit a standard element to several medical diagnostic protocols [25]. once traversing the tree for the classification of a sample, the outcomes of all tests at every node on the trail can offer spare data to conjecture concerning its categories. associate degree illustration of associate degree DT with its components and rules is portrayed.

Random Forest is a supervised learning algorithm. It is an extension of machine learning classifiers which include the bagging to improve the performance of Decision Tree. It combines tree predictors, and trees are dependent on a random vector which is independently sampled. The distribution of all trees are the same. Random Forests splits nodes using the best among of a predictor subset that are randomly chosen from the node itself, instead of splitting nodes based on the variables. The time complexity of the worst case of learning with Random Forests is O(M(dnlogn)) , where M is the number of growing trees, n is the number of instances, and d is the data dimension. It can be used both for classification and regression. It is also the most flexible and easy to use algorithm. A forest consists of trees. It is said that the more trees it has, the more robust a forest is. Random Forests create Decision Trees on randomly selected data samples, get predictions from each tree and select the best solution by means of voting. It also provides a pretty good indicator of the feature importance.

B. Navie Bayes

Naive Bayes is a set of supervised learning algorithms based on the Bayes’ theorem with the “naïve” assumption of independence between every pair of features. Despite its simplicity, it often outperforms more sophisticated classification methods. If there are input variables x and output variable y, Bayes’ theorem states the following relationship. p(y|x) = p(y).p(x|y)/ p(x) In this project, Gaussian Naïve Bayes algorithm has been implemented. In case of Gaussian Naïve Bayes, the likelihood of the features us assumed to be Gaussian i.e. all continuous values x associated with class y are distributed according to Gaussian distribution. Given a continuous attribute x in training data, the data is first segmented by the class y. Then, the mean and variance of x in each class is computed.

If μ be the mean of the values in x associated with class y, then let d2 be the variance of the values in x associated with class y. Suppose there is some observation value v then, the probability distribution of v given by class y, p(x=v | y), can be computed by plugging into the equation for a normal distribution thought-about during this figure. Thus, the chance of ‘white’ given ‘green’ is zero.025 (1 ÷ 40) and therefore the chance of ‘white’ given ‘red’ is zero.15 (3 ÷ 20). though the previous chance indicates that the new ‘white’ object is a lot of probably to Retain ‘ green’ class, the chance shows that it's a lot of presumably to be within the‘ red’ categories. within the theorem analysis, the ultimate classifier is created by combining each sources of knowledge (i.e., previous chance and chance value). The ‘multiplication’ perform is employed to mix these 2 sorts of data and therefore the product is termed the ‘posterior’ chance. Finally, the posterior chance of ‘white’ being ‘green’ is zero.017 (0.67 × 0.025) and therefore the posterior chance of ‘white’ being ‘red’ is zero.049 (0.33 × 0.15). Thus, the new ‘white’ object ought to be category as a member of the ‘red’ class per the NB technique.

IV. IMPLEMENTATION

The project malady Prediction mistreatment Machine Learning is developed to beat general malady in earlier stages as we tend to all recognize in competitive surroundings of economic development the human race has concerned thus much that he/she isn't involved regarding health per analysis there area unit four-hundredth peoples however Ignores regarding general malady that ends up in harmful malady later. Even the interface of this project is completed mistreatment python's library interface referred to as Tkinter. Here 1st the user must register into the system so as to use the prediction, user must register with username, email-id, phone, agenda parole. of these values area unit keep into the filing system severally, then user has choice to move forward or leave, then user must login to the system mistreatment the username and parole that he/she provided throughout the time of registration. If he/she enter incorrect username and proper parole then the error message can prompt stating incorrect username and he/she enters incorrect parole and proper when work within the user must the name and desires to pick out the symptoms from given change posture menu, for additional correct result the user must enter all the given symptoms, then the system can give the correct result. This prediction is essentially through with the assistance of three algorithms of machine learning like call Tree, Random Forest and Naïve mathematician. once user enter all the symptoms then he must press the buttons of various rule, for instance there area unit three buttons for three algorithms, if user enters all symptoms and presses solely Random Forest button then the result are going to be provided solely shrewd mistreatment that rule, like this we've got used three algorithms to produce additional clear image of the results and user must be happy along with his expected result.

V. RESULT

The result for this prediction system displays a convenient user interface consisting of details like name, symptoms and the algorithm that we use to predict as a button and the results will be predicted based on the implemented algorithm.

It also displays the accuracy percentage on which algorithm has the best accuracy so based on the accuracy of the decision tree, random forest and naive bayes algorithm random forest has the better accuracy percentage of 0.96. It is a best suited algorithm for this model.

Conclusion

The Prediction Engine that allows the user to examine whether or not he/she has any unwellness or disorder supported the given symptoms. The user interacts with the Prediction Engine by filling a collection of symptoms that holds the parameter set provided as associate input to the trained models. The Prediction Engine makes use of 3 algorithms to predict the presence of a unwellness namely: call Tree, Random Forest and Naive Bayes. The reason to settle on these 3 algorithms are: 1) They effective, if the coaching information is massive. 2) A single dataset is provided as associate input to any or all these three algorithms with bottom or no modification. 3) A common scalar is accustomed normalize the input provided to those three algorithms.

References

[1] Kaveeshwar, S.A., and Cornwall, J., 2014, “The current state of unwellness mellitus in India”. AMJ, 7(1), pp. 45-48 [2] Dean, L., McEntyre, J., 2004, “The Genetic Landscape of unwellness [Internet]. Bethesda (MD): National Center for Biotechnology info (US); Chapter one, Introduction to unwellness. 2004 Jul 7. [3] Y. Zhang, M. Qiu, C.-W. Tsai, M. M. Hassan and A. Alamri, \"HealthCPS: aid cyberphysical system power-assisted by cloud and massive data\", IEEE Syst. J, vol. 11, no. 1, pp. 88-95, Mar. 2017. [4] Allen Daniel Sunny, Sajal Kulshreshtha, Satyam Singh, Srinabh, Mohan Ba and H Sarojadevi, \"Disease identification System By Exploring Machine Learning Algorithms\", International Journal of Innovations in Engineering and Technology (IJIET), vol. 10, no. 2, May

Copyright

Copyright © 2022 Magilan G, Swaitha K, Sarath Vijay R, Jona J. B. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Download Paper

Paper Id : IJRASET41230

Publish Date : 2022-04-05

ISSN : 2321-9653

Publisher Name : IJRASET

DOI Link : Click Here