Authors: Neha Chaube, Fahad Mansuri, Kavya Sonkar, Shruti Singh
Abstract: Technology has greatly transformed healthcare in this era of IT. The goal of this research is to create a model that diagnoses a variety of diseases from their symptoms. To build the model, the system uses data mining techniques such as classification. The intelligent agent is trained on datasets containing extensive data on patient diseases that have been gathered, refined, categorised, and used. After the data is split, K-fold cross-validation is used to evaluate the machine learning models, for which the Support Vector Classifier, Gaussian Naive Bayes Classifier, and Random Forest Classifier are employed. Based on the results, the patient can then contact a doctor for further therapy. The system is an example of how technology and medical expertise can be seamlessly woven together with the goal of "prediction is better than cure."
Nowadays, the internet stimulates people's curiosity, and whatever their problem, many try to solve it online. People have much easier access to the internet than to hospitals and doctors, so before visiting a doctor many of them search for their symptoms online and try to work out the diagnosis themselves. When people lack the time to visit a doctor, they may self-diagnose, which can be dangerous. The proposed system offers a better and more effective alternative to randomly searching for symptoms and risking harm: users simply register on the network, select their symptoms, and receive a prognosis along with the details of doctors who specialize in that field.
This Disease Prediction system is a web-based application that predicts the user's most probable disease from the given symptoms, using datasets collected from various health-related sites. Someone near and dear to you may urgently need a doctor's help, yet no doctor may be available for consultation because of prior commitments or other reasons. In such cases, this automated program can provide urgent guidance on the illness based on the details and symptoms entered into the application. Intelligent data processing techniques are used to identify the disease most likely related to the patient's details, and based on the result, the patient can contact the relevant specialist for further treatment. The system can be used for free consultation regarding any illness, and it also saves the cost of first visiting a general physician.
A patient registered on the network can obtain a prognosis and consult directly with a doctor who specializes in that particular field.
II. LITERATURE SURVEY
Currently available detection systems are largely inaccurate and do not support the detection of multiple diseases.
In the proposed system, diseases are predicted automatically by a model trained on a medical dataset, and the confidence score of each prediction is displayed. After the likely ailment is diagnosed, the system recommends specialists in that disease so that the patient can consult them online. The proposed technology functions as a decision support system and will assist health practitioners in making diagnoses.
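The specialist recommendation step can be pictured as a lookup from the predicted disease to a specialty, followed by a filter over registered doctors. The sketch below is illustrative only: the `Doctor` record, the `SPECIALTY_FOR` table, the doctor entries, and `recommend_doctors` are all hypothetical, since the paper does not describe its data model; sorting by rating reflects the doctor ratings mentioned in the patient workflow.

```python
from dataclasses import dataclass


@dataclass
class Doctor:
    # Hypothetical doctor record; field names are illustrative.
    name: str
    specialty: str
    rating: float
    email: str


# Hypothetical disease-to-specialty table (the paper does not publish one).
SPECIALTY_FOR = {
    "Fungal infection": "Dermatologist",
    "Heart attack": "Cardiologist",
}

# Hypothetical registered doctors.
DOCTORS = [
    Doctor("Dr. A", "Dermatologist", 4.2, "a@example.com"),
    Doctor("Dr. B", "Cardiologist", 4.8, "b@example.com"),
    Doctor("Dr. C", "Dermatologist", 4.7, "c@example.com"),
]


def recommend_doctors(disease, doctors=DOCTORS):
    """Return doctors whose specialty matches the disease, best-rated first."""
    specialty = SPECIALTY_FOR.get(disease)
    if specialty is None:
        # No known specialty for this disease: nothing to recommend.
        return []
    matches = [d for d in doctors if d.specialty == specialty]
    return sorted(matches, key=lambda d: d.rating, reverse=True)


print([d.name for d in recommend_doctors("Fungal infection")])  # ['Dr. C', 'Dr. A']
```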
When users visit the application, they are given two choices:
A. To register as a patient
B. To register as a doctor on the network
1. As a patient
After the user has successfully logged in to the web application:
a. The user is directed to their profile page where they can see their patient ID, name, and email along with the option to edit their information.
b. They also have three options: check for a disease, view their consultation history, and give feedback on the network.
c. When they select the option to check for a disease, they are redirected to a page where they can add the symptoms they are experiencing from a drop-down list of 132 symptoms. Each selected symptom is added to the system's list, after which they can click Predict to get their prognosis.
d. Based on the symptoms provided, the system returns a prognosis along with a confidence score, which expresses as a percentage how sure the model is about the prognosis.
e. The application also provides a link where the user can learn more about the predicted disease.
f. Most importantly, along with the prognosis, the system lets a registered user connect with a doctor who specializes in that particular field and shows the doctor's contact details.
g. Patients can access a list of doctors who specialize in their condition, view their ratings, and chat with them online.
2. As a doctor
a. When doctors log in to the application, they are directed to their profile page, where they can view their consultation history and give feedback.
b. The consultation history contains every consultation the doctor has given on the network, whether active or closed.
c. When a patient requests a consultation, its status is shown as active; the doctor can then consult the patient on the network and view the patient's profile, and after the procedure marks the consultation as closed.
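The consultation workflow described above behaves like a small state machine: a request opens as active, chat is possible while it is active, and the doctor eventually marks it closed. A minimal sketch, assuming a hypothetical in-memory `Consultation` record and `history` helper (the paper does not describe how consultations are stored):

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    CLOSED = "closed"


@dataclass
class Consultation:
    # Hypothetical record; field names are illustrative, not from the paper.
    patient_id: int
    doctor_id: int
    status: Status = Status.ACTIVE
    messages: list = field(default_factory=list)

    def send(self, sender: str, text: str) -> None:
        # Online chat is only allowed while the consultation is active.
        if self.status is not Status.ACTIVE:
            raise ValueError("consultation is closed")
        self.messages.append((sender, text))

    def close(self) -> None:
        # The doctor marks the consultation as closed after the procedure.
        self.status = Status.CLOSED


def history(consultations, doctor_id):
    """Every consultation a doctor has given, whether active or closed."""
    return [c for c in consultations if c.doctor_id == doctor_id]
```

Keeping closed consultations in the same collection, rather than deleting them, is what lets the consultation history show both active and closed entries.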
IV. TRAINING OF THE MODEL
A. Data collection
The symptom data for these diseases were gathered from the internet rather than generated, so no dummy values are used and identification is more accurate. The dataset was obtained from Kaggle. The CSV file contains 5000 rows and 133 columns: 132 columns for the unique symptoms and a final column for the disease class (40 unique disease classes).
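Loading a file with this layout amounts to separating the 132 symptom columns from the disease column and checking that the symptom columns really are 0/1 indicators. A minimal sketch with pandas, assuming (hypothetically) that the last column is named `prognosis`, as in the data cleaning step; the function name `load_symptom_dataset` and the CSV path are illustrative:

```python
import pandas as pd


def load_symptom_dataset(path, n_symptoms=132, target="prognosis"):
    """Load the symptom CSV and split it into features X and labels y.

    Assumes every column except `target` is a 0/1 symptom indicator.
    """
    df = pd.read_csv(path)
    # Expect one column per symptom plus the disease column.
    if df.shape[1] != n_symptoms + 1:
        raise ValueError(f"expected {n_symptoms + 1} columns, got {df.shape[1]}")
    X = df.drop(columns=[target])
    y = df[target]
    # Symptoms are present (1) or absent (0); anything else signals bad data.
    if not X.isin([0, 1]).all().all():
        raise ValueError("symptom columns must be binary (0/1)")
    return X, y
```

For the dataset described above, `X` would have shape (5000, 132) and `y` would hold the 40 disease names.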
B. Cleaning the Data
Cleaning is the most crucial phase of a machine learning project, because the quality of a model depends on the quality of its data. The data must therefore be cleaned before being fed to the model for training. All columns in the dataset are numerical except the target column, prognosis, which is textual and is encoded in numerical form using a label encoder.
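The label-encoding step can be sketched with scikit-learn's `LabelEncoder`. The toy frame below only mimics the dataset's layout (binary symptom columns plus a textual `prognosis` column); its values are illustrative, not taken from the paper's data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame with the dataset's layout: binary symptom columns plus a
# textual "prognosis" target (values are illustrative).
df = pd.DataFrame({
    "itching": [1, 0, 1],
    "cough": [0, 1, 0],
    "prognosis": ["Fungal infection", "Common Cold", "Fungal infection"],
})

encoder = LabelEncoder()
# fit_transform learns the sorted class names and maps each to an integer.
df["prognosis"] = encoder.fit_transform(df["prognosis"])

print(encoder.classes_.tolist())  # ['Common Cold', 'Fungal infection']
print(df["prognosis"].tolist())   # [1, 0, 1]
# inverse_transform turns model outputs back into disease names.
print(encoder.inverse_transform([0]).tolist())  # ['Common Cold']
```

Keeping the fitted encoder around matters: the models predict integers, and `inverse_transform` is what turns them back into disease names for the patient.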
C. Model Building
Once the data has been gathered and cleaned, the Support Vector Classifier, Naive Bayes Classifier, and Random Forest Classifier are all trained on it. A confusion matrix is plotted at the end to assess the quality of each model. After training, the predictions of all three models are merged to predict the disease for the input symptoms, which makes the overall prediction more robust and accurate.
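The training step can be sketched with the three scikit-learn classifiers the paper names. This is a sketch on hypothetical toy data (three diseases, each marked by its own pair of symptoms), not the paper's dataset; the hyperparameters are library defaults except `random_state=18`, an arbitrary choice for reproducibility. The paper plots its confusion matrices; here they are only computed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def train_models(X_train, y_train):
    """Fit the three classifiers on the same training data."""
    models = {
        "SVC": SVC(),
        "GaussianNB": GaussianNB(),
        "RandomForest": RandomForestClassifier(random_state=18),
    }
    for model in models.values():
        model.fit(X_train, y_train)
    return models


def confusion_matrices(models, X, y):
    """Confusion matrix per model: rows are true classes, columns are predictions."""
    return {name: confusion_matrix(y, m.predict(X)) for name, m in models.items()}


# Hypothetical toy data: three diseases, each marked by its own pair of symptoms.
patterns = np.array([
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
])
y = np.repeat([0, 1, 2], 20)
X = patterns[y]

models = train_models(X, y)
for name, cm in confusion_matrices(models, X, y).items():
    print(name)
    print(cm)
```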
D. Dataset splitting
When training a machine learning model, the dataset is separated into two parts:
a) The training dataset
b) The testing dataset.
The data is split 80:20: 80% is used to train the model and 20% to evaluate its performance.
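An 80:20 split can be expressed with scikit-learn's `train_test_split`. The arrays here are toy placeholders, `random_state=24` is arbitrary, and `stratify=y` (which keeps each class's share equal in both parts) is an addition of this sketch that the paper does not mention.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)  # 100 toy samples, 2 features each
y = np.arange(100) % 4              # 4 toy classes, 25 samples each

# test_size=0.2 reserves 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=24, stratify=y
)
print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)
```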
After the data has been split, K-fold cross-validation is used to evaluate the machine learning models. The Support Vector Classifier, Gaussian Naive Bayes Classifier, and Random Forest Classifier are each evaluated with cross-validation.
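K-fold cross-validation trains each model on K-1 folds and scores it on the remaining fold, repeating until every fold has served as the test fold once. A minimal sketch with scikit-learn's `KFold` and `cross_val_score`; the toy data, `n_splits=10`, and accuracy scoring are assumptions, since the paper does not state its K or metric.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Hypothetical toy data: three diseases with disjoint symptom pairs.
patterns = np.array([
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
])
y = np.tile([0, 1, 2], 20)  # 60 samples, classes interleaved
X = patterns[y]

kfold = KFold(n_splits=10, shuffle=True, random_state=0)
models = {
    "SVC": SVC(),
    "GaussianNB": GaussianNB(),
    "RandomForest": RandomForestClassifier(random_state=18),
}
scores = {}
for name, model in models.items():
    # One accuracy value per fold; the mean summarises the model.
    fold_scores = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")
    scores[name] = fold_scores.mean()
    print(f"{name}: mean accuracy {scores[name]:.3f} over {len(fold_scores)} folds")
```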
To build a robust model, the predictions of all three models are combined. If one model makes a wrong prediction but the other two agree on the correct disease, the final prediction is still correct. This helps keep predictions accurate on completely unseen data.
After all three models were trained on the training data, their quality was checked with confusion matrices, and their predictions were then combined.
After combining the three models, the combined model was evaluated on the test data, where it classified all the data points correctly.
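One common way to combine three classifiers so that a single wrong model is outvoted is a per-sample majority vote (the mode of the three predictions). The sketch below assumes that rule; the paper does not name its exact combination method or tie-breaking rule, so the tie-break here (first model listed wins) is a choice of this sketch, and the disease names are illustrative.

```python
from collections import Counter


def combine_predictions(*per_model_predictions):
    """Majority vote over the models' predictions, sample by sample.

    Each argument is one model's list of predicted labels. Ties (all models
    disagree) go to the label predicted by the earliest model listed.
    """
    combined = []
    for votes in zip(*per_model_predictions):
        counts = Counter(votes)
        best = max(counts.values())
        # First label, in model order, that reaches the highest vote count.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


svm_preds = ["Malaria", "Typhoid", "Dengue"]
nb_preds = ["Malaria", "Dengue", "Dengue"]
rf_preds = ["Jaundice", "Typhoid", "Malaria"]
print(combine_predictions(svm_preds, nb_preds, rf_preds))
# ['Malaria', 'Typhoid', 'Dengue']
```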
Finally, a function was created that takes comma-separated symptoms as input and outputs the disease predicted by the combined model.
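Such a function has to map each symptom name to its column, build a 0/1 input vector, and pass it through the combined model. A self-contained sketch on a hypothetical six-symptom vocabulary (the real system uses 132 columns); the symptom and disease names, the toy training data, the name normalisation rule, and `predict_disease` itself are illustrative assumptions, and the three models are combined by majority vote.

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Hypothetical miniature vocabulary (illustrative, not medical data).
SYMPTOMS = ["itching", "skin_rash", "continuous_sneezing", "chills", "high_fever", "headache"]
DISEASES = ["Fungal infection", "Allergy", "Malaria"]
INDEX = {name: i for i, name in enumerate(SYMPTOMS)}

# Toy training data: each disease is marked by its own pair of symptoms.
patterns = np.array([
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
])
y = np.repeat([0, 1, 2], 10)
X = patterns[y]
MODELS = [SVC().fit(X, y), GaussianNB().fit(X, y),
          RandomForestClassifier(random_state=18).fit(X, y)]


def predict_disease(symptoms: str) -> str:
    """Turn e.g. 'Itching, Skin Rash' into a 0/1 vector; return the majority-vote disease."""
    vector = np.zeros((1, len(SYMPTOMS)))
    for raw in symptoms.split(","):
        key = raw.strip().lower().replace(" ", "_")
        if key not in INDEX:
            raise ValueError(f"unknown symptom: {raw.strip()!r}")
        vector[0, INDEX[key]] = 1
    votes = [int(model.predict(vector)[0]) for model in MODELS]
    label, _ = Counter(votes).most_common(1)[0]
    return DISEASES[label]


print(predict_disease("Itching, Skin Rash"))  # Fungal infection
```

In the web application, the drop-down list would guarantee valid symptom names, so the `ValueError` branch mainly protects direct calls.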
Disease prediction is available to any logged-in patient, giving a seamless, one-click way to obtain an accurate prediction.
The prediction is shown to the patient together with a confidence score. Because symptoms can be shared by different diseases, no single symptom is tied to exactly one disease; the confidence score therefore indicates how strongly the selected symptoms point to the predicted disease.
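The paper does not give the formula behind its confidence score. One plausible sketch, assumed here, averages the class probabilities reported by the probabilistic models (for example, the output of `predict_proba` for Gaussian Naive Bayes and Random Forest) and reports the top diseases as percentages. The function name `confidence_scores`, the class names, and the probability values are all hypothetical.

```python
import numpy as np


def confidence_scores(prob_rows, classes, top_k=3):
    """Average per-model class probabilities and return the top-k diseases.

    prob_rows: one probability vector per model (each sums to 1). Averaging
    them is an assumption of this sketch, not the paper's stated method.
    """
    mean = np.mean(np.asarray(prob_rows, dtype=float), axis=0)
    # Indices of the most probable classes, highest first.
    order = np.argsort(mean)[::-1][:top_k]
    return [(classes[i], round(100 * float(mean[i]), 1)) for i in order]


classes = ["Common Cold", "Allergy", "Pneumonia"]
nb_proba = [0.70, 0.20, 0.10]  # hypothetical model outputs
rf_proba = [0.50, 0.40, 0.10]
print(confidence_scores([nb_proba, rf_proba], classes))
# [('Common Cold', 60.0), ('Allergy', 30.0), ('Pneumonia', 10.0)]
```

Showing the runner-up diseases alongside the top one reflects the point above: when symptoms overlap, the score spreads across several diseases instead of pinning everything on one.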
VI. FUTURE SCOPE
A. A prime account option for patients.
B. Video calling feature.
C. An account-linking feature that lets users connect their website account with other online services such as Gmail and social media.
D. A map feature on the website, for example by integrating a maps API.
E. Partnership with a pharmacy to provide patients with discounts on medicine.
We proposed a system that predicts diseases based on previous cases in medical history and connects patients registered on the network with the best doctors in the relevant specialty, sparing patients the trouble of first visiting a general physician. A disease prediction web application based on machine learning algorithms was successfully built. Three models were trained using the Support Vector Classifier, Naive Bayes Classifier, and Random Forest Classifier, and then combined to create a more accurate and effective system for classifying patient data. Such a system is needed because medical data is growing at an exponential rate, and existing data must be processed to predict the exact disease from symptoms. Using patient symptoms as input, we obtained an accurate general illness risk prediction, which indicates the level of disease risk.
Copyright © 2022 Neha Chaube, Fahad Mansuri, Kavya Sonkar, Shruti Singh. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.