In machine learning (ML), classification is one of the most widely used prediction tasks. ML is now deployed in almost every real-world application domain, including healthcare. In healthcare applications, achieving the highest possible accuracy should be the main goal. The accuracy of a model depends on the training dataset and the algorithm used, and the characteristics of the training dataset contribute significantly to it. Healthcare data are mainly numerical, for example test reports that record numerical values. Classification produces categorical outputs that patients can understand easily, such as whether or not someone has a particular disease. In this research work, we evaluate and compare the performance of various classifiers to determine which works best when the training data are exclusively numerical. In our experiments, Logistic Regression, Neural Network and Naive Bayes were the most accurate classifiers for predicting diabetes from exclusively numerical data.
Introduction
1. Introduction to NLP & Sentiment Analysis
Natural Language Processing (NLP) is a branch of computer science that enables computers to understand and process human language.
Sentiment Analysis, a key NLP application, identifies the emotional tone behind texts (e.g., reviews, social media posts).
It helps in various domains:
Business: Analyzing customer feedback.
Social Media & Safety: Detecting mental health or safety risks.
Sentiments are generally classified as positive, negative, or neutral.
2. Sentiment Analysis Approaches
Two main techniques:
Dictionary (Lexicon) Based: Predefined word lists categorized by sentiment. Fast and easy but limited in handling context (e.g., sarcasm).
Pre-trained Model Based: Uses machine learning models trained on large datasets. Slower but better at capturing context.
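To make the dictionary-based idea concrete, here is a minimal Python sketch. The word lists and the `lexicon_sentiment` function are hypothetical and invented for illustration; they are not the VADER, SentiArt or Liu-Hu lexicons, which are much larger and use weighted scores. The last call shows the context limitation noted above: a sarcastic sentence is labeled positive because only the word "great" is matched.

```python
# Toy word lists, invented for illustration only (not a real lexicon).
POSITIVE_WORDS = {"good", "great", "love", "happy", "excellent"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "sad", "awful"}


def lexicon_sentiment(text: str) -> str:
    """Label text by counting positive and negative dictionary hits."""
    # Lowercase each whitespace-separated token and strip common punctuation.
    tokens = [tok.strip(".,!?;:\"'").lower() for tok in text.split()]
    positive_hits = sum(tok in POSITIVE_WORDS for tok in tokens)
    negative_hits = sum(tok in NEGATIVE_WORDS for tok in tokens)
    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment("I love this phone, the camera is great!"))
print(lexicon_sentiment("The delivery was terrible."))
# Sarcasm: the counter only sees "great", so it misses the negative intent.
print(lexicon_sentiment("Oh great, my flight got cancelled again."))
```

A pre-trained model could, in principle, use the surrounding words to recognize the sarcasm in the third sentence, which is the trade-off the two approaches represent.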
3. Research Focus
This study compares the performance of three lexicon-based methods:
VADER
SentiArt
Liu-Hu
4. Literature Review
Recent studies apply sentiment analysis using:
Machine Learning (ML) & Deep Learning (DL):
CNN and LSTM models show high accuracy but are complex and resource-heavy.
Lexicon-Based Methods:
Simpler and domain-independent.
Examples: VADER used for vaccine sentiment, SentiArt for election opinion, Liu-Hu for analyzing news events.
Best-performing method by emotion:
Anger: VADER. VADER showed the highest accuracy; Liu-Hu performed poorly.
Disgust: VADER. Similar pattern: VADER > SentiArt > Liu-Hu.
Fear: VADER. Overall accuracy lower than for Anger/Disgust.
Joy: SentiArt (slightly). All methods performed poorly for Joy.
Neutral: Liu-Hu. Only Liu-Hu performed well here.
Sadness: VADER. Consistent strong performance by VADER.
Shame: VADER. SentiArt did well, Liu-Hu moderately.
Surprise: SentiArt (slightly). All models struggled; SentiArt marginally better.
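A per-emotion comparison like the one above could be computed along the following lines. This is a sketch under stated assumptions, not the scoring procedure used in this study, which the text does not spell out. The emotion-to-polarity grouping follows the Conclusion (anger, disgust, fear, sadness and shame as negative; joy and surprise as positive); the function name `per_emotion_accuracy` and the sample records are hypothetical.

```python
from collections import defaultdict

# Emotion-to-polarity grouping as described in the Conclusion.
EMOTION_POLARITY = {
    "anger": "negative",
    "disgust": "negative",
    "fear": "negative",
    "sadness": "negative",
    "shame": "negative",
    "joy": "positive",
    "surprise": "positive",
    "neutral": "neutral",
}


def per_emotion_accuracy(records):
    """Compute accuracy per emotion.

    records: iterable of (emotion_label, predicted_polarity) pairs.
    Returns {emotion: fraction of samples whose predicted polarity
    matches the polarity expected for that emotion}.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for emotion, predicted in records:
        total[emotion] += 1
        if predicted == EMOTION_POLARITY[emotion]:
            correct[emotion] += 1
    return {emotion: correct[emotion] / total[emotion] for emotion in total}


# Hypothetical predictions from one method, for illustration only.
sample = [
    ("anger", "negative"),
    ("anger", "negative"),
    ("anger", "neutral"),
    ("anger", "negative"),
    ("joy", "positive"),
    ("joy", "neutral"),
    ("neutral", "neutral"),
]
for emotion, accuracy in per_emotion_accuracy(sample).items():
    print(f"{emotion}: {accuracy:.2f}")
```

Running the same function once per method (VADER, SentiArt, Liu-Hu) would give the figures needed to pick the best method for each emotion.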
Conclusion
Our research work focused primarily on effective sentiment analysis of short text. We used a large and diverse dataset of 34,791 text samples covering 7 different sentiments. We used the Orange tool to analyse the performance of three widely used sentiment analysis approaches: VADER, SentiArt and Liu-Hu. Our analysis shows that VADER performs best at identifying negative sentiments such as anger, disgust, fear, sadness and shame. All three approaches achieve similar accuracy for positive sentiments such as joy and surprise. Only Liu-Hu could accurately identify the neutral sentiment. The main reasons might be the choice of dataset and the threshold values applied to compound scores when classifying sentiments. As future work, a more complex and diverse dataset can be tested with different compound-score thresholds to improve the overall accuracy of sentiment analysis.
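The effect of the compound-score threshold mentioned above can be illustrated with a short Python sketch. The function `polarity_from_compound` and the example scores are hypothetical. The default cutoff of 0.05 is the value conventionally suggested for VADER compound scores; the thresholds actually used in this study are not stated here, so treat the numbers as assumptions.

```python
def polarity_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1, 1] to a polarity label.

    Scores at or above +threshold are positive, at or below -threshold
    are negative, and everything in between is neutral.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical compound scores, for illustration only.
scores = [0.62, 0.03, -0.04, -0.48, 0.10]

for threshold in (0.05, 0.2):
    labels = [polarity_from_compound(score, threshold) for score in scores]
    print(threshold, labels)
```

With the wider threshold, the weakly positive score 0.10 moves from positive to neutral. This suggests how the choice of threshold alone can shift accuracy on the neutral class relative to the positive and negative classes.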
Sentiment dataset: https://www.kaggle.com/code/nileshely/sentiment-annotated-text-corpus/input?select=emotion_dataset_raw.csv