Classifying User Reviews of Movie Applications using Improved Logistic Regression

Authors: Yogesh Ramaswamy

DOI Link: https://doi.org/10.22214/ijraset.2025.68593

Abstract

In recent years, review classification, analysis, and prediction have become among the most common applications of sentiment analysis. The task involves detecting, through opinion mining, the sentiments expressed in reviews that users post on social networking applications. In general, reviews can have positive, negative, or neutral polarity indicators. For classification, the polarity indicators take the form of certain words and emotions that readily show the user’s sentiments. Existing works fall short of producing accurate classification results because of the two-class problem, which affects evaluation parameters such as precision, recall, accuracy, and F-measure. Hence there is a need for an efficient classification technique that addresses the two-class problem. This work proposes an improved version (ILR) of Logistic Regression, a classifier commonly used for sentiment analysis and classification. The proposed technique identifies and replaces misspelled words in a sentence, performs support count estimation, and classifies reviews while handling, in parallel, multiple independent words with similar meaning. The experimental results show that the proposed technique classifies more accurately than the existing logistic regression and naïve Bayes classifiers.

Introduction

Overview:

The research focuses on sentiment analysis—a branch of opinion mining within web and data mining. It processes large-scale textual reviews (like movie reviews) to identify user opinions (positive, negative, or neutral) using machine learning and natural language processing (NLP) techniques.

Core Concepts:

Data Mining: Extracts patterns from large datasets.
Web Mining: Applies data mining to web data.
Opinion Mining / Sentiment Analysis: Determines emotional tone behind textual content, often used to analyze reviews.

Existing Approaches:

Lexicon-Based Methods: Use sentiment dictionaries and corpora.
Machine Learning Models: Algorithms like Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), and Neural Networks (NN).
Hybrid Methods: Combine lexicon and ML techniques for improved accuracy.

Common Challenges in Existing Techniques:

Limited to unigram features (single words).
Fail when multiple independent variables (words) share a similar meaning, because each word is treated as a separate feature.
Don’t handle misspelled words well.
Low performance metrics (accuracy, precision, recall, F-measure).
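
The four metrics named above can be computed from the counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the example counts are illustrative assumptions, not values from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F-measure from confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives for the positive class.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Illustrative counts only (not results from the paper).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Reporting all four together matters because accuracy alone can hide a classifier that favors one class.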

Proposed Solution: ILR (Improved Logistic Regression)

To address these issues, the paper introduces ILR, a modified logistic regression model that:

Key Innovations:

Identifies and corrects misspelled words using POS (Part of Speech) tagging.
Supports feature correlation based on occurrence and context.
Handles multi-word features (adverbs, adjectives, verbs) instead of unigrams only.

ILR Workflow & Methodology:

1. Dataset:

IMDB dataset of 50,000 movie reviews (25K training, 25K test).
Balanced and pre-labeled for binary sentiment classification (positive or negative).

2. Stages in ILR:

a) Data Preprocessing:

Tokenization: Break text into words/tokens.
Stop-word removal: Eliminate common words.
Stemming: Reduce words to their base/root form.
POS Tagging: Helps identify word roles and correct spelling errors.
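
A minimal sketch of the first three preprocessing steps, plus a spelling-correction step, might look like the following. It is not the paper's implementation: the stop-word list, vocabulary, and suffix list are tiny hypothetical stand-ins, the stemmer is a naive suffix stripper, and spelling correction uses string similarity from Python's standard `difflib` rather than the POS-based approach the paper describes.

```python
import difflib
import re

# Hypothetical, deliberately tiny resources; a real system would use full lists.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "this", "of", "it"}
VOCABULARY = ["movie", "acting", "brilliant", "boring", "really", "plot"]
SUFFIXES = ("ing", "ed", "ly", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def correct_spelling(token):
    """Replace an unknown token with its closest vocabulary word, if any."""
    if token in VOCABULARY or token in STOP_WORDS:
        return token
    matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.75)
    return matches[0] if matches else token


def stem(token):
    """Naive stemmer: strip one known suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def preprocess(text):
    """Tokenize, correct spelling, drop stop words, then stem."""
    tokens = [correct_spelling(t) for t in tokenize(text)]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [stem(t) for t in tokens]


tokens = preprocess("The actng was realy brilliant")
```

Spelling is corrected before stop-word removal and stemming here so that later steps see canonical word forms; the paper does not state the order it uses.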

b) Feature Extraction:

Extracts key sentiment-carrying parts of speech:
- Verbs (V), Adjectives (AJ), Adverbs (A)
- Combinations like AV (Adverb + Verb), AAJ (Adverb + Adjective), etc.
Uses WordNet for semantic analysis and categorization.
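
The extraction of single-word and combined features can be sketched as below. The input is assumed to be already POS-tagged as (word, tag) pairs; the tag names V, AJ, and A follow the list above, while the "N" tag for other words, the function name, and the rule that a pair consumes both words are assumptions made for illustration. The WordNet step is not shown.

```python
# Tag names follow the text: V = verb, AJ = adjective, A = adverb.
SINGLE_TAGS = {"V", "AJ", "A"}
# Adjacent tag pairs treated as one multi-word feature (AV and AAJ in the text).
PAIR_LABELS = {("A", "V"): "AV", ("A", "AJ"): "AAJ"}


def extract_features(tagged_tokens):
    """Return (label, text) features from a list of (word, tag) pairs."""
    features = []
    i = 0
    while i < len(tagged_tokens):
        word, tag = tagged_tokens[i]
        # Prefer a two-word combination when the next tag completes a pair.
        if i + 1 < len(tagged_tokens):
            next_word, next_tag = tagged_tokens[i + 1]
            label = PAIR_LABELS.get((tag, next_tag))
            if label:
                features.append((label, f"{word} {next_word}"))
                i += 2
                continue
        # Otherwise keep the word only if it is a verb, adjective or adverb.
        if tag in SINGLE_TAGS:
            features.append((tag, word))
        i += 1
    return features


# Hand-tagged example sentence (tags assigned manually for illustration).
tagged = [
    ("plot", "N"), ("was", "V"), ("really", "A"), ("boring", "AJ"),
    ("i", "N"), ("barely", "A"), ("enjoyed", "V"),
]
features = extract_features(tagged)
```

Keeping "really boring" as one feature preserves the intensifier, which a unigram model would split into two unrelated words.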

c) Classification with ILR:

Combines joint distribution and input-output mapping.
Matches words not just by form but also by semantic similarity.
Enhances accuracy in binary classification by resolving ambiguity and misspellings.
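
The summary does not give ILR's equations, so the sketch below shows only the general idea of semantic matching in front of a standard logistic regression, trained with plain stochastic gradient descent. The synonym map is a hypothetical stand-in for a WordNet lookup, and all function names, hyperparameters, and training samples are assumptions.

```python
import math

# Hypothetical synonym map standing in for a WordNet lookup.
SYNONYMS = {"great": "good", "excellent": "good",
            "awful": "bad", "terrible": "bad"}


def normalize(tokens):
    """Map each word to a canonical synonym so similar words share a weight."""
    return [SYNONYMS.get(t, t) for t in tokens]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=200, lr=0.5):
    """Fit bag-of-words logistic regression; label 1 = positive, 0 = negative."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for tokens, label in samples:
            feats = normalize(tokens)
            z = bias + sum(weights.get(t, 0.0) for t in feats)
            error = sigmoid(z) - label  # gradient of log-loss w.r.t. z
            bias -= lr * error
            for t in feats:
                weights[t] = weights.get(t, 0.0) - lr * error
    return weights, bias


def predict_proba(tokens, weights, bias):
    """Probability that a tokenized review is positive."""
    z = bias + sum(weights.get(t, 0.0) for t in normalize(tokens))
    return sigmoid(z)


# Toy training data, invented for illustration.
samples = [
    (["great", "movie"], 1), (["good", "plot"], 1),
    (["awful", "movie"], 0), (["bad", "plot"], 0),
]
weights, bias = train(samples)
```

Because "excellent" and "terrible" are normalized to "good" and "bad", the model can score them sensibly even though neither word appears in the training samples.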

Case Study:

A real-world application is considered—a web-based movie ticket booking system:

Before booking, users can analyze sentiment of movie reviews.
Helps in decision-making by checking others’ experiences across platforms.

Comparative Literature Review:

A survey of 7 major research papers highlights:

Most methods use Naive Bayes or SVM with unigram-based features.
Reported accuracy varies (65–90%), but these methods adapt poorly to complex sentence structures and semantic variations.
ILR aims to overcome these gaps with context-aware classification.

Key Contributions of This Research:

Corrects misspelled words using POS tagging.
Extracts relevant features using enhanced feature selection techniques.
Applies ILR classifier to manage multi-variable sentiment classification effectively.
Offers higher accuracy and better precision-recall balance than traditional LR/NB classifiers.

Conclusion

This work analyzes and classifies movie reviews taken from different movie-based applications. Different classifiers, such as Naive Bayes, Logistic Regression, and Support Vector Machine, are used to classify movie reviews. The existing classifiers fail to achieve the desired accuracy because they do not work properly with multiple independent variables: words with similar meaning are treated as separate features during classification, which degrades the performance parameters. During classification, the proposed work addresses the two-class problem, which is the main drawback of the existing LR classifier. The proposed classifier achieved an average classification accuracy of 88% when the size of the reviews was varied. Its accuracy was evaluated with different evaluation parameters, and it achieved better performance. In future, this work can be extended to mine reviews from multiple applications such as Bookmyshow and Paytm. Further improved machine learning algorithms can also be incorporated to improve efficiency, which will help in deciding the best classifier for sentiment analysis.

Copyright

Copyright © 2025 Yogesh Ramaswamy. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET68593

Publish Date : 2025-04-09

ISSN : 2321-9653

Publisher Name : IJRASET
