Sentiment analysis is widely used to understand customer opinion, brand perception, and market trends from text data. This project applies sentiment analysis to Amazon product reviews using machine learning classifiers and an optimization method intended to improve prediction accuracy. We use two main models: a Random Forest Classifier and a Decision Tree Classifier tuned with Grey Wolf Optimization (GWO).
The dataset contains product reviews, from which the review text and the corresponding scores were extracted. Reviews were preprocessed by lowercasing the text, stripping punctuation, and removing stop words to give the models cleaner inputs. Classification was binary: reviews with a score higher than three were labeled positive, and those with a score of three or lower were labeled negative. To convert the text into a numerical form suitable for machine learning algorithms, the TF-IDF (Term Frequency-Inverse Document Frequency) method was used, which weights each word by its importance to a review relative to the whole dataset.
The Random Forest Classifier, an ensemble learning algorithm that builds multiple decision trees and returns the mode of their predictions, served as the baseline model. It performed strongly, which we attribute to its ability to reduce overfitting and to handle high-dimensional data well. To investigate optimization methods further, a Decision Tree Classifier was trained whose maximum depth hyperparameter was optimized using Grey Wolf Optimization. GWO is a nature-inspired metaheuristic algorithm that simulates the leadership hierarchy and cooperative hunting behavior of grey wolves to search for optimal solutions.
Our findings indicated that the Random Forest model reached high classification accuracy with little tuning. The Decision Tree model, when optimized through GWO, proved competitive, which suggests that metaheuristic optimization can improve conventional machine learning models. Confusion matrices and classification reports were used to examine the precision, recall, and F1-score of each model.
This investigation shows that traditional classifiers combined with intelligent optimization algorithms can be effective for sentiment classification. It also highlights how strongly preprocessing and feature extraction methods such as TF-IDF influence model performance. The present work optimized a single hyperparameter through GWO; future research could investigate multi-parameter optimization and comparison with other metaheuristics such as Particle Swarm Optimization (PSO) or Genetic Algorithms (GA) to improve performance further. Overall, the project addresses sentiment analysis on real product review data using a combination of ensemble methods, decision trees, and metaheuristic optimization.
Introduction
This project focuses on sentiment analysis, a Natural Language Processing (NLP) task that determines the emotional tone (positive, negative, or neutral) of text data. With the explosive growth of product reviews on platforms like Amazon, manual analysis becomes impractical. The proposed system uses machine learning, specifically Random Forest (RF) and a Grey Wolf Optimization (GWO)-tuned Decision Tree, to classify customer reviews efficiently and accurately.
Objectives
Preprocess Amazon review data for effective analysis.
Convert text into numerical features using TF-IDF.
Use Random Forest as a baseline model.
Enhance a Decision Tree classifier with GWO for hyperparameter optimization (e.g., max_depth).
Compare model performance using accuracy, precision, recall, F1-score, and confusion matrices.
Explore how metaheuristic optimization can improve machine learning model performance.
Problem Statement
Manual analysis of reviews is unrealistic at the scale of the available data. Traditional classifiers often perform below their potential when their hyperparameters are poorly tuned. This work aims to improve sentiment classification using Random Forest and a GWO-optimized Decision Tree, enabling more efficient, accurate, and scalable sentiment analysis.
Methodology Overview
Data Collection & Preprocessing
Load Amazon product review dataset.
Clean text: lowercase conversion, punctuation removal, stop word removal, tokenization.
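The cleaning steps above, together with the binary labeling rule described earlier (a score above three is positive), can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual pipeline: the function names `clean_review` and `label_review`, the sample review, and the very short stop-word list are all assumptions made for the example.

```python
import string

# Small illustrative stop-word list (an assumption); a real pipeline would
# likely use a fuller list from an NLP library.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "to", "of", "i", "was"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_review(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    tokens = no_punct.split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


def label_review(score):
    """Return 1 (positive) for scores above 3, otherwise 0 (negative)."""
    return 1 if score > 3 else 0


# Hypothetical (text, score) pair standing in for one row of the dataset.
sample_text, sample_score = "This blender is GREAT, and it was cheap!", 5
tokens = clean_review(sample_text)
label = label_review(sample_score)
print(tokens, label)
```

Tokenizing on whitespace after punctuation removal is the simplest choice; a dedicated tokenizer may handle contractions and hyphenated words better.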
Feature Extraction (TF-IDF)
Transform cleaned text into numerical vectors that give more weight to terms that are frequent in a review but rare across the dataset.
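To make the weighting concrete, the sketch below computes textbook TF-IDF by hand, with tf as the term count divided by the document length and idf as ln(N / document frequency). The function name `tf_idf` and the three toy documents are assumptions for illustration. Library implementations such as scikit-learn's `TfidfVectorizer` apply idf smoothing and vector normalization by default, so their values will differ from these.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute textbook TF-IDF weights for a list of tokenized documents.

    Returns one {term: weight} dict per document, using
    tf = count / document length and idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical cleaned reviews, already tokenized.
docs = [["great", "blender"], ["bad", "blender"], ["great", "value"]]
weights = tf_idf(docs)
print(weights[1])
```

In the second toy review, "bad" appears in only one document while "blender" appears in two, so "bad" receives the larger weight. A term that occurs in every document gets a weight of zero under this unsmoothed formula.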
Model Training
Train Random Forest as a baseline classifier.
Apply Grey Wolf Optimization (GWO) to tune the max_depth parameter of a Decision Tree, simulating the hunting behavior of wolves to find optimal values.
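The GWO step can be sketched as follows, using the commonly published update rule in which each wolf moves toward the average of positions suggested by the three best wolves (alpha, beta, delta). This is an illustration under stated assumptions, not the project's implementation: `toy_validation_score` is a made-up stand-in for the validation accuracy of a Decision Tree at a given `max_depth`, and the search bounds, pack size, and iteration count are arbitrary choices.

```python
import random


def toy_validation_score(max_depth):
    """Stand-in objective (an assumption): peaks at max_depth = 12.

    In the project this would be the validation accuracy of a Decision
    Tree trained with the given max_depth.
    """
    return 1.0 - ((max_depth - 12) / 50.0) ** 2


def gwo_tune_depth(score_fn, low=1, high=50, n_wolves=8, n_iters=30, seed=0):
    """Search [low, high] for an integer max_depth that maximizes score_fn.

    Requires n_wolves >= 3 so that alpha, beta, and delta all exist.
    """
    rng = random.Random(seed)
    wolves = [rng.uniform(low, high) for _ in range(n_wolves)]

    def fitness(position):
        # max_depth must be an integer, so round the continuous position.
        return score_fn(round(position))

    best = max(wolves, key=fitness)
    for it in range(n_iters):
        # The three fittest wolves lead the hunt.
        alpha, beta, delta = sorted(wolves, key=fitness, reverse=True)[:3]
        a = 2.0 * (1 - it / n_iters)  # decreases linearly from 2 toward 0
        moved = []
        for x in wolves:
            steps = []
            for leader in (alpha, beta, delta):
                A = 2 * a * rng.random() - a   # exploration/exploitation factor
                C = 2 * rng.random()           # random emphasis on the leader
                steps.append(leader - A * abs(C * leader - x))
            # Average the three leader-guided moves and clamp to the bounds.
            moved.append(min(max(sum(steps) / 3.0, low), high))
        wolves = moved
        # Remember the best position seen so far.
        best = max(wolves + [best], key=fitness)
    return round(best)


best_depth = gwo_tune_depth(toy_validation_score)
print("best max_depth found:", best_depth)
```

Because `max_depth` is an integer while GWO operates on continuous positions, the sketch rounds each position before scoring it. With a real classifier, each fitness call means training a tree, so caching scores per depth would avoid repeated work.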
Evaluation Metrics
Compare RF and GWO-optimized Decision Tree using:
Accuracy
Precision
Recall
F1-score
Use confusion matrices for detailed visual analysis of model performance.
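The four metrics all derive from the confusion matrix counts. The sketch below computes them directly for the positive class; the function name `binary_metrics` and the label lists are made up for illustration and are not results from this project.

```python
def binary_metrics(y_true, y_pred):
    """Return confusion counts and metrics for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical true and predicted labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when the classes are imbalanced, since a model that predicts the majority class for every review can still score high accuracy.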
Literature Review Highlights
Ensemble and hybrid approaches (e.g., combining rule-based and ML models) are reported to outperform single models.
Deep learning models such as RNNs with LSTM are reported to reach very high accuracy (up to 97.5%) in the surveyed studies.
Techniques such as GWO-based tuning, SentiWordNet lexicon features, and fuzzy logic are reported to improve classifier performance.
Most research emphasizes the growing role of automation in handling large review datasets.
Thesis Structure
Introduction – Overview, goals, background.
Literature Review – Discussion of related studies and sentiment classification methods.
Methodology – Data preprocessing, TF-IDF, model training, and optimization.
Results – Model comparisons, accuracy metrics, confusion matrices.
Discussion – Performance insights and limitations.
Conclusion & Future Work – Suggestions like:
Multi-parameter optimization
Integration with deep learning
Real-time sentiment systems
Multilingual sentiment analysis
Conclusion
1) Project Overview
The goal was to build an efficient sentiment analysis system based on Random Forest and a Decision Tree model improved with Grey Wolf Optimization (GWO). Preprocessing and TF-IDF feature extraction prepared the dataset for machine learning.
2) Model Performance
Random Forest provided a robust baseline, with high accuracy and consistent results across repeated runs. The Decision Tree, when optimized using GWO, showed marked improvements in precision, recall, and F1-score over its non-optimized version.
3) Scope for Improvement
Future research may include tuning additional hyperparameters, experimenting with different datasets, and incorporating more complex models such as BERT to capture semantics better.