The exponential growth of e-commerce platforms, particularly Amazon, has produced a massive volume of customer-generated reviews, making manual sentiment analysis both time-consuming and inefficient. This study proposes an automated system to predict the sentiment polarity (positive or negative) of Amazon product reviews using a combination of machine learning and deep learning techniques. The reviews are processed and converted into vectorized representations using methods such as Bag of Words (BoW), TF-IDF, and Word2Vec, together with their unigram, bigram, and weighted variants. A variety of classifiers—including Naive Bayes, Logistic Regression, Support Vector Machines (SVM), Random Forest, and Long Short-Term Memory (LSTM) networks—are trained and evaluated. Experimental results indicate that Logistic Regression with bigram TF-IDF achieves the highest test AUC of 0.9615, demonstrating superior generalization. While LSTM models show potential in capturing sequential dependencies, they require further optimization. The study emphasizes the importance of automated sentiment classification and outlines future directions such as multilingual analysis and real-time deployment on cloud platforms.
Introduction
This research explores the use of Natural Language Processing (NLP) and sentiment analysis to predict the sentiment polarity (positive or negative) of Amazon product reviews. With the growth of e-commerce, accurately analyzing customer feedback is crucial for businesses to improve products and customer satisfaction.
The study utilizes both structured data (ratings, metadata) and unstructured textual data from a large dataset of over 568,000 reviews collected between 1999 and 2012. Reviews with ratings 4 or 5 are classified as positive, and 1 or 2 as negative, with neutral reviews excluded.
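The rating-to-label mapping can be expressed in a few lines. The sketch below is a minimal illustration, assuming the reviews sit in a pandas DataFrame with a "Score" column, as in the public Amazon Fine Food Reviews dump; the file path is a placeholder.

```python
# Minimal sketch of the labeling rule: 4-5 stars -> positive, 1-2 -> negative,
# neutral (3-star) reviews excluded. Column name and path are assumptions.
import pandas as pd

df = pd.read_csv("Reviews.csv")                   # placeholder path to the review dump

df = df[df["Score"] != 3]                         # drop neutral (3-star) reviews
df["Sentiment"] = (df["Score"] >= 4).astype(int)  # 4-5 -> 1 (positive), 1-2 -> 0 (negative)
```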
To predict sentiment, the research employs traditional supervised machine learning models—Naive Bayes, Logistic Regression, Support Vector Machine (SVM), and Random Forest—using text features such as Bag of Words, TF-IDF, and Word2Vec embeddings. Additionally, a deep learning model, Long Short-Term Memory (LSTM), a type of Recurrent Neural Network, is applied to capture contextual and sequential information from review texts.
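As one illustration of the feature pipelines above, the following hedged sketch builds averaged Word2Vec features with gensim; the toy corpus and all hyperparameters are placeholders, not the study's exact settings.

```python
# Averaged-Word2Vec features: each review is represented by the mean of the
# word vectors of its in-vocabulary tokens. Corpus and settings are illustrative.
import numpy as np
from gensim.models import Word2Vec

corpus = [
    "great product works perfectly".split(),
    "terrible quality broke after one day".split(),
]  # placeholder tokenized reviews
w2v = Word2Vec(corpus, vector_size=100, window=5, min_count=1, workers=4)

def avg_vector(tokens, model, dim=100):
    # Average the vectors of all in-vocabulary tokens; zero vector if none match.
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

X = np.vstack([avg_vector(toks, w2v) for toks in corpus])  # one feature row per review
```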
The literature review surveys previous studies on review helpfulness and sentiment, noting factors that influence a review's impact, such as its length, sentiment neutrality, and semantic richness. The challenges addressed include automating sentiment classification and managing large volumes of reviews.
Preprocessing steps involve cleaning the text data (removing HTML tags, punctuation, and stop words), stemming, and vectorizing the text for model input. Models are evaluated using metrics such as accuracy and AUC-ROC, with hyperparameter tuning performed for optimal results.
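A sketch of the cleaning steps just described is shown below; the exact regexes and stop-word list are assumptions, not the study's published code.

```python
# Cleaning pipeline: strip HTML tags, remove punctuation and stop words,
# then apply Porter stemming.
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))  # requires nltk.download("stopwords")

def clean_review(text: str) -> str:
    text = re.sub(r"<[^>]+>", " ", text)    # strip HTML tags
    text = re.sub(r"[^a-zA-Z]", " ", text)  # keep letters only (drops punctuation/digits)
    tokens = [t for t in text.lower().split() if t not in stop_words]
    return " ".join(stemmer.stem(t) for t in tokens)

print(clean_review("<br/>This product is GREAT, I loved it!"))  # -> "product great love"
```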
Initial findings indicate that Naive Bayes combined with TF-IDF features achieved a Test AUC of up to 0.92, while the LSTM model showed promise in capturing complex sentiment nuances across longer reviews.
Conclusion
The comparative evaluation of Naive Bayes, Logistic Regression, SVM, Random Forest, and LSTM models reveals distinct performance patterns depending on the vectorization strategy used. Naive Bayes, when used with Bag of Words, showed a reliable Test AUC of 0.9036 without overfitting, indicating solid performance on text-based tasks. Logistic Regression excelled when paired with bigram TF-IDF vectors, achieving a Test AUC of 0.9615, and was also highly robust with TF-IDF (0.9546). Although averaged Word2Vec embeddings provided decent results (Test AUC ≈ 0.91–0.92), they slightly lagged behind the count-based features and the skip-gram, TF-IDF-weighted Word2Vec variants in this setup.
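The best-reported configuration can be reconstructed roughly as follows; the ngram_range, regularization strength, and toy data are assumptions rather than the authors' exact settings.

```python
# Bigram TF-IDF features with Logistic Regression, scored by ROC AUC
# on a held-out split, mirroring the top-reported setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

texts = ["great product", "bad quality", "loved it", "awful experience"] * 25  # placeholder data
labels = [1, 0, 1, 0] * 25

X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, test_size=0.3,
                                          random_state=42, stratify=labels)

vec = TfidfVectorizer(ngram_range=(1, 2))     # unigrams + bigrams (assumed range)
lr = LogisticRegression(C=1.0, max_iter=1000)
lr.fit(vec.fit_transform(X_tr), y_tr)

test_auc = roc_auc_score(y_te, lr.predict_proba(vec.transform(X_te))[:, 1])
print(f"Test AUC: {test_auc:.4f}")
```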
SVM models, known for their margin-based classification, delivered strong generalization, especially with bigram TF-IDF (Test AUC of 0.9476) and unigram TF-IDF (0.9449), confirming their effectiveness in high-dimensional sparse spaces. Word2Vec-based features with SVM also performed well but trailed the TF-IDF approaches slightly in Test AUC.
Random Forest classifiers achieved excellent Train AUCs across the board, indicating strong learning capacity, but some configurations showed signs of overfitting, particularly with Word2Vec embeddings (Train AUC ≈ 1.0 vs. Test AUC < 0.92). The best performance from Random Forest was with TF-IDF vectors (Test AUC of 0.9215) and bigram BoW (0.9196), confirming the strength of count-based representations in tree-based models.
Introducing LSTM, a deep learning model designed for sequence learning, enhanced the ability to capture contextual dependencies and nuanced sentiment expressions. With learned embeddings and dropout regularization, LSTM achieved Test AUCs of 0.9312 and 0.9296 for 100 and 70 LSTM units respectively, outperforming several classical configurations, though it did not surpass the best bigram TF-IDF results and leaves room for further optimization. This illustrates the promise of deep learning for handling complex sentence structures and long-range dependencies in textual data.
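An illustrative Keras sketch of this setup (learned embeddings plus dropout, 100 LSTM units) is given below; the vocabulary cap, sequence length, dropout rate, and training corpus are all assumptions.

```python
# LSTM with embeddings learned from scratch and dropout regularization,
# trained for binary sentiment classification.
import numpy as np
import tensorflow as tf

train_texts = ["great product loved it", "terrible quality do not buy"] * 64  # placeholder
train_labels = np.array([1, 0] * 64)

max_words, max_len = 20000, 200  # assumed vocabulary cap and padded sequence length

tok = tf.keras.preprocessing.text.Tokenizer(num_words=max_words)
tok.fit_on_texts(train_texts)
X = tf.keras.preprocessing.sequence.pad_sequences(
    tok.texts_to_sequences(train_texts), maxlen=max_len)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(max_words, 100),       # embeddings learned from scratch
    tf.keras.layers.LSTM(100),                       # 100 units, as in the best-reported run
    tf.keras.layers.Dropout(0.5),                    # dropout regularization (rate assumed)
    tf.keras.layers.Dense(1, activation="sigmoid"),  # positive/negative probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])
model.fit(X, train_labels, epochs=5, batch_size=32, validation_split=0.2)
```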
To push performance further, integrating transformer-based models such as BERT could yield even more contextually aware predictions, thanks to their bidirectional attention mechanism, which has achieved state-of-the-art results on numerous NLP tasks. Extending the current system to handle multilingual and code-mixed reviews would improve its adaptability in global markets. Additionally, deploying these models in real-time sentiment analysis dashboards could provide actionable insights to e-commerce businesses, enabling dynamic customer feedback analysis, product improvement, and better decision-making.