This research analyzes the use of Natural Language Processing (NLP) with supervised machine learning to classify sentiment in Amazon product reviews. We applied extensive text preprocessing to the Datafiniti Amazon Consumer Reviews dataset and converted the review text into numerical features with TF-IDF vectorization. A Linear Support Vector Classifier was used to classify each review as Positive, Neutral, or Negative. Model performance was evaluated with a confusion matrix and a classification report, and the results were strong despite class imbalance in the data. In addition, sentiment trends across product categories were visualized in Power BI, which made the results easier to interpret and act on in real-world e-commerce applications.
Introduction
The study investigates the application of sentiment analysis and data visualization to Amazon product reviews, aiming to derive actionable insights from large-scale e-commerce datasets. By employing a Linear Support Vector Classifier (Linear SVC) and integrating Power BI for visualization, the research classifies reviews into Positive, Neutral, and Negative sentiments based on textual content and star ratings.
Key Highlights:
Dataset Overview: The analysis utilizes the Datafiniti Amazon Consumer Reviews dataset from May 2019, comprising approximately 28,000 reviews across various product categories. Each entry includes the review text, star rating (1–5), product name, category, and review date.
Preprocessing Steps: The dataset underwent several preprocessing stages:
Removal of rows with missing review text or ratings.
Text normalization, including lowercasing, elimination of HTML tags, punctuation, stopwords, and extra spaces.
Tokenization and lemmatization of review text.
Sentiment labeling based on star ratings:
1 star → Negative
2–3 stars → Neutral
4–5 stars → Positive
Application of TF-IDF vectorization to convert text into numerical features for model training.
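The preprocessing steps listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the exact code used in the study: the function names, the `text` and `rating` keys, and the tiny stopword list are assumptions made for the example, and lemmatization is left out because it needs an external NLP library.

```python
import html
import re
import string

# Tiny illustrative stopword list (an assumption; the study does not say which list it used).
STOPWORDS = {"a", "an", "and", "the", "is", "it", "this", "to", "of", "for", "i", "was"}

_TAG_RE = re.compile(r"<[^>]+>")
_PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def clean_text(text: str) -> str:
    """Lowercase, strip HTML tags and punctuation, drop stopwords, collapse spaces."""
    text = html.unescape(text)                    # decode entities such as &amp;
    text = _TAG_RE.sub(" ", text)                 # remove HTML tags
    text = text.lower().translate(_PUNCT_TABLE)   # lowercase, punctuation -> spaces
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)                       # split/join also collapses extra spaces


def rating_to_sentiment(rating: int) -> str:
    """Map a 1-5 star rating to the three sentiment labels listed above."""
    if rating == 1:
        return "Negative"
    if rating in (2, 3):
        return "Neutral"
    if rating in (4, 5):
        return "Positive"
    raise ValueError(f"rating out of range: {rating}")


def prepare(rows):
    """Drop rows with missing text or rating, then clean and label the rest."""
    prepared = []
    for row in rows:
        text, rating = row.get("text"), row.get("rating")
        if not text or rating is None:
            continue  # skip incomplete rows
        prepared.append((clean_text(text), rating_to_sentiment(int(rating))))
    return prepared
```

Calling `prepare` on a list of dictionaries returns `(cleaned_text, label)` pairs that can then be passed to a TF-IDF vectorizer.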
Model Training: A Linear SVC model was trained using the preprocessed data. The dataset was split into 80% training and 20% testing subsets. Evaluation metrics included accuracy, precision, recall, F1 score, and confusion matrix. The model achieved an accuracy of 93.51%, with precision, recall, and F1 scores of 92.8%, 92.4%, and 92.6%, respectively.
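A training and evaluation run of this kind might look like the following sketch, assuming scikit-learn is available. The pipeline layout, the stratified split, the default hyperparameters, and the `train_and_evaluate` name are assumptions for illustration; the study does not report these details, so this code should not be expected to reproduce the figures above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

LABELS = ["Negative", "Neutral", "Positive"]


def train_and_evaluate(texts, labels, seed=42):
    """Fit TF-IDF + Linear SVC on an 80/20 split and return the evaluation results."""
    # Stratifying keeps class proportions similar in both subsets
    # (an assumption; the study only states the 80/20 ratio).
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    # The pipeline fits TF-IDF on the training subset only, so no test
    # vocabulary or document frequencies leak into training.
    model = make_pipeline(TfidfVectorizer(), LinearSVC(random_state=seed))
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    return {
        "model": model,
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred, labels=LABELS),
        "report": classification_report(
            y_test, y_pred, labels=LABELS, zero_division=0
        ),
    }
```

Printing `result["report"]` shows per-class precision, recall, and F1, which is more informative than accuracy alone when the classes are imbalanced.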
Visualization: Sentiment predictions were exported to Power BI, where interactive dashboards were created to visualize:
Sentiment distribution across product categories.
Average sentiment scores per product.
Time-based trends in customer satisfaction.
Top positive and negative reviews by product.
Star rating summaries using visual indicators.
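Before building dashboards like these, the predictions have to be exported in a form Power BI can load, for example a CSV file. The sketch below shows one hypothetical way to do that with the Python standard library, along with two of the aggregations listed above. The column names and the numeric sentiment encoding (-1, 0, 1) are assumptions for illustration; the study does not state how its average sentiment scores were computed.

```python
import csv
from collections import Counter, defaultdict

# Hypothetical numeric encoding, used only to average sentiment per product.
SENTIMENT_SCORE = {"Negative": -1, "Neutral": 0, "Positive": 1}

# Assumed column layout for the exported file.
FIELDNAMES = ["product", "category", "review_date", "rating", "sentiment"]


def sentiment_by_category(rows):
    """Count predicted sentiments within each product category."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["category"]][row["sentiment"]] += 1
    return {category: dict(counter) for category, counter in counts.items()}


def average_score_by_product(rows):
    """Average the numeric sentiment score over all reviews of each product."""
    totals = defaultdict(lambda: [0, 0])  # product -> [score sum, review count]
    for row in rows:
        entry = totals[row["product"]]
        entry[0] += SENTIMENT_SCORE[row["sentiment"]]
        entry[1] += 1
    return {product: total / count for product, (total, count) in totals.items()}


def export_predictions(rows, handle):
    """Write one CSV row per review to an open text file handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
```

To write a file, open it with `open(path, "w", newline="", encoding="utf-8")` and pass the handle to `export_predictions`. Power BI can also compute these aggregations itself from the exported rows, so the two helper functions are mainly useful for checking the dashboard numbers.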
Conclusion
This study examined Amazon product reviews by applying sentiment analysis with a Linear Support Vector Classifier (Linear SVC) model. Natural Language Processing techniques were used to classify user reviews into three categories, namely positive, neutral, and negative, based on the review content and the associated star ratings. With this classification, customer perceptions, satisfaction levels, and overall product performance can be assessed in a data-driven way.
Our approach proved reliable on this large-scale, unstructured review data. The performance of the Linear SVC model in sentiment prediction was assessed with evaluation metrics and a confusion matrix. In addition, the sentiment distribution chart gives viewers an accessible summary of the overall tone of the dataset. However, the Power BI dashboard is currently under development; once complete, it will extend the analysis by presenting model outputs in dynamic, visual formats. Decision-makers, marketers, and business analysts stand to benefit from these dashboards, particularly for actionable insights, product trends, and improving customer satisfaction.
In short, this research confirms that machine learning and sentiment analysis can be combined to draw meaningful insights from user-generated content. Such techniques are becoming essential in today's e-commerce-oriented world, where consumers give feedback with increasing frequency and most of it remains unused.