The rapid growth of e-commerce platforms has produced massive volumes of customer feedback data, particularly in the cosmetic industry, where customer preferences and satisfaction levels fluctuate constantly. Manual analysis of data at this scale is inefficient and time-consuming, so automated analysis tools are required. This project proposes a machine learning-based approach to analyzing customer sentiment and product performance for cosmetic products using numerical review data.
Unlike traditional sentiment analysis techniques, which rely on text-based customer reviews, the proposed system uses structured numerical attributes such as product ratings, number of reviews, sentiment scores, popularity scores, and weighted product scores obtained through feature engineering. The dataset undergoes rigorous preprocessing, including data cleaning, normalization, and outlier treatment, to ensure data quality and consistency.
To identify hidden patterns and similarities among cosmetic products, K-Means clustering, an unsupervised machine learning algorithm, is used to group products by their customer feedback patterns. The resulting segmentation separates high-performing, medium-performing, and low-performing products. Cluster plots and distribution plots are used to inspect and interpret the clustering results.
The experimental results show that the proposed approach can identify trends in customer sentiment and product popularity without using text data. The system can be used to improve recommendation systems on e-commerce websites and to help cosmetic companies understand customer preferences. The approach also extends to other domains where customer feedback is available in numerical form.
Introduction
The growth of e-commerce platforms has generated large amounts of customer feedback data, which helps businesses understand customer preferences and product performance. In the cosmetic industry, customer feedback strongly influences purchasing decisions, brand reputation, and product success. Traditional sentiment analysis methods rely mainly on text-based reviews processed with natural language processing (NLP). These approaches face challenges such as language dependency, noisy data, data sparsity, and the limited availability of structured textual reviews. On many e-commerce platforms, customer feedback is available in numerical form, such as ratings, review counts, and engagement metrics, rather than as text.
To address this limitation, the study proposes a machine learning-based framework that analyzes customer sentiment and product popularity using numerical review data. The framework uses feature engineering techniques to compute sentiment scores, popularity scores, and weighted product scores that represent customer satisfaction and product performance. Instead of relying on labeled text data, the study uses K-Means clustering, an unsupervised learning method, to group cosmetic products based on similarities in customer feedback.
The methodology consists of data collection from an e-commerce dataset, data preprocessing (removing duplicates, handling missing values, and normalization), feature engineering, clustering analysis, and visualization of results. The sentiment score is derived from product ratings, the popularity score is derived from engagement metrics and review counts, and the weighted product score combines both to reduce rating bias, for example a high average rating that rests on only a handful of reviews.
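The preprocessing and feature engineering steps can be illustrated with a minimal Python sketch. The study does not state its exact formulas, so everything specific here is an assumption made for illustration: the sample records, the helper names `preprocess`, `min_max`, and `engineer`, the mapping of a 1-5 rating onto [0, 1], the log-scaled review count used as the popularity score, and the mixing weight `alpha`.

```python
import math

# Hypothetical records: (product_id, rating on a 1-5 scale, review_count).
# The values are illustrative only and are not taken from the study's dataset.
RAW = [
    ("P1", 4.8, 1200),
    ("P2", 4.9, 3),
    ("P3", 3.1, 450),
    ("P3", 3.1, 450),   # exact duplicate row
    ("P4", None, 80),   # missing rating
    ("P5", 2.0, 15),
]


def preprocess(rows):
    """Drop exact duplicates and rows with missing values, preserving order."""
    seen, clean = set(), []
    for row in rows:
        if row in seen or any(value is None for value in row):
            continue
        seen.add(row)
        clean.append(row)
    return clean


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def engineer(rows, alpha=0.5):
    """Return {product_id: (sentiment, popularity, weighted)}.

    Assumed definitions (not taken from the source):
      sentiment  = (rating - 1) / 4, mapping a 1-5 rating onto [0, 1]
      popularity = min-max normalised log(1 + review_count)
      weighted   = alpha * sentiment + (1 - alpha) * popularity
    """
    sentiment = [(rating - 1.0) / 4.0 for _, rating, _ in rows]
    popularity = min_max([math.log1p(count) for _, _, count in rows])
    return {
        pid: (s, p, alpha * s + (1 - alpha) * p)
        for (pid, _, _), s, p in zip(rows, sentiment, popularity)
    }


features = engineer(preprocess(RAW))
for pid, (s, p, w) in features.items():
    print(f"{pid}: sentiment={s:.2f} popularity={p:.2f} weighted={w:.2f}")
```

Under these assumed definitions, product P2 has the highest sentiment score but the lowest popularity score, so its weighted score falls below that of P1. This is the kind of rating bias the weighted product score is meant to reduce.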
Using the Elbow Method, the optimal number of clusters was determined as K = 3, and the K-Means algorithm grouped products into three categories: high-performing products with high sentiment and popularity, moderately performing products with average feedback, and low-performing products with low satisfaction and engagement. Visualization techniques confirmed clear separation among these clusters.
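The clustering step can be sketched in plain Python as follows. This is an illustrative sketch, not the implementation used in the study: the twelve (sentiment score, popularity score) points are synthetic, the function names are hypothetical, and centroids are seeded with a deterministic farthest-first rule, a simplification chosen here so that the example is reproducible. The sketch runs Lloyd's algorithm for several values of k and reports the within-cluster sum of squares (inertia), the quantity that the Elbow Method plots against k.

```python
# Synthetic (sentiment score, popularity score) pairs, for illustration only.
POINTS = [
    (0.90, 0.85), (0.95, 0.90), (0.88, 0.92), (0.92, 0.80),
    (0.55, 0.50), (0.50, 0.55), (0.60, 0.45), (0.52, 0.48),
    (0.10, 0.15), (0.15, 0.10), (0.20, 0.20), (0.12, 0.08),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels, inertia)."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[i])  # an empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))
    return centroids, labels, inertia


def elbow_curve(points, ks):
    """Map each candidate k to the inertia K-Means reaches for it."""
    return {k: kmeans(points, k)[2] for k in ks}


curve = elbow_curve(POINTS, range(1, 6))
for k, value in curve.items():
    print(f"k={k}: inertia={value:.4f}")

# Name the three clusters by centroid magnitude: high, medium, low.
centroids, labels, _ = kmeans(POINTS, 3)
ranked = sorted(range(3), key=lambda i: sum(centroids[i]), reverse=True)
segment = dict(zip(ranked, ["high", "medium", "low"]))
for point, label in zip(POINTS, labels):
    print(point, segment[label])
```

With these synthetic points the inertia falls sharply up to k = 3, after which very little within-cluster variance remains to be removed. In practice the elbow is usually read from a plot of this curve, and a randomly initialised K-Means is typically run several times to guard against poor local optima.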
The results show that meaningful sentiment insights can be obtained without textual reviews, which makes the approach scalable and efficient for large e-commerce datasets. The framework helps businesses identify product trends, perform market segmentation, and support decision-making in the cosmetic industry. Although the method does not capture the detailed opinions expressed in text reviews and depends on clustering parameters such as the number of clusters, it provides a simple, interpretable, and scalable solution for product analysis based on numerical customer feedback.
Conclusion
This work has presented an efficient machine learning-based framework for analyzing and segmenting cosmetic products using numerical customer feedback data collected from e-commerce sites. The framework addresses the difficulty of obtaining insights when text-based customer reviews are absent by using structured numerical attributes such as ratings, sentiment scores, and popularity indices.
Through rigorous data preprocessing and feature engineering, the framework converts raw data into representations suitable for unsupervised machine learning. The K-Means clustering algorithm, with the Elbow Method used to select the number of clusters, segments cosmetic products into high-performing, average-performing, and low-performing groups according to customer sentiment and engagement.
The experimental results and visual inspection show that the proposed methodology is scalable, computationally efficient, and interpretable. The resulting clusters can support business decisions such as product recommendation, marketing strategy optimization, and performance assessment.
The results indicate that numerical sentiment representation combined with popularity-aware clustering can serve as a reliable alternative to text-based sentiment analysis for large-scale e-commerce product analysis when textual reviews are unavailable. The proposed framework can be adapted to other product domains and recommendation systems.