This paper presents a comprehensive analytical framework for analyzing and forecasting supermarket sales data, with the goal of understanding seasonal sales trends and customer purchasing behavior in order to improve overall business performance. The methodology begins with data preprocessing and cleaning to ensure the quality, integrity, and reliability of the data, followed by Exploratory Data Analysis (EDA) to uncover sales trends, seasonal patterns, and product and customer anomalies. Product prices are compared across countries to characterize customer purchasing behavior. Customers are segmented into low-, medium-, and high-value groups using repeat-purchase analysis, K-means clustering with the Elbow method, and customer lifetime value models. Market Basket Analysis with association rule mining identifies products that are frequently co-purchased. Finally, sales are forecast using Seasonal ARIMA (SARIMA) to predict future revenue and inform critical business decisions. The proposed system provides actionable insights into the growth potential of a business in the global market.
Introduction
This paper presents a data-driven retail analytics framework that applies data science and machine learning techniques to improve business decision-making using global sales data, with a comparative focus on France and the Netherlands. Although France has a much larger customer base, the Netherlands generates higher total sales revenue, highlighting clear differences in purchasing behavior, spending patterns, and product preferences between the two countries.
The framework begins with data preprocessing and cleaning, followed by exploratory data analysis (EDA) to identify trends, seasonality, correlations, and anomalies. Product and price analysis reveals that Dutch customers tend to purchase higher-priced items, contributing to higher revenue despite fewer customers. Time-based and country-wise analyses further show that the Netherlands exhibits higher average monthly sales and stronger seasonal peaks, while France shows more stable but lower revenue patterns.
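The country-wise monthly comparison described above can be illustrated with a minimal sketch. The transaction records, field layout, and function names below are hypothetical and are not taken from the study's dataset; the sketch only shows one plausible way to aggregate revenue per country and month and then average the monthly totals.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transaction records: (country, "YYYY-MM-DD", quantity, unit_price).
transactions = [
    ("France", "2011-01-05", 10, 2.50),
    ("France", "2011-01-20", 4, 1.25),
    ("France", "2011-02-11", 8, 2.00),
    ("Netherlands", "2011-01-07", 100, 3.00),
    ("Netherlands", "2011-02-15", 120, 3.50),
]

def monthly_revenue(rows):
    """Sum revenue (quantity * unit price) per (country, year-month)."""
    totals = defaultdict(float)
    for country, date, quantity, unit_price in rows:
        month = date[:7]  # keep only the "YYYY-MM" prefix
        totals[(country, month)] += quantity * unit_price
    return dict(totals)

def average_monthly_revenue(rows):
    """Average the monthly revenue totals for each country."""
    per_country = defaultdict(list)
    for (country, _month), revenue in monthly_revenue(rows).items():
        per_country[country].append(revenue)
    return {country: mean(values) for country, values in per_country.items()}

averages = average_monthly_revenue(transactions)
```

With real data, the per-month totals returned by `monthly_revenue` would also be the natural input for plotting seasonal peaks by country.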
To understand customer behavior, the study applies RFM (recency, frequency, monetary) analysis, K-means clustering with the Elbow method, and Customer Lifetime Value (CLV) to segment customers into low-, medium-, and high-value groups. Results indicate strong customer loyalty in both countries, with a small number of Dutch customers, particularly one highly active customer, contributing disproportionately to total revenue.
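As a rough illustration of the segmentation step, the sketch below builds an RFM table from invoices, rescales the features, and runs a plain K-means (Lloyd's algorithm) for several values of k so that the inertia curve can be inspected for an elbow. The invoice data, the snapshot date, the min-max scaling, and the deterministic farthest-point initialisation are all assumptions made for this example; the study itself may use different preprocessing or a library implementation.

```python
from datetime import date

# Hypothetical invoices: (customer_id, invoice_date, invoice_total).
invoices = [
    ("C1", date(2011, 11, 30), 50.0),
    ("C1", date(2011, 12, 5), 70.0),
    ("C2", date(2011, 6, 1), 20.0),
    ("C3", date(2011, 12, 8), 900.0),
    ("C3", date(2011, 12, 1), 1100.0),
    ("C3", date(2011, 11, 20), 1000.0),
    ("C4", date(2011, 5, 15), 30.0),
]
SNAPSHOT = date(2011, 12, 10)  # assumed reference date for computing recency

def rfm_table(rows, snapshot):
    """Return {customer: (recency_days, frequency, monetary)}."""
    table = {}
    for customer, day, total in rows:
        recency, frequency, monetary = table.get(customer, (None, 0, 0.0))
        days = (snapshot - day).days
        recency = days if recency is None else min(recency, days)
        table[customer] = (recency, frequency + 1, monetary + total)
    return table

def min_max_scale(points):
    """Rescale each feature to [0, 1] so no single feature dominates distances."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [tuple((v - lo) / s for v, lo, s in zip(p, lows, spans)) for p in points]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia

rfm = rfm_table(invoices, SNAPSHOT)
customers = sorted(rfm)
scaled = min_max_scale([rfm[c] for c in customers])
# Elbow method: compute inertia for each k and look for the bend in the curve.
inertias = {k: kmeans(scaled, k)[1] for k in range(1, len(customers) + 1)}
```

The value of k at which the drop in `inertias` levels off is the candidate number of segments; a choice of k = 3 would correspond to the low-, medium-, and high-value groups described above.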
The framework also incorporates Market Basket Analysis using the Apriori algorithm to identify frequently co-purchased products and derive association rules based on support, confidence, and lift. Finally, SARIMA time-series models are used to forecast future sales by capturing trends and seasonal patterns.
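The association-rule step can be sketched as follows. This is a minimal, unoptimised level-wise Apriori over a handful of hypothetical baskets, followed by rule generation with support, confidence, and lift. The item names and the thresholds (`min_support=0.4`, `min_confidence=0.7`) are illustrative assumptions, not values reported by the study.

```python
from itertools import combinations

# Hypothetical baskets; item names are illustrative only.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(transactions, min_support):
    """Level-wise search: size-k candidates are built from frequent size-(k-1) sets."""
    items = {item for t in transactions for item in t}
    level = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while level:
        kept = {s: support(s, transactions) for s in level}
        kept = {s: v for s, v in kept.items() if v >= min_support}
        frequent.update(kept)
        k += 1
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent

def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                cons = itemset - ante
                confidence = sup / frequent[ante]   # P(consequent | antecedent)
                lift = confidence / frequent[cons]  # confidence relative to baseline
                if confidence >= min_confidence:
                    out.append((ante, cons, sup, confidence, lift))
    return out

frequent = apriori(baskets, min_support=0.4)
found = rules(frequent, min_confidence=0.7)
```

A lift above 1 suggests that two items are bought together more often than their individual popularity would predict, which is the signal typically used for bundling or shelf-placement decisions; a lift at or below 1 suggests no positive association.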
Overall, the integrated analytics approach uncovers hidden patterns in customer behavior, product performance, and regional sales differences, enabling accurate sales forecasting and supporting data-driven strategies for marketing, inventory management, and revenue optimization.
Conclusion
This paper presents a comprehensive study of data analysis and forecasting on supermarket sales transactions, in which multiple analytical techniques were applied to extract meaningful insights and predict future sales behavior. Through Exploratory Data Analysis (EDA), the dataset was examined to uncover hidden patterns, identify anomalies, and summarize key attributes using statistical methods and visualization techniques. A country-wise comparison revealed that the Netherlands, despite having only 9 unique customers compared to France’s 87, generated significantly higher total revenue, highlighting the impact of high-value, loyal customers. A comparison of product prices showed that the majority of top products in the Netherlands carried higher unit prices than those in France, which helps explain the revenue disparity. The computation of Customer Lifetime Value (CLV) provided a measure of long-term profitability: customers in the Netherlands had far higher CLV scores due to high purchase frequency and bulk ordering, while France displayed broader but lower-value spending behavior. Market Basket Analysis (MBA) using the Apriori algorithm was employed to detect strong product associations, enabling recommendations for product bundling and shelf placement to further drive sales. Finally, time-series forecasting using SARIMA was performed to model seasonal trends and predict future sales, producing accurate and reliable results that can guide inventory management and business planning. Altogether, the study concludes that data-driven approaches such as EDA, CLV, clustering, MBA, and SARIMA forecasting can significantly improve business understanding, enhance customer targeting strategies, optimize product placement, and support robust forecasting for better decision-making in supermarket sales.