Customer Sentiment Analysis from Big Data

Authors: Abdul Wahid Ansari, Sahan Ahmad Khan

DOI Link: https://doi.org/10.22214/ijraset.2026.83504

Abstract

With advances in technology and the proliferation of digital interfaces, customers produce a massive amount of unstructured data, such as opinions, social media interactions, feedback, and online behaviour. This makes sentiment analysis a promising tool that modern organisations can use effectively in decision making. This paper presents a comprehensive analysis of customer sentiment analysis using big data technology. It demonstrates a data processing scheme that uses intelligent text processing techniques to extract prominent sentiment information from unstructured data at large scale. Experiments show that the proposed approach offers better efficacy, scalability, and processing speed than traditional methods of sentiment analysis. The paper shows the importance of big data-driven techniques for customer sentiment analysis in decision support systems.

Introduction

1) Machine Learning Models, LIME/SHAP, and Explanation Consistency (ECS)

The text discusses how complex machine learning models (e.g., Logistic Regression, Random Forest, XGBoost) are increasingly used in critical domains like healthcare, but their “black-box” nature makes explanations difficult to interpret and trust. Explainable AI (XAI) methods such as LIME and SHAP are used to interpret model predictions, but they differ in approach—LIME builds local surrogate models while SHAP uses game-theory-based feature attribution.

The paper highlights a gap: there is no standard way to measure whether different explanation methods agree. To address this, it proposes an Explanation Consistency Score (ECS) that combines Jaccard similarity (feature overlap) and Spearman rank correlation (feature importance ranking agreement). The study evaluates LIME and SHAP across multiple models on a heart disease dataset and finds that more complex models often produce less consistent explanations. It also introduces a CA-XAI framework to evaluate and compare explanation methods systematically.
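
The summary does not give the exact ECS formula, so the following is only a minimal sketch of how Jaccard overlap and Spearman rank agreement might be combined. The equal weighting (alpha), the rescaling of Spearman's rho to the range 0 to 1, the top-k cut-off, and the attribution values are all assumptions made for illustration.

```python
from typing import Dict, List


def top_k(attributions: Dict[str, float], k: int) -> List[str]:
    """Return the k features with the largest absolute attribution."""
    ranked = sorted(attributions, key=lambda f: abs(attributions[f]), reverse=True)
    return ranked[:k]


def jaccard(a: List[str], b: List[str]) -> float:
    """Jaccard similarity between two feature sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def ranks(values: List[float]) -> List[int]:
    """Rank positions (0 = smallest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation via the no-ties formula."""
    n = len(x)
    if n < 2:
        return 0.0
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def ecs(lime: Dict[str, float], shap: Dict[str, float],
        k: int = 2, alpha: float = 0.5) -> float:
    """Hypothetical ECS: weighted mean of Jaccard and rescaled Spearman."""
    features = sorted(set(lime) & set(shap))
    jac = jaccard(top_k(lime, k), top_k(shap, k))
    rho = spearman([abs(lime[f]) for f in features],
                   [abs(shap[f]) for f in features])
    return alpha * jac + (1.0 - alpha) * (rho + 1.0) / 2.0


# Made-up attributions for four heart-disease style features.
lime_attr = {"age": 0.4, "chol": 0.3, "bp": 0.2, "sex": 0.1}
shap_same = dict(lime_attr)
shap_reversed = {"age": 0.1, "chol": 0.2, "bp": 0.3, "sex": 0.4}

score_same = ecs(lime_attr, shap_same)          # full agreement
score_reversed = ecs(lime_attr, shap_reversed)  # full disagreement
```

Under these assumptions the score is 1 when both methods select and rank features identically and 0 when the rankings are exactly reversed with no overlap in the top-k sets.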

2) ML-Based Crop Recommendation for Odisha Agriculture

This study focuses on agriculture in Odisha, India, especially vulnerable districts like Koraput and Rayagada, where farming is heavily affected by climate variability, poor infrastructure, and reliance on rain-fed agriculture. Traditional agricultural advisory systems are limited due to lack of real-time, location-specific data.

To address this, the paper proposes an AI/ML-based decision support system that integrates soil, climate, satellite, and yield data. It builds a geospatial database and applies multiple ML models (Random Forest, SVM, KNN, ensemble methods, etc.) to predict crop suitability and provide recommendations for crops, fertilizers, and irrigation.

The system shows high performance, with ensemble models like Random Forest achieving ~99.5% accuracy. Feature importance analysis shows rainfall, humidity, and soil nutrients as key drivers. The system demonstrates improved decision-making compared to traditional advisory methods and can increase agricultural productivity.
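
As a rough illustration of how one of the listed models could map field conditions to a crop, here is a small k-nearest-neighbours sketch in plain Python. It is not the study's system: the three features (rainfall, humidity, soil nitrogen), the crop labels, and every numeric value are invented toy data, and min-max scaling is an assumed preprocessing choice.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def min_max_bounds(rows: List[Sequence[float]]) -> List[Tuple[float, float]]:
    """Per-feature (min, max) pairs used for scaling."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(row: Sequence[float], bounds: List[Tuple[float, float]]) -> List[float]:
    """Min-max scale one row so no feature dominates the distance."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, bounds)]


def knn_recommend(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Return the majority crop label among the k nearest training samples."""
    bounds = min_max_bounds([features for features, _ in train])
    q = scale(query, bounds)
    nearest = sorted((math.dist(scale(features, bounds), q), label)
                     for features, label in train)
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


# Toy samples: (rainfall in mm, humidity in %, nitrogen in kg/ha) -> crop.
# All values are hypothetical and only serve to exercise the code.
training_data: List[Sample] = [
    ((220, 82, 80), "rice"),
    ((240, 85, 75), "rice"),
    ((200, 80, 85), "rice"),
    ((60, 45, 30), "millet"),
    ((70, 50, 25), "millet"),
    ((55, 40, 35), "millet"),
]

wet_field = knn_recommend(training_data, (210, 81, 78))
dry_field = knn_recommend(training_data, (65, 48, 28))
```

A real system of the kind described would train on the integrated soil, climate, satellite, and yield data rather than on a handful of hand-written rows.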

3) Agentic AI for Fixed Broadband Service Assurance

The text explains how fixed broadband networks have become more complex due to FTTH, Wi-Fi dependence, and rising customer expectations. Traditional service assurance in telecom is reactive, relying on manual troubleshooting and fragmented systems, leading to delays and inefficiencies.

AI and ML have improved predictive analytics in telecom, but most systems remain advisory. The emergence of Agentic AI—systems capable of reasoning, planning, tool use, and multi-agent collaboration—offers a shift toward autonomous network operations.

The paper identifies key gaps: limited focus on fixed broadband (vs mobile networks), isolated AI tools that don’t work together, and lack of frameworks combining ML, LLMs, and automation with governance. It proposes a multi-layer Agentic AI framework integrating data systems, ML models, LLM reasoning, multi-agent orchestration, and human oversight to enable proactive, intelligent NOC operations.

4) Hybrid Deep Learning Framework for Glaucoma Detection

This study addresses glaucoma, a major cause of irreversible blindness that often goes undetected early. Traditional diagnosis relies on expert interpretation of retinal fundus images, particularly the Cup-to-Disc Ratio (CDR), but this process is subjective and resource-intensive.

The proposed system introduces a multi-stage hybrid AI framework:

Deep feature extraction using EfficientNetB3 and DenseNet121
Optic disc and cup segmentation using U-Net
CDR computation as a clinical biomarker
Feature fusion with LightGBM classifier for final prediction

This hybrid approach improves both accuracy and interpretability by combining deep learning with anatomical measurements. The system is deployed via a user-friendly interface (Streamlit) and evaluated using metrics like accuracy, precision, recall, F1-score, and AUC-ROC.
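
The CDR step can be sketched as follows, assuming the segmentation stage yields binary masks for the optic cup and the optic disc. The summary does not say which CDR variant is used, so this example computes the vertical ratio (cup height over disc height); the masks are tiny hand-written stand-ins for U-Net output.

```python
from typing import List

Mask = List[List[int]]  # 1 = pixel belongs to the structure, 0 = background


def vertical_diameter(mask: Mask) -> int:
    """Number of rows spanned by the structure, top to bottom."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def cup_to_disc_ratio(cup: Mask, disc: Mask) -> float:
    """Vertical cup-to-disc ratio (assumed variant; area ratios also exist)."""
    disc_height = vertical_diameter(disc)
    if disc_height == 0:
        raise ValueError("disc mask is empty")
    return vertical_diameter(cup) / disc_height


# Hypothetical 6x4 masks: the disc spans rows 1-4, the cup spans rows 2-3.
disc_mask: Mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
cup_mask: Mask = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

cdr = cup_to_disc_ratio(cup_mask, disc_mask)
```

In a fused pipeline of the kind described, a value such as `cdr` would be appended to the deep feature vector before it reaches the final classifier.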

5) Big Data-Based Customer Sentiment Analysis

This paper addresses the explosion of customer feedback on digital platforms and the challenges of analyzing large-scale, unstructured sentiment data. Traditional sentiment analysis methods struggle with scalability, noise, sarcasm, and real-time processing.

The study proposes a big data-driven sentiment analysis framework using distributed systems (e.g., Hadoop/Spark) combined with machine learning and deep learning models (including Bi-LSTM). The process includes:

Data collection (e.g., Amazon reviews dataset)
Preprocessing (cleaning, tokenization, lemmatization)
Feature extraction (TF-IDF and semantic features)
Sentiment classification (positive, negative, neutral)
Visualization and trend analysis
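
The preprocessing and feature extraction steps listed above can be illustrated with a small plain-Python sketch. It assumes a tiny hand-picked stopword list, skips lemmatization, and uses one common TF-IDF variant (term frequency times log(N/df)); none of these choices is confirmed by the source, and the two reviews are invented.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "is", "it", "and", "this", "was"}


def preprocess(text: str) -> List[str]:
    """Lowercase, strip punctuation, tokenize, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute TF-IDF weights: (count / doc length) * log(N / doc frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                        for term, count in counts.items()})
    return vectors


# Invented example reviews.
reviews = [
    "The battery is great and the screen is great",
    "The battery died, terrible screen",
]
tokenized = [preprocess(r) for r in reviews]
vectors = tf_idf(tokenized)
```

Terms that occur in every review, such as "battery" here, receive a weight of zero, while terms specific to one review, such as "great", carry the signal a classifier would use.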

Results show that Bi-LSTM with batch normalization improves performance (≈84.1% accuracy) and reduces overfitting. However, challenges like scalability, data noise, and model generalization remain.
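
The scalability argument behind Hadoop/Spark-style processing rests on splitting the data into partitions, processing each partition independently, and merging the partial results. The sketch below shows that map/reduce pattern on a single machine. The keyword lexicon is a hypothetical stand-in for the trained classifier, the partitions are processed sequentially rather than on a cluster, and the reviews are invented.

```python
from collections import Counter
from functools import reduce
from typing import List

# Hypothetical lexicon used in place of a trained sentiment model.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "terrible", "poor"}


def label(review: str) -> str:
    """Classify one review by counting lexicon hits (whitespace tokens only)."""
    words = set(review.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def partition(items: List[str], n_parts: int) -> List[List[str]]:
    """Split the data round-robin into n_parts independent chunks."""
    return [items[i::n_parts] for i in range(n_parts)]


def map_partition(reviews: List[str]) -> Counter:
    """Map step: count sentiment labels within one partition."""
    return Counter(label(r) for r in reviews)


def sentiment_counts(reviews: List[str], n_parts: int = 2) -> Counter:
    """Reduce step: merge the per-partition counts into one total."""
    partials = map(map_partition, partition(reviews, n_parts))
    return reduce(lambda a, b: a + b, partials, Counter())


sample_reviews = ["great phone", "terrible battery", "it arrived", "love it good"]
totals = sentiment_counts(sample_reviews, n_parts=2)
```

Because each partition is counted independently, the merged totals do not depend on how many partitions are used, which is the property that lets the same logic run across many workers.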

Conclusion

This research has addressed the increasing demand for effective customer sentiment analysis in the context of the rapid growth of big data. As the volume of content generated by customers through digital media has grown, some traditional methods of sentiment analysis can no longer meet the demands for speed, scalability, and accurate representation of customer sentiment. The systematic review of current sentiment analysis techniques has highlighted areas of concern in managing large-scale, unstructured, and 'noisy' customer sentiment data. A framework has been presented that combines big data technology with machine learning techniques to perform customer sentiment analysis in a highly scalable way. The methodology for developing the framework has encompassed several elements, including distributed processing, preprocessing techniques, feature extraction from customer sentiment data, and an efficient classification technique for customer sentiment. A key advantage of using big data technology is that the proposed framework supports parallel processing, which delivers the computing performance needed by businesses that want up-to-date customer sentiment analysis in a timely manner. The results have shown that the proposed framework can process a high volume of customer sentiment data while achieving a high degree of classification accuracy. The framework's design offers flexibility in dealing with different sources of customer sentiment data as well as with the constantly changing nature of customer behaviour.

The framework therefore provides a practical solution for customer sentiment analysis and allows businesses to make informed decisions about business intelligence and customer relationship management based on the extracted sentiment data. Future research could extend the proposed approach with deep learning language models, real-time data stream capabilities, and modifications to handle multilingual and context-specific sentiment analysis. Such enhancements would improve the framework's robustness and its ability to meet the requirements of dynamically changing big data.

Copyright

Copyright © 2026 Abdul Wahid Ansari, Sahan Ahmad Khan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET83504

Publish Date : 2026-06-06

ISSN : 2321-9653

Publisher Name : IJRASET
