Fake Review Detection Using Machine Learning Algorithms

Authors: Prof. M. M. Baig, Prof. Rohan Kokate, Prutha Gomkale

DOI Link: https://doi.org/10.22214/ijraset.2026.80878

Abstract

Online product reviews have become a foundational component of consumer decision-making across global e-commerce platforms such as Amazon, Flipkart, Zomato, and Google Maps. The rapid proliferation of bot-generated, incentivized, and adversarially crafted fake reviews has critically eroded the trustworthiness of this feedback ecosystem. Existing binary classification approaches are inadequate to capture the nuanced and continuous spectrum of review authenticity. This paper presents Veracity AI, a forensic linguistic web application that evaluates review authenticity through a multi-layered hybrid architecture combining a custom-built Multinomial Naive Bayes Machine Learning classifier with a Deep Natural Language Processing pipeline powered by the Google Gemini AI API. The proposed system generates a continuous Trust Spectrum score (0–100), an Origin Classifier label (e.g., Bot-Generated, Paid Human Writer, Competitor Attack, Genuine), and a sentence-level autopsy visualization. A Fractional Dataset Weighting mechanism (40% AI weight / 60% user signal) provides robust defense against adversarial data poisoning attacks. The continuous learning feedback loop stores validated user verdicts in Firebase Firestore, dynamically retraining the local Naive Bayes model in real time. Experimental evaluation demonstrates that the Naive Bayes classifier achieves agreement with the NLP ground truth in 83% of test cases, with the system correctly identifying bot-generated, paid-writer, and genuine review categories across diverse test inputs. The system is deployed as a full-stack React/TypeScript web application, offering an intuitive Bento Grid dashboard with Radar chart visualizations for forensic linguistic fingerprinting.

Introduction

This paper presents Veracity AI, a forensic system for detecting fake online reviews, developed in response to the growing problem of review manipulation on e-commerce platforms. Online reviews strongly influence consumer decisions, but they are increasingly faked using bots, incentivized writers, and AI-generated content, which makes traditional detection methods insufficient.

The proposed system combines a custom Multinomial Naive Bayes classifier with a deep NLP analysis layer (Google Gemini AI) to evaluate reviews more intelligently than standard binary classifiers. Instead of simply labeling reviews as fake or real, it generates a continuous Trust Spectrum score, identifies the likely origin of the review (e.g., bot-generated, paid writer, competitor attack), and provides sentence-level forensic explanations along with visual insights such as linguistic fingerprint charts. It also includes a real-time feedback loop using Firebase, allowing the model to continuously learn from user corrections while using a weighting mechanism to reduce the risk of data poisoning.
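As a rough illustration of the local classification layer, the following is a minimal sketch of a Multinomial Naive Bayes text classifier with Laplace (add-one) smoothing. It is not the authors' code: the class name, the two-label scheme, the tokenizer, and the choice to skip out-of-vocabulary tokens are all assumptions made for the example.

```typescript
// Sketch only: names, labels, and tokenizer are illustrative assumptions.
type Label = "fake" | "genuine";

class NaiveBayes {
  private docCounts: Record<Label, number> = { fake: 0, genuine: 0 };
  private tokenTotals: Record<Label, number> = { fake: 0, genuine: 0 };
  private tokenCounts: Record<Label, Map<string, number>> = {
    fake: new Map<string, number>(),
    genuine: new Map<string, number>(),
  };
  private vocabulary = new Set<string>();

  // Lowercase the text and keep runs of letters, digits, and apostrophes.
  static tokenize(text: string): string[] {
    return text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  }

  // Add one labelled review; `weight` lets a sample count fractionally.
  train(text: string, label: Label, weight = 1): void {
    this.docCounts[label] += weight;
    const counts = this.tokenCounts[label];
    for (const token of NaiveBayes.tokenize(text)) {
      this.vocabulary.add(token);
      counts.set(token, (counts.get(token) ?? 0) + weight);
      this.tokenTotals[label] += weight;
    }
  }

  // log P(label) + sum of log P(token | label), with add-one smoothing.
  private logScore(label: Label, tokens: string[]): number {
    const totalDocs = this.docCounts.fake + this.docCounts.genuine;
    let score = Math.log((this.docCounts[label] + 1) / (totalDocs + 2));
    const denominator = this.tokenTotals[label] + this.vocabulary.size;
    for (const token of tokens) {
      if (!this.vocabulary.has(token)) continue; // unseen words carry no evidence
      const count = this.tokenCounts[label].get(token) ?? 0;
      score += Math.log((count + 1) / denominator);
    }
    return score;
  }

  // Posterior probability that the review is fake, in [0, 1].
  probabilityFake(text: string): number {
    const tokens = NaiveBayes.tokenize(text);
    const logFake = this.logScore("fake", tokens);
    const logGenuine = this.logScore("genuine", tokens);
    // Equivalent to normalising exp(logFake) against both classes,
    // computed in log space to avoid underflow on long reviews.
    return 1 / (1 + Math.exp(logGenuine - logFake));
  }
}

const model = new NaiveBayes();
model.train("Best product ever, buy now! Amazing, amazing!", "fake");
model.train("Amazing deal, buy now, best best", "fake");
model.train("Battery lasted two days but the screen scratches easily", "genuine");
model.train("Delivery was late and the case fits well", "genuine");
console.log(model.probabilityFake("amazing best buy"));
```

Working in log space keeps the product of many small word probabilities from underflowing, and the smoothing term ensures a word never seen in one class does not force that class's probability to zero.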

The literature review shows that fake review detection has evolved from basic machine learning methods (SVM, Naive Bayes) to advanced deep learning and transformer-based models (like BERT). Earlier research also highlights behavioral analysis and network patterns as useful signals. However, key gaps remain: most systems are static, lack interpretability, rely on binary classification, are vulnerable to adversarial feedback, and are rarely deployed as real-world applications.

To address these issues, Veracity AI introduces a hybrid architecture consisting of four layers: user input interface, ML classification layer, NLP analysis layer, and a feedback/retraining system. The system processes reviews through both local classification and cloud-based NLP analysis, then displays results on an interactive dashboard.
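The flow through the classification and analysis layers can be sketched as below. This is a hypothetical outline rather than the published implementation: the interface shapes, the function names, and the 0.5 and 50 cut-offs used to compare the two verdicts are assumptions, and the cloud NLP call is represented by an injected function instead of any real Gemini API call.

```typescript
// Hypothetical shapes; the paper does not publish its interfaces.
interface LocalVerdict {
  probabilityFake: number; // output of the local ML classification layer, 0..1
}

interface NlpVerdict {
  trustScore: number; // Trust Spectrum score, 0..100
  origin: string;     // e.g. "Bot-Generated", "Genuine"
}

interface Report {
  trustScore: number;
  origin: string;
  localSaysFake: boolean;
  nlpSaysFake: boolean;
  methodsAgree: boolean;
}

type LocalClassifier = (review: string) => LocalVerdict;
type NlpAnalyzer = (review: string) => Promise<NlpVerdict>;

// Pure merge step: combine both verdicts into one dashboard record.
// The 0.5 and 50 thresholds are assumptions made for this sketch.
function mergeVerdicts(local: LocalVerdict, nlp: NlpVerdict): Report {
  const localSaysFake = local.probabilityFake >= 0.5;
  const nlpSaysFake = nlp.trustScore < 50;
  return {
    trustScore: nlp.trustScore,
    origin: nlp.origin,
    localSaysFake,
    nlpSaysFake,
    methodsAgree: localSaysFake === nlpSaysFake,
  };
}

// Input layer -> local classifier and remote NLP analysis -> merged report.
async function analyzeReview(
  review: string,
  classify: LocalClassifier,
  analyze: NlpAnalyzer,
): Promise<Report> {
  const local = classify(review);    // runs in the browser
  const nlp = await analyze(review); // remote call, stubbed by the caller
  return mergeVerdicts(local, nlp);
}

// Usage with stand-in layers.
const stubClassifier: LocalClassifier = () => ({ probabilityFake: 0.9 });
const stubAnalyzer: NlpAnalyzer = async () => ({ trustScore: 12, origin: "Bot-Generated" });
analyzeReview("Best product ever!!!", stubClassifier, stubAnalyzer).then((report) =>
  console.log(report.methodsAgree),
);
```

Keeping the merge step pure makes the agreement check between the two methods easy to test without network access.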

The dataset used is a hybrid of a small seed dataset and a continuously growing crowdsourced dataset collected via Firebase feedback. This enables ongoing model improvement while maintaining balance and adaptability.
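One plausible reading of the 40% AI / 60% user weighting stated in the abstract is that each feedback sample is split across labels rather than counted whole. The sketch below follows that reading only; the function name, the two-label scheme, and the way the split is applied are assumptions, and the paper's actual mechanism may differ.

```typescript
// Assumed interpretation of fractional dataset weighting; not the authors' code.
type Verdict = "fake" | "genuine";

const AI_WEIGHT = 0.4;   // share given to the NLP layer's label (from the abstract)
const USER_WEIGHT = 0.6; // share given to the user's submitted label (from the abstract)

// Split one feedback sample into per-label training weights.
// Agreement yields a full-weight sample; disagreement caps the user's
// influence at 0.6, so a single poisoned verdict cannot flip a sample outright.
function fractionalWeights(aiLabel: Verdict, userLabel: Verdict): Record<Verdict, number> {
  const weights: Record<Verdict, number> = { fake: 0, genuine: 0 };
  weights[aiLabel] += AI_WEIGHT;
  weights[userLabel] += USER_WEIGHT;
  return weights;
}

console.log(fractionalWeights("fake", "genuine")); // user disagrees with the AI label
```

Under this reading, a retraining step would add the review's token counts to each class scaled by the returned weight, so contested samples pull the model less strongly than uncontested ones.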

Conclusion

This paper presented Veracity AI, a forensic linguistic web application for fake review detection that addresses the critical limitations of existing binary, static, and non-interpretable classification systems. The proposed hybrid architecture integrates a custom Multinomial Naive Bayes classifier, implemented from scratch in TypeScript with Laplace smoothing, with a deep Natural Language Processing pipeline that uses the Google Gemini AI API to deliver a multi-dimensional forensic analysis of review authenticity. The system's continuous learning feedback loop, backed by Firebase Firestore and protected by a fractional dataset weighting mechanism, enables dynamic model improvement in production while maintaining a principled defense against adversarial data poisoning attacks.

Experimental evaluation demonstrated that the Naive Bayes classifier achieves 83.3% accuracy, 82.3% precision, 85.0% recall, and an F1-score of 83.6% against NLP-derived ground truth, with inter-method agreement across 83% of test cases. The forensic interpretability features (sentence-level autopsy with color-coded suspicion classification, linguistic DNA fingerprint Radar charts, emotion-vs-rating mismatch detection, and vagueness indexing) constitute a significant advancement over existing black-box classification systems, enabling end-users to understand the forensic basis of each authenticity assessment. The continuous Trust Spectrum (0–100) and origin classification label set provide nuanced authenticity assessments that binary classifiers cannot deliver.

The deployment of a production-quality, serverless React/TypeScript web application with a Bento Grid dashboard demonstrates that ML-driven fake review detection can be made both highly interpretable and practically deployable without requiring specialist infrastructure beyond standard cloud services. The growing menace of fake online reviews represents a systemic threat to consumer trust and market fairness; this research demonstrates that a practical, adaptive, and forensically transparent detection system is achievable within a modern web application architecture.
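As a consistency check on the reported metrics, the F1-score can be recomputed from the stated precision and recall using the standard harmonic-mean definition, where P is precision and R is recall:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.823 \times 0.850}{0.823 + 0.850}
    = \frac{1.3991}{1.673}
    \approx 0.836
```

This agrees with the reported F1-score of 83.6%.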


Copyright

Copyright © 2026 Prof. M. M. Baig, Prof. Rohan Kokate, Prutha Gomkale. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET80878

Publish Date : 2026-04-23

ISSN : 2321-9653

Publisher Name : IJRASET
