Abstract
Sentiment analysis, a core task in Natural Language Processing (NLP), relies heavily on effective text representation techniques to capture semantic and syntactic nuances. This study presents a comparative analysis of widely used vectorization methods for sentiment classification: Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, GloVe, BERT, and RoBERTa. Using the IMDb movie reviews dataset, each method is evaluated on binary sentiment classification, with accuracy and F1-score as the primary metrics. Results show that while deep contextual embeddings such as BERT and RoBERTa achieve the highest accuracy, with RoBERTa in particular offering enhanced contextual sensitivity, simpler representations like TF-IDF provide competitive results at significantly lower computational cost. The findings highlight the trade-off between accuracy and efficiency and offer practical guidance for embedding selection in sentiment analysis applications.
1. Introduction
Text vectorization is the process of converting raw text into numerical form for machine learning models. It is a critical step in NLP tasks such as sentiment analysis, enabling algorithms to interpret semantic and syntactic structures. Vectorization methods are broadly divided into:
Static Embeddings (context-independent)
Contextual Embeddings (context-sensitive)
2. Evolution of Embedding Techniques
Static Embeddings:
Assign fixed vectors to words, regardless of context.
Examples:
BoW: Simple word counts.
TF-IDF: Frequency-based with inverse document weighting.
Word2Vec: Predictive model capturing semantic similarity.
GloVe: Based on global word co-occurrence statistics.
Limitations: a word receives the same vector in every sentence, so these methods cannot represent polysemy or adapt to surrounding context (a minimal sketch of the methods above follows this list).
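To make the differences above concrete, the following minimal sketch builds each static representation on a two-document toy corpus using scikit-learn and gensim. The corpus and parameter values are illustrative assumptions, not the study's configuration.

```python
# Static representations on a toy corpus (illustrative only).
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from gensim.models import Word2Vec

corpus = [
    "the movie was great",
    "the movie was terrible",
]

# BoW: each document becomes a vector of raw word counts.
bow = CountVectorizer()
X_bow = bow.fit_transform(corpus)
print(bow.get_feature_names_out())   # learned vocabulary
print(X_bow.toarray())               # document-term count matrix

# TF-IDF: counts reweighted by inverse document frequency,
# down-weighting words that appear in many documents.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(corpus)
print(X_tfidf.toarray())

# Word2Vec: learns dense vectors from token co-occurrence; real use
# needs a large corpus, this only demonstrates the gensim 4.x API.
tokenized = [doc.split() for doc in corpus]
w2v = Word2Vec(sentences=tokenized, vector_size=50, window=2, min_count=1)
print(w2v.wv["movie"][:5])           # first 5 dimensions of one word vector

# GloVe vectors are usually not trained locally; pretrained files
# (e.g. glove.6B) are loaded as a fixed word-to-vector lookup table.
```

Note that in every method above, "movie" maps to one and the same vector no matter which review it appears in; this is exactly the limitation stated in the list.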
Contextual Embeddings:
Generate dynamic word vectors based on surrounding text.
Built using transformer-based models:
BERT: Bidirectional transformer encoder pretrained with masked language modeling.
RoBERTa: BERT with a more robust pretraining recipe (more data, dynamic masking, no next-sentence prediction).
DistilBERT: Compressed BERT, faster and lighter.
ELECTRA: Efficient pretraining using replaced token detection.
XLNet: Permutation-based context modeling.
ALBERT: Lightweight BERT with parameter sharing.
Advantages: Handle word ambiguity and improve performance in complex NLP tasks.
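The ambiguity-handling advantage can be shown directly. In the minimal sketch below (our illustration, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint), the surface word "bank" receives a different vector in each sentence, which no static embedding can do.

```python
# Contextual embeddings: the same word gets context-dependent vectors.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual vector of `word`'s first occurrence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index(word)]

v1 = word_vector("she sat by the river bank", "bank")
v2 = word_vector("he deposited cash at the bank", "bank")

# A static embedding would give identical vectors here; BERT does not.
sim = torch.nn.functional.cosine_similarity(v1, v2, dim=0)
print(f"cosine similarity between the two 'bank' vectors: {sim:.3f}")
```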
3. Comparative Evaluation Setup
Dataset: IMDb Movie Reviews (50,000 reviews; binary sentiment classification)
Preprocessing: Lowercasing, tokenization, truncation to 256 tokens
Models:
Static embeddings: paired with a Logistic Regression classifier (see the pipeline sketch below)
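The static path of this setup can be sketched as follows. Loading the corpus through the Hugging Face datasets hub ("imdb", the same Maas et al. review corpus) is our assumption for a self-contained example, and the vocabulary cap is an illustrative choice rather than the study's configuration.

```python
# TF-IDF features feeding Logistic Regression, scored with accuracy and F1.
from datasets import load_dataset
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

imdb = load_dataset("imdb")          # labeled 25k train / 25k test split
X_train_text, y_train = imdb["train"]["text"], imdb["train"]["label"]
X_test_text, y_test = imdb["test"]["text"], imdb["test"]["label"]

# TfidfVectorizer lowercases and tokenizes internally;
# max_features caps the vocabulary (illustrative value).
vectorizer = TfidfVectorizer(max_features=50_000)
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
print(f"accuracy: {accuracy_score(y_test, pred):.3f}")
print(f"F1-score: {f1_score(y_test, pred):.3f}")
```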
4. Key Findings
Contextual models outperform static ones in both accuracy and F1-score.
RoBERTa and ELECTRA lead in performance (93% accuracy).
DistilBERT and ALBERT offer a good trade-off between speed, memory, and accuracy (~91%), making them suitable for real-time applications (illustrated in the sketch after this list).
Static methods, while easier and faster to implement, are limited in contextual understanding, which affects performance in nuanced tasks like sentiment analysis.
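To illustrate the real-time point, the sketch below serves sentiment predictions through a compact DistilBERT checkpoint via the transformers pipeline API. The checkpoint named here is a public SST-2 model (the library's default for this task), assumed as a stand-in for a lightweight deployed model rather than one trained in this study.

```python
# Lightweight sentiment inference with a distilled transformer.
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(sentiment("A clever, moving film with a standout lead performance."))
# -> [{'label': 'POSITIVE', 'score': 0.99...}]
```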
5. Conclusion
This study conducted a comprehensive comparative analysis of static and contextual text embedding techniques for sentiment analysis, using the IMDb movie reviews dataset. The results demonstrate that contextual embeddings, particularly RoBERTa and ELECTRA, offer significantly superior performance over traditional methods, achieving the highest accuracy and F1-scores.

While static embeddings such as BoW, TF-IDF, Word2Vec, and GloVe are computationally efficient and easy to implement, they fall short in capturing contextual semantics, which is crucial for understanding sentiment. Among static methods, GloVe achieved the best balance between performance and speed.

Contextual models such as BERT, RoBERTa, and DistilBERT, on the other hand, deliver highly accurate results by incorporating context-awareness through transformer-based architectures. Notably, DistilBERT and ALBERT offer a favorable trade-off between performance and efficiency, making them suitable for real-time or resource-constrained applications. Overall, the choice of embedding technique should align with the application's requirements, whether the priority is accuracy, speed, or resource constraints.