AI Based Popularity Prediction of TV Shows & Movies

Authors: Dr. Sajja Suneel, Mukkera Sarayu, Behnaz Mohammad, Pranav Anand

DOI Link: https://doi.org/10.22214/ijraset.2026.78546

Abstract

The rapid growth of digital streaming platforms has created a need for accurate predictive models that assess audience engagement and the expected popularity of television series and films before their release. This study introduces an AI-powered predictive system that combines machine learning, natural language processing, sentiment analysis, and social media analytics to forecast popularity metrics. The research utilises diverse data sources, including IMDb ratings, critical reviews, Twitter engagement, and box office performance. The proposed model applies feature engineering techniques such as sentiment scoring, topic modelling, and audience engagement features, and then trains supervised learning algorithms such as Random Forest, Linear Regression, and Gradient Boosting. Experimental evaluation using accuracy, precision, recall, and F1-score demonstrates the effectiveness of the proposed framework. The paper presents a scalable framework that can be deployed as a cloud-based API in production environments.

Introduction

The study highlights the evolution from early statistical models (like regression and decision trees) to more advanced techniques incorporating social media signals and sentiment analysis. Natural Language Processing (NLP), especially with advanced models like transformers, plays a key role in understanding audience opinions and improving prediction accuracy.

The proposed system follows a layered architecture, including:

Data ingestion from platforms like IMDb, Twitter, and YouTube,
Preprocessing and cleaning of textual and numerical data,
Feature engineering combining sentiment, engagement, and metadata,
A prediction engine using machine learning models such as Random Forest and Gradient Boosting.
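As a minimal sketch of the feature engineering layer, the snippet below combines a lexicon-based sentiment score, an engagement rate, and one metadata field into a single feature record. The word lists, the field names (reviews, likes, comments, views, runtime_min), and the function names are illustrative assumptions; the text does not specify which sentiment scorer or feature schema the authors use.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only);
# a production system would use a trained sentiment model instead.
POSITIVE = {"great", "brilliant", "love", "excellent", "fun"}
NEGATIVE = {"boring", "awful", "hate", "weak", "dull"}


def sentiment_score(text):
    """Return (positive - negative) / matched words, a value in [-1, 1]."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched


def engagement_rate(likes, comments, views):
    """Interactions per view; 0.0 when there are no views."""
    return 0.0 if views == 0 else (likes + comments) / views


def build_features(title):
    """Combine sentiment, engagement, and metadata into one feature dict."""
    reviews = title["reviews"]
    mean_sentiment = (
        sum(sentiment_score(r) for r in reviews) / len(reviews) if reviews else 0.0
    )
    return {
        "mean_sentiment": mean_sentiment,
        "engagement_rate": engagement_rate(
            title["likes"], title["comments"], title["views"]
        ),
        "runtime_min": float(title["runtime_min"]),
    }


# Made-up record, not data from the study.
example = {
    "reviews": ["Great cast and a brilliant script", "A bit boring in the middle"],
    "likes": 120,
    "comments": 30,
    "views": 1000,
    "runtime_min": 112,
}
features = build_features(example)
print(features)
```

Keeping each feature in its own small function mirrors the layered design described above: a different sentiment scorer can be swapped in without touching the engagement or metadata features.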

The methodology involves structured steps like data preprocessing, feature extraction, model training, and evaluation using metrics such as accuracy, precision, and recall. Results show that ensemble models outperform simpler models, with Random Forest achieving the best performance (around 86% accuracy).
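The evaluation metrics named above can be computed directly from confusion-matrix counts. The sketch below assumes a binary labelling (1 = popular, 0 = not popular), which the text does not state explicitly; the function name and the sample labels are made up for illustration and are not the study's data or results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = len(pairs) - tp - fp - fn                               # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = popular, 0 = not popular.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when popular titles are a minority of the data, since accuracy alone can look high for a model that rarely predicts the popular class.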

Overall, the study demonstrates that combining sentiment analysis, social media engagement, and metadata provides more accurate and reliable popularity predictions than traditional methods, enabling better decision-making in the digital entertainment industry.

Conclusion

This paper presents a comprehensive AI-based framework for forecasting the popularity of television programs and films by integrating machine learning, sentiment analysis, and social media engagement metrics. The proposed system draws on reviews, ratings, online discussions, and metadata to capture both qualitative audience reactions and quantitative performance indicators, which enables more accurate and reliable predictions. Experimental evaluation indicates that ensemble models such as Random Forest and Gradient Boosting consistently outperform traditional linear classifiers, which suggests that they are well suited to modelling complex, non-linear audience behaviour. The system architecture is modular, scalable, and easy to integrate into existing analytics pipelines. Each component, from data ingestion to prediction, works independently, yet together they form a complete workflow that can readily be extended with new data streams or more advanced analytical modules. These properties make the framework practical for streaming services, production companies, and marketing teams that need to make informed decisions about content investment and promotion. The results are promising, but the system has limitations, particularly in handling external events and sudden changes such as controversies, unexpected publicity spikes, or region-specific audience dynamics. Adding visual and audio analysis of trailers, richer temporal features, or transformer-based architectures could improve predictions further, and real-time data streams might help the model keep pace with changing audience tastes. This study lays a solid groundwork for automated popularity forecasting and showcases the promise of AI-driven methodologies in entertainment analytics.
The insights gained from this work can also inform more sophisticated, multimodal prediction systems that aid strategic decision-making throughout the entertainment lifecycle. In conclusion, the framework indicates that combining diverse data sources with suitable learning methods can substantially improve the accuracy of popularity prediction models. As audience behaviour continues to evolve rapidly in the digital media ecosystem, AI-driven systems will become increasingly important for content decisions and media strategy. The proposed system is designed not only for prediction but also as a decision-support tool for producers, distributors, and streaming platforms.

Copyright

Copyright © 2026 Dr. Sajja Suneel, Mukkera Sarayu, Behnaz Mohammad, Pranav Anand. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET78546

Publish Date : 2026-03-20

ISSN : 2321-9653

Publisher Name : IJRASET
