User behavior on platforms like YouTube can offer insight into demographic attributes such as age. This study explores how YouTube engagement metrics, such as number of comments, subscription duration, and content activity, can be used to predict a user's age using linear regression. A dataset of user engagement attributes was preprocessed and analyzed. The linear regression model showed that variables such as subscription duration and profanity index are significantly associated with age, while others had limited predictive value. The study contributes to the understanding of user behavior on social media and illustrates the potential of regression-based modeling in social analytics.
Introduction
This study investigates how user engagement metrics on YouTube can predict user age, employing a linear regression model to analyze data from 3,464 users. The findings highlight that behavioral indicators such as membership duration, comment frequency, and upload activity are significant predictors of age. These insights are valuable for marketers, platform designers, and researchers aiming to tailor content and strategies to specific age demographics.
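The modeling step can be illustrated with a minimal, dependency-free sketch of ordinary least squares. Everything in it is an assumption made for illustration: the data are synthetic rather than the study's 3,464 users, the predictor names (`months`, `comments`, `uploads`) are hypothetical, and the p-values use a normal approximation to the t distribution, which is reasonable only for large samples. It is not the authors' implementation; a statistics library would normally be used for this.

```python
import random
from statistics import NormalDist


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares fit of y on the columns of rows, with an intercept.

    Returns (coefficients, two-sided p-values); index 0 is the intercept.
    """
    X = [[1.0, *r] for r in rows]  # prepend the intercept column
    n, k = len(X), len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * xi for b, xi in zip(beta, x)) for x, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    # Diagonal of (X'X)^-1, obtained one unit vector at a time.
    inv_diag = [solve(xtx, [float(i == j) for i in range(k)])[j] for j in range(k)]
    se = [(sigma2 * d) ** 0.5 for d in inv_diag]
    # Assumption: normal approximation to the t distribution (large n).
    z = NormalDist()
    pvals = [2 * (1 - z.cdf(abs(b / s))) for b, s in zip(beta, se)]
    return beta, pvals


# Synthetic data (hypothetical): age depends only on membership duration.
random.seed(0)
n_users = 500
months = [random.uniform(1, 60) for _ in range(n_users)]
comments = [random.uniform(0, 200) for _ in range(n_users)]
uploads = [random.uniform(0, 30) for _ in range(n_users)]
age = [18 + 0.25 * m + random.gauss(0, 3) for m in months]

beta, pvals = ols(list(zip(months, comments, uploads)), age)
for name, b, p in zip(["intercept", "months", "comments", "uploads"], beta, pvals):
    print(f"{name:>10}: coef={b:8.4f}  p={p:.3g}")
```

Because the synthetic ages are generated from membership duration alone, the fit should recover a slope near 0.25 for `months` with a very small p-value, while the other coefficients stay near zero.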
Key Findings:
Membership duration emerged as the most significant predictor of user age (p < 0.00001).
Other notable predictors included the number of comments and the presence of profanity in user IDs.
Model evaluation indicated a good fit: residuals were evenly distributed, and no outliers exerted undue influence on the model.
This research contributes to the growing field of digital behavioral analysis, offering a method to infer user demographics based on online activity.
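The source does not state which diagnostic was used to check for influential outliers. One common choice is Cook's distance, sketched below for a single predictor. The data are synthetic, the variable names are hypothetical, one outlier is injected deliberately, and the cutoff of 1 is a commonly cited rule of thumb rather than a threshold taken from the study.

```python
import random
from statistics import mean


def cooks_distances(x, y):
    """Cook's distance for each point of a one-predictor least-squares fit."""
    n, p = len(x), 2  # p counts the intercept and the slope
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    # Leverage of each point in simple linear regression.
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]


# Synthetic data (hypothetical): 300 well-behaved users plus one injected outlier.
random.seed(1)
months = [random.uniform(1, 60) for _ in range(300)]
age = [18 + 0.25 * m + random.gauss(0, 3) for m in months]
months.append(120.0)  # unusually long membership...
age.append(18.0)      # ...paired with an age far below the trend

d = cooks_distances(months, age)
flagged = [i for i, v in enumerate(d) if v > 1]  # rule-of-thumb cutoff (assumption)
print("influential points:", flagged)
```

A fit with no undue influence from outliers, as reported above, would correspond to an empty `flagged` list on the real data.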
Conclusion
This study explored how user behavior metrics can predict a user's age on a digital platform, using both statistical regression and machine learning techniques. The findings support the hypothesis that behavioral patterns, particularly membership duration, can be significant indicators of user age.