With the proliferation of social media, user-generated content offers a growing opportunity for early mental health screening. A significant portion of online discourse reflects users' mental states, but manual analysis is infeasible at scale. This study addresses the challenge of accurately identifying signs of depression in social media text using automated methods. We evaluated a baseline model that combines TF-IDF features with Logistic Regression, and a Convolutional Neural Network (CNN), on a large dataset of Reddit posts. Both models achieved accuracy and F1-scores of approximately 93%. They also yielded high ROC AUC scores (0.9825 for Logistic Regression and 0.9782 for the CNN), indicating a strong ability to distinguish between depressed and non-depressed users. A detailed error analysis revealed that the CNN produced fewer false negatives, a critical consideration for clinical applications. This work establishes a strong baseline for machine-learning-based depression detection and highlights the importance of selecting a model according to specific error-reduction goals.
Introduction
This study focuses on automated depression detection from Reddit posts using natural language processing (NLP) and machine learning. Traditional diagnostic methods are limited by stigma and underreporting, which motivates scalable AI systems that can identify mental health risks in user-generated content.
Two models were developed and compared: a baseline that pairs TF-IDF features with Logistic Regression, and a Convolutional Neural Network (CNN) that uses word embeddings. The dataset consists of over 200,000 Reddit posts from “r/depression” and “r/teenagers,” preprocessed with standard NLP techniques such as text cleaning, stopword removal, and normalization.
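As a rough illustration of the preprocessing steps named above, the following sketch lowercases a post, strips URLs and non-letter characters, and removes stopwords. The function name `preprocess`, the regular expressions, and the very small stopword list are assumptions made for this example; the exact cleaning rules used in the study are not specified here.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real pipeline would use a fuller list from an NLP library.
STOPWORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "is", "it", "i", "am"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def preprocess(text):
    """Lowercase, strip URLs and non-letter characters, then drop stopwords."""
    text = text.lower()                  # normalization
    text = URL_RE.sub(" ", text)         # cleaning: remove links
    text = NON_LETTER_RE.sub(" ", text)  # cleaning: remove digits and punctuation
    return [tok for tok in text.split() if tok not in STOPWORDS]


tokens = preprocess("I am SO tired of everything... see https://example.com")
print(tokens)  # ['so', 'tired', 'everything', 'see']
```

Note that a generic stopword list also removes first-person pronouns such as "I", so whether to keep them is a design choice worth testing for this task.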
The methodology comprises feature extraction (TF-IDF for Logistic Regression and word embeddings for the CNN), binary classification, and evaluation using precision, recall, F1-score, confusion matrices, and ROC AUC.
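To make the TF-IDF feature extraction step concrete, here is a minimal sketch in plain Python. It assumes raw term counts and a smoothed inverse document frequency, idf(t) = ln((1 + N) / (1 + df(t))) + 1, which is one common variant; the study's actual weighting scheme and vocabulary settings are not stated, and the toy documents are invented for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = raw term count * smoothed IDF, where
    idf(t) = ln((1 + N) / (1 + df(t))) + 1 (an assumed variant).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


# Toy tokenized posts, invented for this example.
docs = [
    ["feel", "empty", "tired"],
    ["school", "tired", "today"],
    ["feel", "hopeless"],
]
weights = tfidf(docs)

# "empty" appears in one document and "tired" in two,
# so "empty" receives the larger weight in the first post.
print(weights[0]["empty"] > weights[0]["tired"])  # True
```

The resulting sparse weight vectors are the kind of input a Logistic Regression classifier would consume; the CNN instead takes token sequences mapped to word embeddings.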
Results show both models perform strongly, achieving about 93–94% accuracy. The Logistic Regression model offers slightly better precision and interpretability, while the CNN achieves higher recall, meaning it is better at identifying actual depression cases (fewer false negatives), which is important for mental health screening.
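The precision and recall trade-off described above can be illustrated with a short calculation from confusion-matrix counts. The counts below are hypothetical and chosen only to show how two models with identical accuracy can differ in false negatives; they are not the study's results, and `classification_metrics` is a helper name introduced for this sketch.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts (assumptions, not reported results).
model_a = classification_metrics(tp=900, fp=50, fn=100, tn=950)  # fewer false positives
model_b = classification_metrics(tp=950, fp=100, fn=50, tn=900)  # fewer false negatives

for name, m in (("A", model_a), ("B", model_b)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

In this toy case both models reach the same accuracy, but model B misses half as many positive cases, which is the property that matters most when a missed case is the costlier error.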
Overall, the study concludes that both traditional and deep learning approaches are effective: the CNN is more sensitive to depression cases, while Logistic Regression is more interpretable. It suggests that hybrid models may perform even better in future systems.
Conclusion
In this paper, we focused on the task of depression detection from social media text. We proposed and evaluated two machine learning pipelines, combining feature extraction techniques with a Logistic Regression classifier and a CNN classifier. Experimental results demonstrate that both methods successfully learn the linguistic patterns necessary for accurate classification, achieving F1-scores around 93%. While both models are effective, the CNN's superior ability to minimize false negatives makes it a more reliable choice for practical deployment. With expanded, more diverse datasets and the integration of explainability techniques, it is reasonable to expect that alternative methods could build upon these findings to create even safer and more effective tools for mental health monitoring.
References