Human emotions greatly influence decision-making, productivity, and mental well-being. As artificial intelligence matures, the need for machines that can identify human emotions and respond to them becomes evident. This paper proposes a Mood-Based Recommendation System built on deep learning techniques. The system identifies users' facial emotions and suggests corresponding wellness activities. For emotion recognition it uses a Residual Network (ResNet) combined with an attention mechanism. When the user enters text, the system uses a BERT model to classify the user's intent. Based on the user's emotions and intent, it suggests wellness content such as music, videos, quotes, and activities via API integration. Experiments demonstrated satisfactory performance in both real-time emotion recognition and personalized suggestions. In short, the system integrates computer vision, deep learning, NLP, and multimedia services into one emotional wellness application.
Introduction
This paper presents a Mood-Based Recommendation System that combines facial emotion recognition and text intent analysis to provide personalized recommendations such as songs, videos, quotes, and yoga exercises.
Emotions significantly affect human behavior, decision-making, productivity, and communication. While traditional recommendation systems rely on user preferences, browsing history, and ratings, they often ignore a user's current emotional state. To address this limitation, the proposed system detects emotions from facial expressions and understands user intentions from text input.
The system uses a ResNet neural network with an attention mechanism to identify emotions such as happiness, sadness, anger, fear, surprise, disgust, and neutrality. The attention layer focuses on important facial features like the eyes, eyebrows, and mouth. For text understanding, it employs BERT to classify user intent and associate it with emotional needs such as relaxation, motivation, concentration, or comfort.
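As a rough illustration of how an attention layer can emphasize regions such as the eyes or mouth, the sketch below applies softmax weights to per-region feature vectors and pools them into one vector. The paper does not specify the exact form of its attention module, so this is a generic attention-pooling sketch rather than the authors' implementation; the region names, feature values, and scores are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, scores):
    """Weighted sum of region feature vectors using softmax attention weights."""
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights

# Hypothetical 2-D features for four facial regions (illustrative values only).
regions = ["eyes", "eyebrows", "mouth", "cheek"]
features = [[0.9, 0.1], [0.7, 0.2], [0.8, 0.6], [0.1, 0.1]]
# Assumed attention scores; in a trained network these would be learned.
scores = [2.0, 1.0, 2.0, -1.0]

pooled, weights = attention_pool(features, scores)
for name, w in zip(regions, weights):
    print(f"{name}: weight {w:.2f}")
```

Regions with higher scores contribute more to the pooled vector, which is the sense in which attention "focuses" on informative parts of the face.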
The methodology consists of:
Image Acquisition and Preprocessing – Capturing webcam images, detecting faces, resizing, and normalizing them.
Emotion Detection – Using ResNet with attention to recognize emotions in real time.
Intent Classification – Applying BERT to analyze user text and understand intentions.
Recommendation Engine – Generating personalized recommendations based on detected emotions and intent.
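The resizing and normalization in the first step above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the face has already been detected and cropped to a grayscale array, the 48 x 48 target size is assumed (it matches the FER-2013 image size, but the paper does not state the input resolution), and nearest-neighbour interpolation stands in for whatever resizing method the system actually uses. The function names are hypothetical.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D grayscale image (list of rows)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def normalize(image):
    """Scale 8-bit pixel values (0-255) to floats in [0, 1]."""
    return [[px / 255.0 for px in row] for row in image]

def preprocess(face_crop, size=48):
    """Resize a cropped face to size x size and normalize it (size is assumed)."""
    return normalize(resize_nearest(face_crop, size, size))

# Toy 4x4 array standing in for a detected, cropped face region.
crop = [
    [0, 64, 128, 255],
    [0, 64, 128, 255],
    [10, 20, 30, 40],
    [10, 20, 30, 40],
]
small = preprocess(crop, size=2)   # downscale to 2x2 for easy inspection
model_input = preprocess(crop)     # default 48x48 input
```

In practice a library such as OpenCV would typically handle capture, face detection, and resizing; the pure-Python version only makes the arithmetic explicit.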
The study builds on previous research showing that CNN-based approaches outperform traditional methods such as LBP, HOG, and facial landmark detection. Unlike earlier works that focused mainly on classification accuracy, this system integrates emotion recognition with recommendation functionality.
Experimental results demonstrate:
High training and validation accuracy with good generalization.
Decreasing training and validation loss, indicating effective learning and minimal overfitting.
Strong recognition of emotions like happiness and surprise, with slightly lower performance for fear and disgust.
Effective real-time emotion detection and stabilization.
Successful generation of personalized recommendations for music, videos, inspirational quotes, and yoga mudras using external APIs such as Spotify and YouTube.
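The paper does not describe how real-time predictions are stabilized. One common approach, shown here purely as an assumption, is a majority vote over the most recent frames, so that a single misclassified frame does not change the reported emotion. The class name, window size, and frame labels below are hypothetical.

```python
from collections import Counter, deque

class EmotionSmoother:
    """Majority vote over the most recent per-frame predictions."""

    def __init__(self, window=5):
        # deque with maxlen drops the oldest label automatically
        self.history = deque(maxlen=window)

    def update(self, label):
        """Add one per-frame label and return the stabilized label."""
        self.history.append(label)
        # most_common(1) returns the label with the highest count in the window
        return Counter(self.history).most_common(1)[0][0]

smoother = EmotionSmoother(window=5)
# Hypothetical per-frame predictions with one spurious "surprise" frame.
frames = ["happiness", "happiness", "surprise", "happiness", "happiness",
          "sadness", "sadness", "sadness"]
stable = [smoother.update(f) for f in frames]
print(stable)
```

A larger window gives a steadier label at the cost of reacting more slowly to a genuine change of expression.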
Overall, the proposed system enhances personalization by combining real-time emotion detection, intent understanding, and content recommendation to better match users' emotional and contextual needs.
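To make the combination of emotion and intent concrete, the sketch below maps a detected (emotion, intent) pair to a search phrase and builds request URLs for the search endpoints of the YouTube Data API v3 and the Spotify Web API. The mapping table, default phrase, and function names are hypothetical and not taken from the paper; no network request is made, and a real call would additionally need a valid API key (YouTube) or an OAuth Bearer token in the request header (Spotify).

```python
from urllib.parse import urlencode

# Hypothetical mapping from (emotion, intent) to a search phrase.
QUERY_TABLE = {
    ("sadness", "comfort"): "soothing acoustic songs",
    ("sadness", "motivation"): "uplifting motivational music",
    ("anger", "relaxation"): "calm meditation music",
    ("happiness", "concentration"): "upbeat focus playlist",
}
DEFAULT_QUERY = "relaxing music"  # assumed fallback for unmapped pairs

def build_query(emotion, intent):
    """Look up a search phrase for the detected emotion and intent."""
    return QUERY_TABLE.get((emotion, intent), DEFAULT_QUERY)

def youtube_search_url(query, api_key, max_results=5):
    """Build a URL for the YouTube Data API v3 search endpoint."""
    params = {"part": "snippet", "q": query, "type": "video",
              "maxResults": max_results, "key": api_key}
    return "https://www.googleapis.com/youtube/v3/search?" + urlencode(params)

def spotify_search_url(query, limit=5):
    """Build a URL for the Spotify Web API search endpoint.

    The request itself must carry an 'Authorization: Bearer <token>' header.
    """
    params = {"q": query, "type": "track", "limit": limit}
    return "https://api.spotify.com/v1/search?" + urlencode(params)

query = build_query("sadness", "comfort")
yt_url = youtube_search_url(query, api_key="YOUR_API_KEY")  # placeholder key
sp_url = spotify_search_url(query)
print(yt_url)
print(sp_url)
```

A lookup table keeps the recommendation logic easy to inspect and adjust; a learned ranking model could replace it without changing the surrounding code.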
Conclusion
The core of the mood-based recommendation system is real-time facial emotion recognition using deep learning. It is implemented as a neural network built on a ResNet architecture and equipped with an attention module. In addition, a BERT model classifies the user's intent from text, which adds context to the detected emotion. Based on the resulting mood, the system suggests music, videos, motivational quotes, and wellness activities.
The results obtained during testing confirm the high accuracy and reliability of the system in real-time operation, and show how effectively computer vision and deep learning can work together with API-based recommendation.
References
[1] Ranjani, R. M., Nalla, R., M., S., & K, S. G. (2025). "Emotion Recognition using CNN." Proc. 8th Int. Conf. Trends in Electronics and Informatics (ICOEI).
[2] Agrawal, I., V, Y., Kumar, A., Hegde, R., & G, S. D. (2021). "Emotion Recognition from Facial Expression using CNN." Proc. IEEE Region 10 Humanitarian Technology Conf.
[3] Shukla, D., Kumari, R., & Bhargavi, A. (2024). "Human Face Detection and Emotion Recognition Using OpenCV through AI." Proc. 12th Int. Conf. Internet of Everything, Microwave, Embedded, Communication and Networks (IEMECON).
[4] Beena Priya, R. N., Hanmandlu, M., & Vasikarla, S. (2021). "Emotion Recognition Using Deep Learning." Proc. IEEE Applied Imagery Pattern Recognition Workshop (AIPR).
[5] He, K., Zhang, X., Ren, S., & Sun, J. (2016). "Deep Residual Learning for Image Recognition." Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR).
[6] Vaswani, A., et al. (2017). "Attention Is All You Need." Advances in Neural Information Processing Systems (NeurIPS).
[7] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." Proc. NAACL-HLT.
[8] Carrier, P.-L., & Courville, A. (2013). "FER-2013 Facial Expression Recognition Dataset." Kaggle.
[9] Spotify for Developers. (2024). "Spotify Web API Documentation." Available: https://developer.spotify.com/
[10] Google Developers. (2024). "YouTube Data API v3." Available: https://developers.google.com/youtube