The growing demand for music recommendation systems that align with users' emotions has been driven by the popularity of personalized digital experiences. This work presents a specific machine learning application: an emotion-driven music recommendation system built on emotion recognition. By integrating facial expressions, voice tone, and overall mood, the system builds custom playlists in real time. Emotion mapping is performed using techniques such as convolutional neural networks (CNNs) combined with natural language processing (NLP) for multimodal integration. The research also examines how the system balances users' musical and emotional preferences against challenges such as dataset diversity, emotion generalization, and system scalability. The findings provide evidence that ML-based frameworks improve user satisfaction and interaction, making this technology relevant to the steadily expanding area of emotion-centric applications.
Introduction
The increasing emotional connection between users and music has driven demand for highly personalized music experiences. By combining emotion recognition technologies (e.g., facial analysis, speech, text interpretation) with machine learning (ML), particularly convolutional neural networks (CNNs) and natural language processing (NLP), it is now possible to tailor music to users' real-time emotional states. This thesis presents a system that uses facial, audio, and text-based emotion detection to recommend music, aiming to reshape user interaction with digital music platforms.
Literature Review
Studies show promising methods in emotion-aware music recommendation:
Facial Emotion Detection (CNNs) can drive accurate recommendations, but performance varies across demographics.
Emotion-tagged Datasets, paired with ML models such as SVMs and Random Forests, enable genre-emotion linking.
Audio Features (tempo, rhythm, timbre) aid mood detection but raise scalability issues.
Multimodal Systems (facial, audio, sentiment via LSTM and CNNs) offer improved accuracy but suffer from processing inefficiency.
Common Challenges Identified:
Lack of diverse, annotated datasets
Difficulty in understanding subjective and contextual emotions
Real-time processing complexity
Privacy and ethical concerns over personal data usage
Lack of adaptability to users’ evolving emotional patterns
Gap Analysis
Major research gaps include:
Overreliance on single-mode emotion detection (e.g., only facial or audio).
Limited availability of diverse and labeled datasets.
Difficulty handling subjective and context-dependent emotional expressions.
High computational demands, especially for real-time or mobile applications.
Lack of long-term personalization and privacy safeguards.
Methodology
The system integrates emotion detection and music recommendation using ML:
A. System Architecture
Emotion Detection:
Uses pre-trained CNN models to classify emotions (happiness, sadness, anger, surprise) from facial images or video.
Music Recommendation:
Integrates with the Spotify API to suggest tracks based on detected emotion.
User Interface:
Built with Streamlit, allowing real-time image/video uploads and music suggestions.
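To make the pipeline concrete, the following is a minimal sketch of how the three components could be wired together in Streamlit. Here, predict_emotion and recommend_tracks are placeholder stubs standing in for the CNN classifier and the Spotify lookup described above, not the system's actual code.

import streamlit as st
from PIL import Image

def predict_emotion(image):
    # Placeholder: the real system runs a fine-tuned CNN here.
    return "happy"

def recommend_tracks(emotion):
    # Placeholder: the real system queries Spotify via Spotipy here.
    return [("Sample Track", "Sample Artist", "https://open.spotify.com/")]

st.title("Emotion-Based Music Recommender")

uploaded = st.file_uploader("Upload a face image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    image = Image.open(uploaded).convert("RGB")
    st.image(image, caption="Uploaded image")

    emotion = predict_emotion(image)
    st.subheader(f"Detected emotion: {emotion}")

    for title, artist, url in recommend_tracks(emotion):
        st.markdown(f"[{title} by {artist}]({url})")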
B. Data Collection & Preprocessing
Uses .npy files with labeled facial expression features.
Facial landmarks are extracted, normalized, and prepared for training.
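A minimal sketch of this preprocessing step is shown below, assuming one .npy file of landmark features per emotion label; the file names and array layout are illustrative assumptions rather than the study's actual data format.

import numpy as np

EMOTIONS = ["happy", "sad", "angry", "surprise"]

X_parts, y_parts = [], []
for idx, emotion in enumerate(EMOTIONS):
    # Hypothetical layout: each file holds an (n_samples, n_landmark_features) array.
    feats = np.load(f"{emotion}.npy")
    X_parts.append(feats)
    y_parts.append(np.full(len(feats), idx))

X = np.concatenate(X_parts).astype("float32")
y = np.concatenate(y_parts)

# Normalize each feature dimension to zero mean and unit variance so that
# landmark coordinates on different scales contribute equally during training.
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)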
C. Modeling & Training
CNN models (e.g., VGG16, VGG19, ResNet) are fine-tuned for emotion classification.
Model performance is validated using accuracy, precision, recall, and F1 score.
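The sketch below illustrates how one of the named backbones (VGG16) could be fine-tuned for four-class emotion recognition with Keras; the classification head and hyperparameters are illustrative assumptions, not the exact settings used in the study.

from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze ImageNet features; train only the new head

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4, activation="softmax"),  # happy, sad, angry, surprise
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# After training, precision, recall, and F1 can be computed with scikit-learn:
# from sklearn.metrics import classification_report
# print(classification_report(y_val, model.predict(x_val).argmax(axis=1)))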
D. Music Recommendation Logic
Emotion-to-genre mapping (e.g., pop for joy, classical for sadness).
Spotipy is used to interface with Spotify’s database.
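A minimal sketch of this recommendation logic with Spotipy follows. The pop and classical pairings come from the text; the mappings for anger and surprise are illustrative assumptions, and credentials are assumed to be supplied via Spotipy's standard environment variables.

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

EMOTION_TO_GENRE = {
    "happy": "pop",
    "sad": "classical",
    "angry": "rock",           # assumption, not specified in the text
    "surprise": "electronic",  # assumption, not specified in the text
}

# Reads SPOTIPY_CLIENT_ID / SPOTIPY_CLIENT_SECRET from the environment.
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

def recommend_tracks(emotion, limit=5):
    genre = EMOTION_TO_GENRE.get(emotion, "pop")
    results = sp.search(q=f"genre:{genre}", type="track", limit=limit)
    return [(t["name"], t["artists"][0]["name"], t["external_urls"]["spotify"])
            for t in results["tracks"]["items"]]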
Results
A. Emotion Detection
Achieved 85% overall accuracy
Best performance: Happy and Neutral (>90%)
Lower accuracy for Anger and Surprise (~75–80%) due to subtle facial cues
Real-world accuracy dropped slightly; future improvements include using larger datasets and data augmentation
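Per-class figures like these can be read off a confusion matrix; in the sketch below, y_true and y_pred are random placeholders rather than the study's actual predictions.

import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["happy", "neutral", "sad", "angry", "surprise"]

rng = np.random.default_rng(0)
y_true = rng.integers(0, len(labels), size=200)  # placeholder ground truth
y_pred = rng.integers(0, len(labels), size=200)  # placeholder predictions

cm = confusion_matrix(y_true, y_pred, labels=list(range(len(labels))))
per_class = cm.diagonal() / cm.sum(axis=1)  # per-class accuracy (recall)
for name, acc in zip(labels, per_class):
    print(f"{name}: {acc:.1%}")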
B. Music Recommendation
Recommendations aligned well with emotional states; user feedback was positive
Therapeutic value noted; music helped enhance emotional experiences
Streamlit UI provided a smooth, responsive experience
Some users wanted more genre variety
Conclusion
This research presents the development and evaluation of an emotion-based music recommendation system, utilizing facial expression recognition to assess users’ emotional states and deliver personalized music suggestions. The system’s ability to detect a range of emotions—including happiness, sadness, anger, and surprise—was validated through real-time image processing, yielding an accuracy rate of 85%. Music recommendations were then tailored to these emotions, enhancing user experiences through contextually relevant playlists, thus demonstrating the potential for emotional engagement in interactive media.