This project develops a multilingual transformer model for sentiment and emotion detection in Dravidian languages (Tamil, Telugu, Kannada, and Malayalam), including code-mixed text. A shared encoder is fine-tuned on over 40,000 labeled multilingual text samples. Preprocessing (normalization, tokenization, and noise removal) prepares code-mixed and low-resource text for the model. The model jointly learns sentiment polarity and emotion intensity from transformer-based features. It achieves higher accuracy and robustness than traditional machine learning methods such as SVM, Naïve Bayes, and Logistic Regression.
Introduction
The project develops a Multilingual Sentiment and Emotion Detection System that uses NLP and transformer-based models to process multiple Indian languages and code-mixed text. Traditional sentiment analysis models are often English-centric and struggle with low-resource languages, which motivates inclusive multilingual solutions. The system fine-tunes IndicBERT and MuRIL on over 40,000 labeled samples to perform dual-task classification of sentiment (positive, negative, neutral) and emotion (joy, anger, sadness, fear, love, surprise, and others).
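The dual-task setup described above can be written as a joint training objective. The form below is a common choice for multi-task classification and is given only as an assumption: the source does not state the exact loss, and the weight \lambda, the sample count N, and the symbols x_i, y_i are introduced here for illustration.

```latex
\mathcal{L}(\theta)
  = \mathcal{L}_{\text{sent}}(\theta) + \lambda \, \mathcal{L}_{\text{emo}}(\theta),
\qquad
\mathcal{L}_{\text{sent}}(\theta)
  = -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta\!\left(y_i^{\text{sent}} \mid x_i\right),
\qquad
\mathcal{L}_{\text{emo}}(\theta)
  = -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta\!\left(y_i^{\text{emo}} \mid x_i\right)
```

Here both task heads share the encoder parameters in \theta, so gradients from the sentiment and emotion losses update the same representation. Setting \lambda = 1 weights the two tasks equally.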
Key Components & Methodology:
Data Preprocessing: Standardizes multilingual and code-mixed text through normalization, tokenization, and noise removal, using the Indic NLP Library, so that the model receives consistent input.
Model Fine-tuning: Transformer models are adapted via multi-task learning and trained with dropout, batch normalization, and learning rate scheduling for robust performance.
Classification Engine: Generates sentiment polarity and emotion predictions, supporting cross-lingual inputs with high accuracy and F1-scores.
Web-based Interface: An HTML, CSS, and JavaScript frontend, backed by FastAPI, provides real-time results with text and emoji indicators for easy user interaction.
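The preprocessing component listed above can be illustrated with a minimal sketch. It uses only the Python standard library rather than the Indic NLP Library named in the source, and the specific cleaning rules (URL and mention removal, collapsing repeated characters) and the function names `preprocess` and `tokenize` are assumptions chosen for illustration, not the project's actual pipeline.

```python
import re
import unicodedata

# Hypothetical noise patterns; the real pipeline may use different rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # same character three or more times in a row
SPACE_RE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Normalize a (possibly code-mixed) social media post."""
    # NFC puts characters and combining signs into one canonical composed form.
    text = unicodedata.normalize("NFC", text)
    text = URL_RE.sub(" ", text)          # drop links
    text = MENTION_RE.sub(" ", text)      # drop user mentions
    text = text.replace("#", " ")         # keep the hashtag word, drop the symbol
    text = REPEAT_RE.sub(r"\1\1", text)   # "semmaaaa" -> "semmaa"
    text = SPACE_RE.sub(" ", text).strip()
    # Lowercasing only affects Latin-script tokens; Indic scripts have no case.
    return text.lower()


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization, a stand-in for the model's subword tokenizer."""
    return preprocess(text).split()


if __name__ == "__main__":
    print(tokenize("Padam  semmaaaa   mass @user https://t.co/x #Leo"))
```

Emojis are deliberately left in place, since they carry sentiment information. A fine-tuned transformer would apply its own subword tokenizer after a cleaning step like this one.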
Key Features:
Handles Dravidian languages (Tamil, Telugu, Kannada, Malayalam), including code-mixed text.
Provides real-time sentiment and emotion analysis through a user-friendly web interface.
Scalable, reliable, and portable, suitable for social media monitoring, customer feedback analysis, and opinion mining.
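As an illustration of how predictions might be paired with the text and emoji indicators shown in the interface, the sketch below builds a result payload in plain Python. The emoji mappings, the field names, and the function `format_result` are all hypothetical; the source does not specify the response format.

```python
# Hypothetical label-to-emoji mappings, for illustration only.
SENTIMENT_EMOJI = {"positive": "🙂", "negative": "🙁", "neutral": "😐"}
EMOTION_EMOJI = {
    "joy": "😄",
    "anger": "😠",
    "sadness": "😢",
    "fear": "😨",
    "love": "❤️",
    "surprise": "😲",
}


def format_result(text: str, sentiment: str, emotion: str, confidence: float) -> dict:
    """Build a JSON-serializable payload for the web frontend."""
    sentiment = sentiment.lower()
    emotion = emotion.lower()
    return {
        "text": text,
        "sentiment": {
            "label": sentiment,
            # Unknown labels get an empty indicator instead of raising an error.
            "emoji": SENTIMENT_EMOJI.get(sentiment, ""),
        },
        "emotion": {
            "label": emotion,
            "emoji": EMOTION_EMOJI.get(emotion, ""),
        },
        "confidence": round(confidence, 3),
    }


if __name__ == "__main__":
    print(format_result("padam semma mass", "Positive", "Joy", 0.9137))
```

A backend endpoint could return such a dictionary as JSON, leaving the frontend to render the label and emoji side by side.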
Evaluation:
The system improves over previous approaches by addressing multilingual challenges, enhancing neutral sentiment detection, handling emojis, and supporting low-resource languages while maintaining high accuracy, speed, and usability.
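Because neutral sentiment detection is singled out above, a per-class view of precision, recall, and F1 is useful alongside overall accuracy. The sketch below computes these from scratch; the choice of macro-averaged F1 is an assumption (the source does not say which averaging was used), and the function names and toy labels are illustrative, not project results.

```python
from collections import Counter


def per_class_prf(y_true: list[str], y_pred: list[str]) -> dict[str, tuple[float, float, float]]:
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label wrongly
            fn[truth] += 1  # missed the true label
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1_den = precision + recall
        f1 = 2 * precision * recall / f1_den if f1_den else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = per_class_prf(y_true, y_pred)
    if not scores:
        return 0.0
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    if not y_true:
        return 0.0
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy labels for illustration only.
    gold = ["positive", "negative", "neutral", "neutral"]
    pred = ["positive", "neutral", "neutral", "neutral"]
    print(accuracy(gold, pred), macro_f1(gold, pred))
```

In the toy example the accuracy looks reasonable while the macro F1 is pulled down by the missed negative sample, which is the kind of class-level weakness that accuracy alone can hide.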
Conclusion
This project presents a Multilingual Sentiment and Emotion Detection System designed specifically for Dravidian languages such as Tamil, Telugu, Kannada, and Malayalam. Using fine-tuned transformer-based models such as IndicBERT and MuRIL, the system interprets the linguistic nuances, emotional tones, and context-rich expressions commonly found in regional and code-mixed digital communication. The model identifies both sentiment polarity (Positive, Negative, Neutral) and complex emotions (Joy, Anger, Sadness, Fear, Love, Surprise, and others), even in informal or code-mixed social media text.
The system architecture integrates a FastAPI backend for efficient inference handling and a responsive web-based frontend for real-time interaction and visualization. This design supports scalability, modularity, and user accessibility, making the system suitable for deployment in large-scale environments such as social media analytics platforms, feedback monitoring systems, and customer sentiment evaluation dashboards. Furthermore, the optimized preprocessing pipeline (text normalization, tokenization, and code-mixing resolution) has enabled the model to achieve higher accuracy, precision, recall, and F1-scores than conventional machine learning techniques such as SVM, Naïve Bayes, and Logistic Regression.
This work highlights the potential of transformer-based architectures for low-resource Indian languages, narrowing the technological gap in regional language processing and sentiment understanding. Beyond its research significance, the system offers practical societal benefits, from monitoring public sentiment during crisis events to enhancing personalized recommendation systems and strengthening human-computer emotional interaction in multilingual contexts.
References
[1] Sultan Saaed Almalki (2025). "Sentiment Analysis and Emotion Detection Using Transformer Models in Multilingual Social Media Data." International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 16, No. 3, pp. 324-333.
[2] Charangan Vasantharajan, Sean Benhur, Prasanna Kumar Kumarasen, Rahul Ponnusamy, Sathiyaraj Thangasamy, Ruba Priyadharshini, Thenmozhi Durairaj, Kanchana Sivanraju, Anbukkarasi Sampath, Bharathi Raja Chakravarthi, John Phillip McCrae. "TamilEmo: Finegrained Emotion Detection Dataset for Tamil."
[3] Meeradevi, Sowmya B. J., Swetha B. N. (2024). "Evaluating the machine learning models based on natural language processing tasks." IAES International Journal of Artificial Intelligence (IJ-AI), Issue June 2024, Vol. 13, No. 2, pp. 1954-1968.
[4] Simran Khanuja, Diksha Bansal, Sarvesh Mehtani, Savya Khosla, Atreyee Dey, Balaji Gopalan, Dilip Kumar Margam, Pooja Aggarwal, Rajiv Teja Nagipogu, Shachi Dave, Shruti Gupta, Subhash Chandra Bose Gali, Vish Subramanian, Partha Talukdar (2021). "MuRIL: Multilingual Representations for Indian Languages." arXiv Preprint, Issue March 2021, arXiv:2103.10730v2.
[5] Kogilavani Shanmugavadivel, V. E. Sathishkumar, Sandhiya Raja, T. Bheema Lingaiah, S. Neelakandan, Malliga Subramanian (2022). "Deep learning based sentiment analysis and offensive language identification on multilingual code-mixed data." Scientific Reports (Nature), Issue 2022, Vol. 12, Article 21557.