Student feedback is highly valued, yet it remains one of the most under-utilized assets in education systems today. Feedback analysis has traditionally been a manual process that lacks consistency and efficiency and often fails to yield information that helps faculty improve. In this paper, we describe an AI-based Student Feedback Analysis System that automates the classification, summarization, and reporting of reviews gathered by coaching institutes. The system uses a fine-tuned distilroberta-base transformer, trained on an aggregation of educator review datasets, to carry out three-class sentiment classification (positive, negative, and mixed). Class imbalance is addressed through balanced sampling and a class-weighted loss function. After classification, the reviews are passed through a two-step Llama 3.3 70B Large Language Model pipeline (served through the Groq API), which performs theme extraction followed by report structuring. The result is a professional-looking PDF feedback report containing a sentiment distribution chart, key themes, and recommendations. Overall, the system achieves a macro-average F1-score of 0.87 and improves the recall of the mixed class from 0.00 to 0.82.
Introduction
This paper presents an AI-powered student review analysis system that automatically transforms unstructured student feedback into meaningful, structured reports. Educational institutions often collect large volumes of student reviews through surveys and online forms, but manual analysis is slow, inconsistent, and inefficient. To address this challenge, the proposed system combines sentiment analysis, theme extraction, and automated report generation using advanced Natural Language Processing (NLP) and transformer-based models.
The system employs a fine-tuned DistilRoBERTa model to classify student reviews into three sentiment categories: positive, negative, and mixed. Unlike traditional binary sentiment analysis systems, this approach explicitly handles mixed reviews, which are common in educational feedback because students often express positive and negative opinions in the same review. To counter class imbalance, balanced sampling and a class-weighted loss are applied during training, which substantially improves the model’s ability to detect mixed sentiment.
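The class-weighting idea can be illustrated with a minimal sketch. It assumes inverse-frequency ("balanced") weights and a weighted cross-entropy normalized by the sum of the weights; the paper does not state the exact weighting formula, so both choices, the function names, and the label counts below are illustrative assumptions rather than the system's actual configuration.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Assumed scheme: weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in counts}

def weighted_cross_entropy(probs, targets, weights):
    """Weighted mean of -log p(target), normalized by the sum of target weights."""
    total = sum(weights[t] * -math.log(p[t]) for p, t in zip(probs, targets))
    return total / sum(weights[t] for t in targets)

# Made-up label counts for illustration only (not the paper's data).
train_labels = ["positive"] * 70 + ["negative"] * 25 + ["mixed"] * 5
weights = balanced_class_weights(train_labels)
```

With these made-up counts the rare mixed class receives the largest weight, so each misclassified mixed review contributes more to the loss than a misclassified positive one.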
The methodology consists of three main stages. First, student reviews from multiple datasets are cleaned, balanced, and used to train the DistilRoBERTa model for three-class sentiment classification. Second, classified reviews are analyzed using the Llama 3.3 70B large language model to identify key themes and generate insights. Third, the generated feedback is automatically converted into a professional PDF report containing sentiment distributions, strengths, weaknesses, suggestions, and overall evaluations. The system also includes visualizations such as sentiment bar charts.
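The flow of the three stages can be sketched as follows. This is a rough outline under stated assumptions: `classify` stands in for the fine-tuned classifier and `llm` for a call to the language model, both passed in as plain callables; the prompt wording and all function names are hypothetical, and PDF rendering is left out.

```python
from collections import Counter

LABELS = ("positive", "negative", "mixed")

def sentiment_distribution(predicted_labels):
    """Fraction of reviews per sentiment label (0.0 for absent labels)."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels) or 1  # avoid division by zero on empty input
    return {label: counts[label] / total for label in LABELS}

def build_theme_prompt(reviews, labels):
    # Hypothetical prompt wording, for illustration only.
    lines = [f"[{label}] {text}" for text, label in zip(reviews, labels)]
    return "List the key themes in these student reviews:\n" + "\n".join(lines)

def build_report_prompt(themes, distribution):
    # Hypothetical prompt wording, for illustration only.
    shares = ", ".join(f"{k}: {v:.0%}" for k, v in distribution.items())
    return (
        "Write a feedback report with strengths, weaknesses, and suggestions.\n"
        f"Sentiment distribution: {shares}\nThemes:\n{themes}"
    )

def analyze_reviews(reviews, classify, llm):
    """Classify each review, then make two LLM calls: themes, then report text."""
    labels = [classify(text) for text in reviews]
    distribution = sentiment_distribution(labels)
    themes = llm(build_theme_prompt(reviews, labels))
    report = llm(build_report_prompt(themes, distribution))
    return {
        "labels": labels,
        "distribution": distribution,
        "themes": themes,
        "report": report,
    }
```

Passing the classifier and the LLM as callables keeps the stages independent, so either one can be replaced by a stub in tests or by a different model later.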
The trained model achieves a weighted F1-score of 0.87 on a test set of 931 reviews. Most notably, recall for the mixed sentiment class rises from 0.00 in the baseline model to 0.82, recovering a category that the baseline ignored entirely. Testing on real coaching institute reviews further supports the system’s effectiveness, with accurate sentiment classification and meaningful identification of themes such as teaching clarity, student motivation, pacing, and syllabus coverage.
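For readers unfamiliar with these metrics, the sketch below computes per-class recall together with macro- and weighted-average F1 on a toy example. The labels are made up to mimic a baseline that never predicts the mixed class; they are not the paper's test data, and the function names are illustrative.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Precision, recall, F1, and support for each label."""
    metrics = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
    return metrics

def macro_f1(metrics):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)

def weighted_f1(metrics):
    """Mean of per-class F1 weighted by class support."""
    total = sum(m["support"] for m in metrics.values())
    return sum(m["f1"] * m["support"] for m in metrics.values()) / total

labels = ["positive", "negative", "mixed"]
# Toy baseline that labels the single mixed review as positive.
y_true = ["positive"] * 6 + ["negative"] * 3 + ["mixed"]
y_pred = ["positive"] * 6 + ["negative"] * 3 + ["positive"]
baseline = per_class_metrics(y_true, y_pred, labels)
```

In this toy case the mixed class has zero recall, and the weighted average stays much higher than the macro average because the rare class contributes little support. This is why per-class recall is worth reporting alongside any averaged F1-score.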
Conclusion
In this work, we have developed a solution for automated feedback analysis that combines a transformer-based sentiment classifier with an LLM report generator. The fine-tuned distilroberta-base classifier achieves an average F1-score of 0.87 on a three-class sentiment analysis task. Balanced sampling and class weighting allow it to recognize the mixed class, which had a recall of zero in the baseline. The two-stage LLM pipeline generates structured, theme-based reports and gives faculty members and institutional administrators practical recommendations drawn from student reviews.
The results show that a carefully designed machine learning pipeline, one that accounts for class imbalance, applies confidence thresholding, and uses structured prompts, can generate reliable output in practice. The modular design of the system allows each component to be replaced with an improved version when needed.
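Confidence thresholding can be sketched as follows. This is only one plausible form: the threshold value of 0.6, the "uncertain" fallback label, and the function names are assumptions made for illustration, since this section does not specify how low-confidence predictions are handled.

```python
import math

LABELS = ("positive", "negative", "mixed")

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_with_threshold(logits, labels=LABELS, threshold=0.6):
    """Return (label, confidence); fall back to 'uncertain' below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        # Assumed policy: flag the review instead of forcing a label.
        return "uncertain", probs[best]
    return labels[best], probs[best]
```

A review whose top probability falls below the threshold could, for example, be routed to manual review or excluded from the sentiment distribution, depending on how the surrounding system treats uncertain cases.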
Future development will focus on extending the system into a full aspect-based sentiment analysis system, in which sentiments are linked to teaching aspects such as clarity, pacing, interactivity, and assessment, allowing a more comprehensive and detailed evaluation of faculty members. Other planned developments include multi-topic batch processing, anonymization of personally identifiable expressions, and integration with institutional portals. Moving from the current prototype to a production environment will also require building a moderation layer, as discussed in the architecture.