This paper introduces ExamGenie, an automated exam paper generation system that streamlines the creation of examination papers for educators. To handle a range of document formats, ExamGenie extracts text with PyMuPDF and pdfplumber (PDF), python-pptx (PowerPoint), and the OCR engines Tesseract and Google Cloud Vision. Text search is powered by NLTK, regular expressions, and spaCy, while large language models (LLMs), accessed through OpenAI GPT and Hugging Face Transformers, generate questions and categorize them by difficulty. Built on the Flask and Django web frameworks and backed by a vector database, ExamGenie offers a flexible and efficient tool for crafting customized, balanced exam papers. The system reduces manual effort, improves accuracy, and supports alignment with academic standards, improving the educational examination process.
Introduction
Creating exam papers manually is time-consuming and error-prone for educators. To address this, we propose ExamGenie, an automated exam paper generation system that uses optical character recognition (OCR), natural language processing (NLP), and large language models (LLMs) to generate customized, balanced, and plagiarism-free exam papers from a range of inputs: manual entry, PDF, Word, and PowerPoint (PPT) files.
Unlike existing systems, which rely mostly on static question banks and offer limited customization, ExamGenie supports flexible input and AI-driven question generation, and it protects academic integrity by avoiding repetition and plagiarism. The system is web-based, built using Flask and Django, and uses a vector database for efficient data storage and retrieval.
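As an illustration of how repetition might be avoided with vector similarity, the following minimal sketch flags a candidate question as a near-duplicate of one already in the bank. It is not ExamGenie's actual implementation: the word-count embedding stands in for a learned embedding stored in a vector database, and the names `embed`, `cosine`, `is_near_duplicate`, and the 0.8 threshold are assumptions made for this example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumed stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def is_near_duplicate(candidate: str, existing: list[str], threshold: float = 0.8) -> bool:
    """Return True if the candidate is at least `threshold` similar to any stored question."""
    vec = embed(candidate)
    return any(cosine(vec, embed(q)) >= threshold for q in existing)


# Hypothetical question bank used only for this example.
bank = ["What is the time complexity of binary search?"]
print(is_near_duplicate("What is the time complexity of the binary search?", bank))
print(is_near_duplicate("Explain how a hash table resolves collisions.", bank))
```

In a deployed system the linear scan over `existing` would presumably be replaced by a nearest-neighbor query against the vector database, with the threshold tuned on real question data.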
ExamGenie’s process includes text extraction, question classification by difficulty and type, AI generation of new relevant questions, and customizable paper assembly. Educators can review, adjust, and provide feedback on generated papers to continuously improve the model.
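The final step, customizable paper assembly, can be sketched as sampling from a classified question pool under a per-difficulty quota. This is a minimal sketch under stated assumptions rather than the system's actual logic: the `Question` fields, the difficulty labels, and the `assemble_paper` function are hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    text: str
    difficulty: str  # assumed labels: "easy", "medium", "hard"
    marks: int


def assemble_paper(pool, quota, seed=None):
    """Pick quota[d] distinct questions of each difficulty d from the pool.

    Raises ValueError if the pool cannot satisfy a quota entry.
    """
    rng = random.Random(seed)  # seeded so a paper can be reproduced
    paper = []
    for difficulty, count in quota.items():
        matching = [q for q in pool if q.difficulty == difficulty]
        if len(matching) < count:
            raise ValueError(
                f"need {count} {difficulty} questions, pool has {len(matching)}"
            )
        paper.extend(rng.sample(matching, count))
    return paper


# Hypothetical pool; in practice this would come from the classification step.
pool = [
    Question("Define a stack.", "easy", 2),
    Question("Define a queue.", "easy", 2),
    Question("Define a linked list.", "easy", 2),
    Question("Compare BFS and DFS.", "medium", 5),
    Question("Explain quicksort partitioning.", "medium", 5),
    Question("Prove the lower bound for comparison sorting.", "hard", 10),
]
paper = assemble_paper(pool, {"easy": 2, "medium": 1, "hard": 1}, seed=42)
for q in paper:
    print(f"[{q.difficulty}] {q.text} ({q.marks} marks)")
```

Keeping the quota as a plain mapping lets an educator adjust the balance of a paper without touching the selection code, and failing loudly on an unsatisfiable quota avoids silently producing an unbalanced paper.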
In a pilot, ExamGenie reduced exam paper creation time by an average of 70%, achieved 95% accuracy in OCR text extraction, and produced AI-generated questions that met educator standards, with high reported satisfaction. User feedback will guide further customization and system improvements.
Conclusion
ExamGenie automates exam paper generation, addressing longstanding challenges that educators face in creating high-quality assessments. By combining optical character recognition (OCR), natural language processing (NLP), and large language models (LLMs), it reduced the time required for exam preparation by an average of 70% in the pilot and achieved 95% accuracy in text extraction, while producing questions that are relevant, diverse, and aligned with academic standards. Positive feedback from educators suggests that ExamGenie meets the needs of modern educational environments, letting teachers focus on instruction rather than administrative tasks.
While these results indicate a successful pilot implementation, there is room for further enhancement based on user feedback, such as additional customization options. Continued refinement of ExamGenie can further improve the exam preparation process for both teachers and students.