Multi-Modal Proctoring via MediaPipe FaceMesh, YOLOv8n Object Detection, and WebRTC VAD with Confidence-Weighted Human-in-the-Loop Labelling for Online Exam Integrity
Authors: Abhinay Vinsy Bale, K Sudeepa Kumari, Jagath Kalyani Dommeti, Devi Anjana Pushpa Dirsipomu, Abdul Raheem Shaik
The rapid shift to remote education has magnified the demand for reliable online exam proctoring; however, existing automated systems suffer from high false-positive rates (25–40%), leading to unfair student flagging and institutional distrust. This paper presents an open-source, multi-modal AI proctoring system that integrates MediaPipe FaceMesh for face detection, LSTM-based sequential and streak filtering of face detections, gaze tracking via the Eye Aspect Ratio (EAR), YOLOv8n for real-time prohibited-object detection, and WebRTC Voice Activity Detection (VAD) to distinguish human speech from background noise, all orchestrated through a Flask backend with MJPEG streaming. The system introduces a human-in-the-loop verification pipeline in which administrators review and label detected violations through a dedicated dashboard, with confidence-weighted feedback reducing false positives by up to 60%. It provides real-time monitoring while addressing ethical concerns regarding privacy, accessibility, and algorithmic fairness. The findings show that AI proctoring systems can significantly improve exam security and reduce cheating incidents; however, careful attention must be paid to inclusivity and data governance to ensure the system is ethical, unbiased, and accessible to all users. This paper presents a systematic approach to implementing AI-based proctoring systems, serving as a foundation for future innovations in securing digital education.
Introduction
This paper presents an AI-powered multi-modal online exam proctoring system designed to improve academic integrity while reducing the high false-positive rates common in existing automated proctoring solutions.
The rapid growth of online education and examinations, accelerated by the COVID-19 pandemic, has increased the demand for scalable online proctoring systems. Although automated proctoring is cost-effective and scalable, it often generates false-positive cheating alerts (25–40%), unfairly flagging students for normal behaviors such as looking away, adjusting glasses, background noises, or the presence of harmless objects. These inaccuracies can lead to stress, academic disputes, and institutional liability.
Problem Statement
Current proctoring systems face a trade-off:
Human proctoring is highly accurate but expensive and difficult to scale.
Automated AI proctoring is scalable but produces many false positives.
The study aims to solve this issue by creating an open-source, transparent, and multi-modal proctoring system that combines automated detection with human-in-the-loop verification to improve fairness and accuracy.
Key Concepts and Technologies
The proposed system integrates multiple detection methods:
Face Detection and Recognition
Uses facial recognition and face tracking to verify student identity.
Handles authentication and continuous monitoring.
Gaze Tracking
Uses the Eye Aspect Ratio (EAR) and facial landmarks to monitor eye movements and attention.
Detects suspicious gaze behavior while accounting for normal blinking patterns.
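As a minimal sketch of the Eye Aspect Ratio, the snippet below uses the standard six-landmark definition, EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), and counts a blink only when the ratio stays low for a few consecutive frames. The threshold of 0.2 and the two-frame minimum are illustrative assumptions, not values reported by the paper, and extraction of the landmarks from the face mesh is omitted.

```python
import math


def eye_aspect_ratio(eye):
    """Compute EAR from six (x, y) eye landmarks ordered p1..p6.

    p1 and p4 are the horizontal eye corners; (p2, p6) and (p3, p5)
    are the two vertical landmark pairs.
    """
    p1, p2, p3, p4, p5, p6 = eye
    horizontal = math.dist(p1, p4)
    if horizontal == 0:
        return 0.0  # degenerate landmarks: treat the eye as closed
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    return vertical / (2.0 * horizontal)


def count_blinks(ear_series, threshold=0.2, min_frames=2):
    """Count runs of at least min_frames consecutive frames with EAR below threshold.

    threshold and min_frames are assumed values for illustration.
    """
    blinks = 0
    run = 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # series ended while the eye was still closed
        blinks += 1
    return blinks
```

Requiring a minimum run length is one simple way to separate a deliberate blink from a single noisy frame; a sustained low EAR over many frames could be handled separately as a possible look-away or eyes-closed event.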
Object Detection
Uses YOLOv8 to identify prohibited items such as mobile phones and electronic devices during exams.
Improves small-object detection and real-time performance.
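The post-processing step after detection can be sketched as a simple filter. The example assumes the detector reports COCO-style class names with a confidence score; the Detection record, the set of prohibited labels, and the 0.5 confidence cut-off are hypothetical choices for illustration, and the detector call itself is omitted.

```python
from dataclasses import dataclass

# COCO class names treated as prohibited; this particular set is an assumption.
PROHIBITED = frozenset({"cell phone", "laptop", "book"})


@dataclass
class Detection:
    label: str         # class name reported by the detector
    confidence: float  # detector confidence in [0, 1]
    box: tuple         # (x1, y1, x2, y2) in pixels


def prohibited_objects(detections, min_conf=0.5, prohibited=PROHIBITED):
    """Keep only detections of prohibited classes at or above min_conf."""
    return [
        d for d in detections
        if d.label in prohibited and d.confidence >= min_conf
    ]
```

Keeping the label set and the confidence cut-off as parameters lets an institution adjust what counts as prohibited for a given exam, for example allowing books in an open-book test.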
Audio Analysis
Uses WebRTC Voice Activity Detection (VAD) and spectral analysis to detect speech and suspicious audio events.
Analyzes voice activity without relying solely on simple volume thresholds.
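The sketch below shows one way the audio stream could be prepared for frame-level voice activity detection. WebRTC VAD operates on 16-bit mono PCM in frames of 10, 20, or 30 ms, so the audio is first split into fixed-length frames, each frame is classified, and only sustained runs of speech frames are reported. The classifier is passed in as a function, and the function names and the ten-frame minimum run are assumptions for illustration.

```python
def split_frames(pcm, sample_rate=16000, frame_ms=30):
    """Split 16-bit mono PCM bytes into fixed-length frames.

    A trailing partial frame is dropped, since the VAD expects exact lengths.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    last_start = len(pcm) - frame_bytes
    return [pcm[i:i + frame_bytes] for i in range(0, last_start + 1, frame_bytes)]


def speech_segments(flags, min_frames=10):
    """Return (start, end) frame index pairs for runs of >= min_frames speech frames."""
    segments = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    if start is not None and len(flags) - start >= min_frames:
        segments.append((start, len(flags)))
    return segments


def detect_speech(pcm, is_speech, sample_rate=16000, frame_ms=30, min_frames=10):
    """Classify each frame with is_speech(frame, sample_rate) and group the results."""
    frames = split_frames(pcm, sample_rate, frame_ms)
    flags = [is_speech(frame, sample_rate) for frame in frames]
    return speech_segments(flags, min_frames)
```

With the py-webrtcvad package installed, the is_speech method of a webrtcvad.Vad instance takes a frame and a sample rate in this order, so it can be passed directly as the classifier. Grouping frames into runs keeps a short cough or keyboard click from being logged as speech.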
Multi-Modal Sensor Fusion
Combines evidence from face detection, gaze tracking, object detection, and audio monitoring.
Since each modality has different error patterns, combining them reduces overall detection errors and increases reliability.
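The summary does not specify the fusion rule, so the following is only a sketch of one plausible option: a weighted average of per-modality suspicion scores. The modality names, weights, and the 0.5 alert threshold are hypothetical.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-modality suspicion scores in [0, 1].

    scores and weights are dicts keyed by modality name; modalities
    missing from scores (for example, a muted microphone) are skipped.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(weights[name] * scores[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical equal weights and alert threshold, for illustration only.
WEIGHTS = {"face": 1.0, "gaze": 1.0, "object": 1.0, "audio": 1.0}
ALERT_THRESHOLD = 0.5


def should_flag(scores, weights=WEIGHTS, threshold=ALERT_THRESHOLD):
    """Flag an interval for review only when the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold
```

Under this rule a single modality firing on its own, such as a brief glance away, yields a low fused score, while agreement between gaze, object, and audio evidence pushes the score over the threshold. This is how differing error patterns can cancel rather than accumulate.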
Human-in-the-Loop Verification
Human reviewers examine flagged incidents before penalties are applied.
This significantly reduces false accusations and improves fairness.
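The abstract mentions confidence-weighted feedback from administrator labels without giving the update rule. A minimal sketch under that gap: each reviewed incident adds its detector confidence to a confirmed or rejected tally for its violation type, and the resulting reliability estimate scales future scores of that type. The class name, the pseudo-count prior, and the scaling rule are all assumptions.

```python
from collections import defaultdict


class FeedbackWeights:
    """Track how often each violation type is confirmed by human reviewers."""

    def __init__(self, prior=1.0):
        # Pseudo-counts act as a weak prior so a single label cannot
        # swing the estimate to 0 or 1.
        self.confirmed = defaultdict(lambda: prior)
        self.rejected = defaultdict(lambda: prior)

    def record(self, violation_type, confidence, is_violation):
        """Add one administrator label, weighted by the detector's confidence."""
        if is_violation:
            self.confirmed[violation_type] += confidence
        else:
            self.rejected[violation_type] += confidence

    def reliability(self, violation_type):
        """Confidence-weighted fraction of reviewed incidents that were confirmed."""
        c = self.confirmed[violation_type]
        r = self.rejected[violation_type]
        return c / (c + r)

    def adjusted(self, violation_type, confidence):
        """Scale a new detection's confidence by the type's learned reliability."""
        return confidence * self.reliability(violation_type)
```

Weighting by confidence means that a high-confidence detection rejected by a reviewer lowers the reliability of that violation type more than a borderline one does, so detectors that are confidently wrong are discounted fastest.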
Fairness, Accountability, and Transparency (FAT)
The system emphasizes explainable decisions, auditability, and reduced demographic bias.
Unlike commercial "black-box" systems, administrators can review and verify all flagged violations.
Literature Gaps Identified
The study highlights several shortcomings in existing proctoring research and products:
No open-source multi-modal proctoring system with integrated human verification.
Lack of real-time administrative monitoring dashboards.
Limited use of contextual object detection.
Insufficient measurement and mitigation of demographic bias.
Persistent concerns regarding academic justice due to false-positive accusations.
Proposed System Architecture
The system follows these steps:
Student Authentication
Login using credentials.
Face Registration
One-time face capture and encoding for identity verification.
Exam Initialization
Students begin the exam after activating video proctoring.
Proctoring Setup
Initializes face detection, gaze tracking, object detection, and audio monitoring modules.
Real-Time Monitoring
Continuously analyzes webcam and audio streams during the exam.
Records suspicious events and logs violations for review.
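The monitoring and logging step can be sketched with a simple streak filter: a violation is logged only after a condition has held for a minimum number of consecutive frames, and each log entry is marked as pending until a reviewer labels it. This covers only the streak part of the temporal filtering; the LSTM stage is not shown. The class name, the record fields, and the streak length are hypothetical.

```python
class StreakFilter:
    """Signal an event only after a condition holds for min_streak consecutive frames."""

    def __init__(self, min_streak=15):
        self.min_streak = min_streak
        self.streak = 0

    def update(self, condition):
        """Return True exactly once per streak, on the frame where it reaches min_streak."""
        self.streak = self.streak + 1 if condition else 0
        return self.streak == self.min_streak


def log_violations(frame_flags, violation_type, min_streak=15):
    """Turn per-frame boolean flags into a list of violation records for review."""
    streak_filter = StreakFilter(min_streak)
    log = []
    for frame_index, flag in enumerate(frame_flags):
        if streak_filter.update(flag):
            log.append({
                "frame": frame_index,
                "type": violation_type,
                "status": "pending_review",  # awaits an administrator label
            })
    return log
```

Because the filter fires once per streak rather than on every frame, a continuous ten-second absence produces one reviewable record instead of hundreds, which keeps the administrator dashboard manageable.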
Advantages
Reduced false-positive rates through human verification.
Increased fairness and transparency.
Real-time monitoring capabilities.
Open-source and auditable design.
Scalable for large numbers of online examinees.
Better protection of academic integrity compared to traditional automated systems.
Conclusion
This paper presented an open-source, AI-powered multi-modal proctoring system designed to ensure academic integrity in online examinations through real-time cheating detection. The system integrates four complementary detection modalities—face detection via RetinaFace with temporal LSTM streak filtering, gaze tracking via MediaPipe FaceMesh and Eye Aspect Ratio, object detection via YOLOv8n, and audio surveillance via PyAudio spectral analysis—to create a comprehensive cheating detection pipeline. All modalities are orchestrated through a modular Flask backend that streams processed video to the student browser via MJPEG over HTTP, while simultaneously logging violations with screenshot evidence for administrative review.
This system bridges the gap between expensive, opaque commercial proctoring solutions and naive open-source alternatives by combining state-of-the-art detection algorithms with human-verified fairness. By making the entire stack openly available, including algorithms, metrics, and evaluation protocols, we hope to empower educational institutions, researchers, and policymakers to deploy transparent, equitable, and effective online examination security at scale.