This project develops an AI-based system for audio transcription and contextual analysis to support the assessment of applicant interviews. Each candidate's profile holds all interview materials, including audio, transcriptions, and AI-generated insights such as summaries, keywords, and purpose. Candidates are identified by their unique mobile number or email address. The administrator can create interviewer accounts, build candidate profiles, and upload interview audio, while interviewers can securely review previous interview histories to make informed assessments. The solution addresses the problems of multi-round interviews by centralizing data, which improves organization, consistency, and decision-making. The project also provides practical experience in building a secure, role-based web application combined with AI technology.
Introduction
The proposed AI-Based Audio Transcription and Context Analysis system automates interview evaluation by combining speech recognition, natural language processing (NLP), and machine learning. Unlike traditional manual assessments, which are time-consuming and subjective, the system converts interview audio into text, analyzes contextual meaning, identifies sentiment, extracts keywords, evaluates communication clarity, and generates structured performance reports. It also provides secure, role-based access and centralized storage for multi-round interview records, ensuring transparent and consistent candidate evaluation.
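The role-based access described above can be illustrated with a minimal sketch. The role names, action names, and the `PERMISSIONS` table are hypothetical and chosen only for illustration; they mirror the split stated in this paper, where the administrator creates accounts, builds profiles, and uploads audio, and interviewers review interview histories. Whether administrators can also review histories is not stated, so the sketch leaves that permission out.

```python
from enum import Enum, auto


class Role(Enum):
    ADMIN = auto()
    INTERVIEWER = auto()


class Action(Enum):
    CREATE_INTERVIEWER = auto()
    CREATE_CANDIDATE = auto()
    UPLOAD_AUDIO = auto()
    VIEW_HISTORY = auto()


# Hypothetical permission table (an assumption, not the system's actual schema).
PERMISSIONS = {
    Role.ADMIN: {
        Action.CREATE_INTERVIEWER,
        Action.CREATE_CANDIDATE,
        Action.UPLOAD_AUDIO,
    },
    Role.INTERVIEWER: {Action.VIEW_HISTORY},
}


def is_allowed(role: Role, action: Action) -> bool:
    """Return True if the given role may perform the given action."""
    return action in PERMISSIONS.get(role, set())
```

In a web application, a check of this kind would typically run before each request handler, so that an interviewer account cannot reach the upload or account-creation endpoints.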
The system follows a modular workflow that includes audio upload, preprocessing, speech-to-text transcription, contextual analysis, insight generation, and secure data management. Using AI-based Automatic Speech Recognition (ASR), interview audio is accurately transcribed, while NLP techniques summarize responses, extract technical keywords, detect confidence and communication quality, and assess answer relevance. Multi-round interview tracking links all interview sessions to a single candidate profile, allowing easy comparison of performance across different stages.
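The workflow above can be sketched as a small pipeline that links every interview round to a single candidate profile. This is a simplified illustration under stated assumptions, not the implemented system: the class and function names are hypothetical, the ASR and NLP stages are passed in as plain callables, and `toy_analyze` with its `TECH_TERMS` vocabulary is a stand-in for the real analysis model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InterviewRecord:
    round_number: int
    audio_path: str
    transcript: str = ""
    summary: str = ""
    keywords: List[str] = field(default_factory=list)


@dataclass
class CandidateProfile:
    # The paper identifies candidates by email address or mobile number.
    email: str
    mobile: str
    rounds: List[InterviewRecord] = field(default_factory=list)


# Hypothetical vocabulary used only by the toy analyzer below.
TECH_TERMS = {"python", "sql", "docker"}


def toy_analyze(transcript: str) -> Dict[str, object]:
    """Stand-in for the NLP stage: first sentence as summary, known terms as keywords."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    keywords = sorted({w for w in words if w in TECH_TERMS})
    summary = transcript.split(".")[0].strip()
    return {"summary": summary, "keywords": keywords}


def process_interview(
    profile: CandidateProfile,
    audio_path: str,
    transcribe: Callable[[str], str],
    analyze: Callable[[str], Dict[str, object]],
) -> InterviewRecord:
    """Run transcription and analysis, then attach the result to the profile."""
    transcript = transcribe(audio_path)   # speech-to-text stage
    insights = analyze(transcript)        # contextual analysis stage
    record = InterviewRecord(
        round_number=len(profile.rounds) + 1,
        audio_path=audio_path,
        transcript=transcript,
        summary=str(insights["summary"]),
        keywords=list(insights["keywords"]),
    )
    profile.rounds.append(record)         # multi-round tracking
    return record
```

Because each record is appended to the same `CandidateProfile`, comparing performance across stages reduces to iterating over `profile.rounds`. Passing the transcription and analysis stages as callables keeps the modules independent, so either stage can be replaced without touching the storage logic.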
Experimental evaluation demonstrates high transcription accuracy, effective sentiment and contextual analysis, and reliable extraction of important skills and topics from interview responses. The system processes interview data efficiently with minimal delay, provides an intuitive user interface, and supports keyword-based search and retrieval of candidate records. Compared with traditional interview evaluation methods, the proposed framework improves fairness, transparency, consistency, and hiring efficiency while reducing manual effort. Its modular, scalable, and secure architecture makes it suitable for small and medium-sized organizations seeking intelligent, AI-driven recruitment solutions.
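Keyword-based search and retrieval of candidate records can be illustrated with an inverted index that maps each extracted keyword to the candidates whose interviews contain it. This is a minimal sketch rather than the system's actual search implementation: the function names are hypothetical, candidates are keyed by email address, and a multi-term query is assumed to require all terms to match.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_index(records: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each lowercased keyword to the set of candidate IDs that have it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for candidate_id, keywords in records.items():
        for keyword in keywords:
            index[keyword.lower()].add(candidate_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return candidate IDs matching every term in the query, sorted by ID."""
    terms = [term.lower() for term in query.split()]
    if not terms:
        return []
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())  # keep only candidates with all terms
    return sorted(matches)
```

Building the index once after analysis means each lookup touches only the sets for the queried terms, instead of scanning every stored transcript.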
Conclusion
In this project, we developed a system to organize and evaluate applicant interview audio recordings from the perspectives of storage, accessibility, and audio quality. We demonstrated that a methodical approach to database management and audio processing ensures reliable storage, efficient retrieval, and correct association of candidate profiles with their corresponding audio files. The system was designed to improve the evaluation workflow by giving interviewers a consistent and easy-to-use interface for analyzing recordings. In particular, we proposed a modular solution in which the administrator uploads the files, which are then processed, validated, and securely stored, and the interviewer can access and analyze audio recordings in a systematic manner while preserving data integrity, according to functional tests and demonstrations.