The rise in online attacks has made traditional means of confirming someone's identity, such as passwords and PINs, easier for criminals to steal or guess, and attackers can also gain access to people's stored passwords, so these forms of identity verification are easy to compromise. Biometric traits, the biological or behavioural characteristics associated with a person, can replace passwords and PINs as a secure form of identity verification. This paper describes a method of accessing secure files by verifying an individual's voice and then protecting those files with a modern encryption algorithm, AES-GCM. When an individual speaks to verify their identity, the system extracts Mel Frequency Cepstral Coefficients (MFCCs) from their voice. To evaluate the proposed secure voice-based authentication system, an experiment was conducted using voice samples from 20 users. The system achieved an authentication success rate of approximately 92%, an average False Acceptance Rate (FAR) of 4.1%, and a False Rejection Rate (FRR) of 3.9%, indicating that voice-based access to secure files can be reliable while adding very little computational overhead to user verification. The results suggest that voice-based biometric authentication is a viable method for improving data security and that this area merits further research.
Introduction
This paper presents a Voice-Based Authentication System for File Sharing (VBA-System) that uses voice biometrics instead of traditional passwords or PINs to provide secure access to files. As cyber threats increase and password-based authentication becomes more vulnerable, voice recognition is considered a more secure and user-friendly alternative, because each person's voice has distinctive characteristics that are difficult to replicate.
The proposed system combines biometric authentication with encryption technologies. It uses Mel Frequency Cepstral Coefficients (MFCCs) for extracting unique voice features, machine learning tools such as TensorFlow, Keras, and Librosa for voice analysis, and AES-GCM encryption to protect files. The system also includes anti-spoofing mechanisms to defend against replay attacks and AI-generated synthetic voices.
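As a rough illustration of MFCC feature extraction, the following TypeScript sketch computes MFCCs from raw samples in the browser-side style of the rest of the system. It uses common textbook defaults (pre-emphasis of 0.97, 25 ms Hamming frames with a 10 ms hop at 16 kHz, 26 mel filters, 13 coefficients); these values and the function names are assumptions rather than the paper's configuration, and the naive DFT stands in for the FFT that a production system or a library such as Librosa would use.

```typescript
// Minimal MFCC sketch (hypothetical helpers, not the paper's code).
// Assumes mono samples in [-1, 1]; defaults are common textbook values for 16 kHz audio.

// Mel scale conversions (HTK-style formula).
function hzToMel(hz: number): number {
  return 2595 * Math.log10(1 + hz / 700);
}

function melToHz(mel: number): number {
  return 700 * (Math.pow(10, mel / 2595) - 1);
}

// Power spectrum of one zero-padded frame via a naive DFT; a real system would use an FFT.
function powerSpectrum(frame: number[], nFft: number): number[] {
  const bins = Math.floor(nFft / 2) + 1;
  const power = new Array<number>(bins).fill(0);
  for (let k = 0; k < bins; k++) {
    let re = 0;
    let im = 0;
    for (let n = 0; n < frame.length; n++) {
      const angle = (2 * Math.PI * k * n) / nFft;
      re += frame[n] * Math.cos(angle);
      im -= frame[n] * Math.sin(angle);
    }
    power[k] = (re * re + im * im) / nFft;
  }
  return power;
}

// Triangular filters whose edges are evenly spaced on the mel scale.
function melFilterbank(nFilters: number, nFft: number, sampleRate: number): number[][] {
  const bins = Math.floor(nFft / 2) + 1;
  const maxMel = hzToMel(sampleRate / 2);
  const edges = Array.from({ length: nFilters + 2 }, (_, i) =>
    Math.floor(((nFft + 1) * melToHz((maxMel * i) / (nFilters + 1))) / sampleRate),
  );
  const filters: number[][] = [];
  for (let m = 1; m <= nFilters; m++) {
    const f = new Array<number>(bins).fill(0);
    for (let k = edges[m - 1]; k < edges[m]; k++) f[k] = (k - edges[m - 1]) / (edges[m] - edges[m - 1]);
    for (let k = edges[m]; k < edges[m + 1]; k++) f[k] = (edges[m + 1] - k) / (edges[m + 1] - edges[m]);
    filters.push(f);
  }
  return filters;
}

// Type-II DCT keeping the first nCoeffs coefficients (unnormalised).
function dct2(x: number[], nCoeffs: number): number[] {
  const n = x.length;
  return Array.from({ length: nCoeffs }, (_, k) =>
    x.reduce((sum, v, i) => sum + v * Math.cos((Math.PI * k * (2 * i + 1)) / (2 * n)), 0),
  );
}

function mfcc(
  signal: number[],
  sampleRate: number,
  frameSize = 400, // 25 ms at 16 kHz
  hop = 160, // 10 ms at 16 kHz
  nFft = 512,
  nFilters = 26,
  nCoeffs = 13,
): number[][] {
  // Pre-emphasis y[n] = x[n] - 0.97 x[n-1] boosts higher frequencies.
  const emphasized = signal.map((x, i) => (i === 0 ? x : x - 0.97 * signal[i - 1]));
  const filters = melFilterbank(nFilters, nFft, sampleRate);
  const features: number[][] = [];
  for (let start = 0; start + frameSize <= emphasized.length; start += hop) {
    // Hamming window reduces spectral leakage at the frame edges.
    const frame = emphasized
      .slice(start, start + frameSize)
      .map((x, n) => x * (0.54 - 0.46 * Math.cos((2 * Math.PI * n) / (frameSize - 1))));
    const power = powerSpectrum(frame, nFft);
    // Log mel energies; the small offset avoids log(0) for empty filters.
    const logMel = filters.map((f) => Math.log(f.reduce((sum, w, k) => sum + w * power[k], 0) + 1e-10));
    features.push(dct2(logMel, nCoeffs));
  }
  return features;
}
```

Each row of the result holds the 13 coefficients for one 10 ms step of audio, so half a second of 16 kHz speech yields 48 rows.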
Main Objectives
The research aims to:
Develop a secure voiceprint-based authentication system for file access.
Integrate MFCC feature extraction with AES-GCM encryption for enhanced security.
Implement a fully client-side architecture to protect biometric data privacy.
Evaluate system performance under different environmental conditions.
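System performance in this kind of evaluation is commonly summarised by the False Acceptance Rate (FAR) and False Rejection Rate (FRR) at a chosen similarity threshold. The TypeScript sketch below shows one generic way to compute both from labelled verification trials. The `Trial` shape, function name, threshold, and scores are made up for illustration; they are not the paper's evaluation procedure or data.

```typescript
// Hypothetical evaluation helper, not the paper's evaluation code.
interface Trial {
  score: number; // similarity between the probe and the claimed user's template
  genuine: boolean; // true if the speaker really is the claimed user
}

interface ErrorRates {
  far: number; // accepted impostor trials / all impostor trials
  frr: number; // rejected genuine trials / all genuine trials
}

function errorRates(trials: Trial[], threshold: number): ErrorRates {
  const genuine = trials.filter((t) => t.genuine);
  const impostor = trials.filter((t) => !t.genuine);
  const falseAccepts = impostor.filter((t) => t.score >= threshold).length;
  const falseRejects = genuine.filter((t) => t.score < threshold).length;
  return {
    far: impostor.length === 0 ? 0 : falseAccepts / impostor.length,
    frr: genuine.length === 0 ? 0 : falseRejects / genuine.length,
  };
}

// Made-up scores for illustration only.
const trials: Trial[] = [
  { score: 0.91, genuine: true },
  { score: 0.86, genuine: true },
  { score: 0.72, genuine: true }, // genuine user rejected at 0.85
  { score: 0.95, genuine: true },
  { score: 0.31, genuine: false },
  { score: 0.88, genuine: false }, // impostor accepted at 0.85
  { score: 0.52, genuine: false },
  { score: 0.44, genuine: false },
];
console.log(errorRates(trials, 0.85)); // { far: 0.25, frr: 0.25 }
```

Repeating the calculation on recordings captured in different environments (quiet room, street noise, and so on) gives a per-condition comparison.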
Related Work
Previous studies have shown that voice biometrics can effectively replace passwords and physical tokens. Researchers have used:
MFCC and spectrogram analysis for speaker recognition.
Deep learning models such as CNNs and RNNs to improve recognition accuracy.
Anti-spoofing techniques, liveness detection, and signal analysis to prevent fraudulent access.
Cryptographic methods like AES encryption combined with biometric authentication for stronger security.
However, many existing systems rely on server-side processing, raising concerns about privacy and storage of biometric data.
Novel Contribution
The key innovation of the proposed system is its fully client-side implementation. Unlike traditional systems that store biometric data on remote servers, all voice processing, authentication, and encryption occur directly in the user's browser. This:
Improves privacy and security.
Eliminates dependence on third-party servers.
Enables real-time authentication.
Requires no specialized hardware beyond a microphone-equipped device.
System Architecture
The system consists of five layers:
UI Layer – Handles user registration, login, file encryption, and decryption through React-based interfaces.
Application State Layer – Manages user data and coordination among components.
Core Logic Layer – Includes the voice biometric engine and cryptography engine.
Browser API Layer – Performs local voice processing and encryption without backend services.
External Services Layer – Uses the Google Gemini API to provide AI-generated explanations and assistance.
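To make the split between the Application State Layer and the Core Logic Layer concrete, the following TypeScript sketch defines hypothetical interfaces for the voice biometric engine and the cryptography engine, plus a small state class that attempts decryption only after voice verification succeeds. All names and signatures are illustrative assumptions, since the paper does not publish its component interfaces; the `secret` argument stands in for whatever input the system actually uses for key derivation.

```typescript
// Hypothetical Core Logic Layer interfaces; names and signatures are illustrative only.
interface VoiceBiometricEngine {
  // Builds a voiceprint (feature vector) from captured audio samples.
  enroll(samples: Float32Array): number[];
  // Returns true when new samples match a stored voiceprint.
  verify(samples: Float32Array, template: number[]): boolean;
}

interface CryptographyEngine {
  encrypt(data: Uint8Array, secret: string): Promise<Uint8Array>;
  decrypt(data: Uint8Array, secret: string): Promise<Uint8Array>;
}

// Application State Layer: holds enrolled templates in memory and coordinates the engines.
class AppState {
  private readonly templates = new Map<string, number[]>();
  private readonly voice: VoiceBiometricEngine;
  private readonly cipher: CryptographyEngine;

  constructor(voice: VoiceBiometricEngine, cipher: CryptographyEngine) {
    this.voice = voice;
    this.cipher = cipher;
  }

  register(user: string, samples: Float32Array): void {
    this.templates.set(user, this.voice.enroll(samples));
  }

  // Decryption is only attempted after the voice check passes.
  async decryptFor(user: string, samples: Float32Array, data: Uint8Array, secret: string): Promise<Uint8Array> {
    const template = this.templates.get(user);
    if (template === undefined || !this.voice.verify(samples, template)) {
      throw new Error("voice verification failed");
    }
    return this.cipher.decrypt(data, secret);
  }
}
```

Keeping the engines behind interfaces lets the UI Layer call one coordinating object while the biometric and cryptographic code can be swapped or tested in isolation.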
Authentication Process
The authentication workflow includes:
Capturing the user's voice through a microphone.
Preprocessing the audio to reduce background noise.
Extracting MFCC features from the voice signal.
Creating a unique voiceprint.
Comparing the voiceprint with the stored template using cosine similarity.
Granting or denying access based on a similarity threshold.
Providing secure file access through AES-GCM encryption, with keys derived using PBKDF2.
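Steps 4 to 6 can be sketched in TypeScript as follows. The text does not say how a voiceprint is formed from MFCC frames or which threshold is used, so mean pooling of the frames and the 0.85 threshold are assumptions, and the function names are hypothetical. Cosine similarity itself is computed as cos(a, b) = (a · b) / (|a| |b|).

```typescript
// Hypothetical helpers; the pooling method and threshold are assumptions.

// Mean-pool per-frame MFCC vectors into one fixed-length voiceprint.
function voiceprint(mfccFrames: number[][]): number[] {
  if (mfccFrames.length === 0) throw new Error("no frames to pool");
  const sum = new Array<number>(mfccFrames[0].length).fill(0);
  for (const frame of mfccFrames) {
    frame.forEach((v, i) => {
      sum[i] += v;
    });
  }
  return sum.map((v) => v / mfccFrames.length);
}

// cos(a, b) = (a . b) / (|a| |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("voiceprints must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumed decision threshold; the text does not report the value the system uses.
const SIMILARITY_THRESHOLD = 0.85;

function isAuthenticated(probe: number[], template: number[], threshold = SIMILARITY_THRESHOLD): boolean {
  return cosineSimilarity(probe, template) >= threshold;
}
```

Raising the threshold lowers the false acceptance rate at the cost of more false rejections, and lowering it does the opposite, so the value is a tuning decision rather than a fixed constant.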
Implementation
The system is implemented using:
React
TypeScript
Tailwind CSS
Browser-based audio APIs
Client-side cryptographic processing
All biometric and encryption operations are performed locally in the user's browser, ensuring that sensitive voice data never leaves the device.
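The client-side PBKDF2 and AES-GCM processing can be approximated with the standard Web Crypto API (`crypto.subtle`), which browsers and Node.js 20+ expose as a global. This sketch derives an AES-GCM key from a secret with PBKDF2 and stores the salt and IV in front of the ciphertext. The 600,000-iteration count, SHA-256 hash, 16-byte salt, 12-byte IV, output layout, the `password` input, and the function names are all assumptions; the paper does not state which secret feeds PBKDF2 or how encrypted files are stored.

```typescript
// Hypothetical helpers built on the standard Web Crypto API; parameters are assumptions.
const PBKDF2_ITERATIONS = 600_000;
const SALT_BYTES = 16;
const IV_BYTES = 12; // 96-bit IV, the usual choice for AES-GCM

async function deriveKey(password: string, salt: BufferSource): Promise<CryptoKey> {
  const baseKey = await crypto.subtle.importKey(
    "raw",
    new TextEncoder().encode(password),
    "PBKDF2",
    false,
    ["deriveKey"],
  );
  return crypto.subtle.deriveKey(
    { name: "PBKDF2", salt, iterations: PBKDF2_ITERATIONS, hash: "SHA-256" },
    baseKey,
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}

// Output layout: salt || IV || ciphertext (the 16-byte GCM tag is appended by encrypt).
async function encryptFile(data: BufferSource, password: string): Promise<Uint8Array> {
  const salt = crypto.getRandomValues(new Uint8Array(SALT_BYTES));
  const iv = crypto.getRandomValues(new Uint8Array(IV_BYTES));
  const key = await deriveKey(password, salt);
  const ct = new Uint8Array(await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, data));
  const out = new Uint8Array(SALT_BYTES + IV_BYTES + ct.length);
  out.set(salt, 0);
  out.set(iv, SALT_BYTES);
  out.set(ct, SALT_BYTES + IV_BYTES);
  return out;
}

async function decryptFile(blob: Uint8Array, password: string): Promise<Uint8Array> {
  const salt = blob.slice(0, SALT_BYTES);
  const iv = blob.slice(SALT_BYTES, SALT_BYTES + IV_BYTES);
  const key = await deriveKey(password, salt);
  // Rejects if the secret is wrong or the data was altered (GCM tag check fails).
  const pt = await crypto.subtle.decrypt({ name: "AES-GCM", iv }, key, blob.slice(SALT_BYTES + IV_BYTES));
  return new Uint8Array(pt);
}
```

The salt and IV do not need to be secret, so storing them with the ciphertext is standard practice; generating a fresh random IV for every encryption avoids the IV reuse that would break GCM's guarantees.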
Conclusion
The proposed Voice-Based Authentication System (VBA-System) demonstrates a user-friendly, secure, and modern method of protecting files by combining voice biometrics with AES-GCM encryption. Voice processing, authentication, and encryption all take place in the client's browser, so no biometric or file data is processed on a third-party server. In the experiments, voiceprint-based authentication achieved a success rate of approximately 92%, suggesting that voiceprints can serve as a viable substitute for traditional passwords while offering a more natural and intuitive experience. Challenges remain, including background noise, variability in the user's voice, and the risk of spoofing; however, the system lays the groundwork for future improvements. Ultimately, it offers a strong and extensible solution for protecting sensitive electronic data in many real-world settings.