Authors: Karan Khatavkar
Speech is a fundamental and reliable mode of human communication, conveying not only linguistic content but also essential cues about the speaker, including language, emotion, gender, and identity. However, in contemporary telephone and audio-based communication systems, speech signals are highly susceptible to ambient and Gaussian noise. To remove this noise from the signal, several digital audio filters can be employed, including the audio weighting filter, low-pass filter, high-pass filter, band-pass filter, and band-stop filter. This paper implements these digital filters for audio signal enhancement in MATLAB Simulink and compares their outcomes to identify the best filter for this purpose.
I. INTRODUCTION
With technological developments, voice communication continues to grow as a primary communication medium. The speech signal conveys spoken language to the listener along with the emotion, gender, and identity of the speaker. Individual features such as pitch, fundamental frequency, and formant frequencies characterize human speech and make each of us sound different. Sound is generally produced by one of three processes: the vibration of strings, the striking of membranes, or the blowing of air through holes. The human voice differs in that it is expressed in different languages and emotions under the control of the brain. Nonetheless, ubiquitous noise continues to degrade the quality of voice communication. Speech background noise is an undesired signal that mixes with the speech signal either when the speech is generated or during transmission. It is therefore crucial to reduce this noise in the signal and the channel to enhance signal quality, and various filtering techniques are used for this purpose. Regardless of the form of the signal, a filter can be thought of as a medium through which the signal travels. However, we rarely consider something to be a filter unless it can significantly alter the signal. A digital filter is simply a filter that works with digital signals, such as the sound that a computer represents. In essence, applying filtering techniques to reduce noise in both the signal and the communication channel is a critical step in ensuring effective and reliable communication. Filters are commonly classified by the band of frequencies they pass, as low-pass, high-pass, band-pass, or band-stop filters.
II. METHODOLOGY AND IMPLEMENTATION
The methodology employed in this research begins with the collection and pre-processing of a dataset containing speech samples contaminated by background noise. These speech data are prepared to ensure uniformity and compatibility with MATLAB Simulink, the chosen platform for audio signal processing. The availability of signal processing libraries in MATLAB, such as the Audio Toolbox, removed any constraints in this regard. Additionally, MATLAB Simulink offered notably low latency, which enabled real-time signal processing. Humans can tolerate up to about 200 ms of end-to-end latency when conversing; beyond that, speakers begin to talk over each other on calls. The longer the latency, the more we notice it and the more annoyed we become. A crucial initial step involves characterizing the noise contaminating the speech signals by identifying its frequency components and specific characteristics. With the Simulink environment established, the pre-processed speech data and noise profiles are imported for further analysis and processing.
To address the noise reduction task effectively, we implemented a range of audio filters within MATLAB Simulink: weighting filters, low-pass filters, band-pass filters, and band-stop filters. Implementing these filters in the Simulink environment enabled us to systematically process and enhance speech signals by reducing unwanted noise components. Each filter type was selected and configured based on its suitability for specific noise profiles and speech enhancement goals. This approach allowed us to explore and compare the performance of various filter configurations in mitigating background noise and improving speech intelligibility. The subsequent analysis of the results obtained from these filter implementations provided insights into the potential applications of noise reduction methods in voice and audio-related systems, such as telephone communication and voice recognition.
The MATLAB Simulink model, illustrated in Fig. 1, served as the central framework for implementing and testing the audio filters. This model was designed to simulate real-world noise scenarios, to process MPEG Audio Layer III (MP3) files, and to evaluate the effectiveness of the noise reduction techniques. The data collected through these simulations was then analysed to assess the performance of each filter in improving speech signal quality. Overall, this methodology provided a structured and systematic approach to noise reduction in speech signals.
The model incorporated a series of critical blocks to effectively address the issue of noise reduction in speech signals. These blocks, their functions, and their interactions were pivotal to the research process:
A. Audio Device Reader
The Audio Device Reader block reads audio samples from the computer's audio device. It allows key parameters to be set: the sampling rate (44,100 Hz), the number of channels (1), and the samples per frame (1,024). These parameters specify how the audio data is input.
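The frame size and sampling rate above determine how much audio each frame holds, which in turn bounds the buffering delay of a frame-based model. The following Python sketch (an illustration only, not part of the Simulink model) works out the frame duration and compares it with the 200 ms conversational tolerance cited in the methodology; the constant and function names are hypothetical.

```python
# Frame duration for the stated input settings (illustrative sketch).
SAMPLE_RATE_HZ = 44_100      # sampling rate given for the Audio Device Reader
SAMPLES_PER_FRAME = 1_024    # samples per frame given for the Audio Device Reader
LATENCY_BUDGET_MS = 200.0    # conversational tolerance cited in the methodology


def frame_duration_ms(samples_per_frame, sample_rate_hz):
    """Return the duration of one frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz


frame_ms = frame_duration_ms(SAMPLES_PER_FRAME, SAMPLE_RATE_HZ)
# Whole frames that fit inside the latency budget.
frames_in_budget = int(LATENCY_BUDGET_MS // frame_ms)

print(f"{frame_ms:.2f} ms per frame, {frames_in_budget} frames fit in {LATENCY_BUDGET_MS:.0f} ms")
```

One frame lasts roughly 23 ms, so buffering a single frame consumes only a small share of the 200 ms budget; the remaining delay depends on the filter and the audio hardware, which this sketch does not model.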
B. Multimedia File Reader
The Multimedia File Reader block reads audio samples from multimedia files, enabling the model to incorporate external audio data. The block accommodates a wide range of data types and amplitude ranges, which makes it flexible.
C. Time Scope
To provide a visual representation of the audio data that has been processed, the Time Scope block, a component of DSP System Toolbox, is used to display the time domain signal.
D. Channel Noise
Many signals obtained from practical systems are contaminated by additive white Gaussian noise. Various factors, such as thermal noise, interference, and transmission errors, contribute to the inherent noise in communication channels. This block introduces simulated noise into the signal, replicating real-world conditions. The configuration specifies the noise source type as Gaussian, with a variance of 0.005. The sample time is set to 1/44,100 s, and each frame contains 1,024 samples.
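The effect of this block can be approximated outside Simulink. The sketch below adds zero-mean Gaussian noise to one 1,024-sample frame, assuming a variance magnitude of 0.005 (a variance is non-negative by definition). The 440 Hz test tone, the fixed random seed, and the function name are assumptions made for illustration.

```python
import math
import random

SAMPLE_RATE_HZ = 44_100
SAMPLES_PER_FRAME = 1_024
NOISE_VARIANCE = 0.005  # assumed non-negative variance of the channel noise


def add_gaussian_noise(frame, variance, rng):
    """Return a copy of the frame with zero-mean Gaussian noise added."""
    sigma = math.sqrt(variance)  # standard deviation from variance
    return [x + rng.gauss(0.0, sigma) for x in frame]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical clean input: one frame of a 440 Hz tone at half amplitude.
clean = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ)
         for n in range(SAMPLES_PER_FRAME)]
noisy = add_gaussian_noise(clean, NOISE_VARIANCE, rng)

# Recover the injected noise and estimate its power (mean square).
injected = [y - x for x, y in zip(clean, noisy)]
measured_variance = sum(v * v for v in injected) / len(injected)

print(f"target variance {NOISE_VARIANCE}, measured {measured_variance:.4f}")
```

The measured value fluctuates around the target because a single frame is a small sample; averaging over more frames brings it closer to 0.005.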
E. To Workspace
This block interfaces with the MATLAB workspace so that the Simulink model and the MATLAB environment can exchange data easily, enabling further analysis and manipulation of the signal data.
F. Spectrum Analyzer
The Spectrum Analyzer block displays the frequency spectrum of time-domain signals, offering both a spectrum view and a spectrogram view. Two Spectrum Analyzer blocks are integrated into the model, one positioned before the filter and another after it. These blocks provide visual representations of the frequency spectra of the signals, enabling an assessment of signal characteristics and noise reduction effectiveness. The configuration is set to "Auto" so that the display adapts to the signal's characteristics dynamically.
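To make concrete what a spectrum view represents, the following sketch computes a one-sided magnitude spectrum of a single frame with a direct discrete Fourier transform. It is a simplified illustration, not the algorithm used by the Simulink block; the Hann window, the 1,000 Hz test tone, and all names are assumptions.

```python
import cmath
import math

SAMPLE_RATE_HZ = 44_100
SAMPLES_PER_FRAME = 1_024


def magnitude_spectrum_db(frame, floor_db=-120.0):
    """Return the one-sided magnitude spectrum of a frame in dB."""
    n = len(frame)
    # Hann window to limit spectral leakage between bins.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT of bin k; O(n^2) overall, an FFT would be used in practice.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        magnitude = abs(acc) / n
        level = 20 * math.log10(magnitude) if magnitude > 0 else floor_db
        spectrum.append(max(level, floor_db))
    return spectrum


# Hypothetical input: one frame of a 1,000 Hz tone.
tone = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE_HZ)
        for i in range(SAMPLES_PER_FRAME)]
spectrum_db = magnitude_spectrum_db(tone)

# Locate the strongest bin and convert its index to a frequency.
peak_bin = max(range(len(spectrum_db)), key=spectrum_db.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE_HZ / SAMPLES_PER_FRAME
print(f"peak near {peak_hz:.0f} Hz (bin {peak_bin})")
```

With 1,024 samples at 44,100 Hz, each bin is about 43 Hz wide, so the peak is reported at the bin centre closest to the tone rather than at exactly 1,000 Hz.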
G. Filter Block
The Filter Block is the central component of the system and implements the various filters for noise reduction. Each filter is individually configured for specific characteristics. This block acts as the primary mechanism for enhancing the quality and intelligibility of speech signals in the presence of background noise. Section III gives a thorough rundown of the configured filters and their outcomes.
H. Audio Device Writer
The Audio Device Writer block writes the processed audio samples to an audio output device. It offers settings for the device bit depth (16-bit) and sample rate (44,100 Hz), ensuring compatibility with audio output requirements.
I. Multimedia Output File Writer
This block writes and saves the processed audio output to specific file types such as .wav, .avi, .mp4, .wmv, etc. Configurations include file type (.wav), device bit depth (16-bit, integer), sample rate (44,100 Hz), and the number of samples per frame (1024).
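The output settings listed above (.wav, 16-bit integer, 44,100 Hz) can be reproduced with Python's standard `wave` module. This is only a sketch of an equivalent file write, not the Simulink block itself; the file name `filtered_output.wav`, the single-channel layout, and the test tone are assumptions.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE_HZ = 44_100
SAMPLES_PER_FRAME = 1_024


def write_wav_16bit(path, samples, sample_rate_hz):
    """Write mono samples in the range [-1, 1] as 16-bit integer PCM."""
    pcm = bytearray()
    for x in samples:
        clipped = max(-1.0, min(1.0, x))  # guard against overflow
        pcm += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # assumed mono output
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate_hz)
        wav.writeframes(bytes(pcm))


# Hypothetical processed output: one frame of a 440 Hz tone.
frame = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ)
         for n in range(SAMPLES_PER_FRAME)]
out_path = os.path.join(tempfile.mkdtemp(), "filtered_output.wav")
write_wav_16bit(out_path, frame, SAMPLE_RATE_HZ)

# Read the header back to confirm the stored format.
with wave.open(out_path, "rb") as wav:
    stored = (wav.getnchannels(), wav.getsampwidth(),
              wav.getframerate(), wav.getnframes())
print(stored)
```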
III. FILTER IMPLEMENTATION AND RESULTS
This section examines the filter implementations and presents the results. Each filter type incorporated into the system is discussed, highlighting its implementation parameters and strategy. The results obtained from applying these filters to noise reduction and signal enhancement are then analysed and presented. The filters include the weighting filter, low-pass filter, high-pass filter, band-pass filter, and band-stop filter.
A. Audio Weighting Filter
A weighting filter is a special filter used in measuring loudness levels and, by extension, in audio noise measurements of equipment. Weighting filters are band-limiting filters designed to correspond to the way we hear or to some other specific criterion. The A-weighting curve has been widely adopted for environmental noise measurement and is standard in many sound level meters. A-weighting is also frequently used to evaluate potential hearing damage caused by loud noise, but this use appears to be based more on the availability of sound level meters that incorporate A-weighting than on solid experimental data. The filter used here applies the "A" frequency weighting, chosen for its noise reduction capabilities in the targeted application; Fig. 2 shows its frequency response.
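For reference, the shape of the A-weighting curve can be evaluated directly from its standard analog magnitude formula. The sketch below uses the commonly published pole frequencies (20.6 Hz, 107.7 Hz, 737.9 Hz, 12,194 Hz) and the 2.00 dB normalization that sets the gain to 0 dB at 1 kHz. It illustrates the curve only and is not the digital filter design used inside Simulink; the function name is hypothetical.

```python
import math


def a_weighting_db(freq_hz):
    """Return the A-weighting gain in dB at the given frequency in Hz."""
    f2 = freq_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = ((f2 + 20.6 ** 2)
                   * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
                   * (f2 + 12194.0 ** 2))
    # The +2.00 dB offset normalizes the curve to 0 dB at 1 kHz.
    return 20.0 * math.log10(numerator / denominator) + 2.00


for freq in (100, 300, 1000, 3400, 10000):
    print(f"{freq:>6} Hz: {a_weighting_db(freq):+.1f} dB")
```

The curve attenuates low frequencies strongly (about -19 dB at 100 Hz) while leaving the region around 1 kHz essentially unchanged, which is why it reduces low-frequency noise more than noise inside the speech band.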
B. Low Pass Filter
The speech signal falls in the 300–3400 Hz range, whereas the audible frequency range for human beings extends from 20 Hz to 20 kHz. A low-pass filter passes all signal components below a given cut-off frequency and attenuates those above it. This filter type is well suited to keeping important speech information while suppressing high-frequency noise components, since speech signals primarily occupy lower frequency ranges. The filter is of the Finite Impulse Response (FIR) type, with a passband edge frequency of 8,000 Hz and a maximum passband ripple of 0.1 dB. The filter's input sample rate is configured to match the incoming signal at 44,100 Hz.
Fig. 3 provides a visual representation of the frequency response of the Low Pass Filter, illustrating its selective attenuation of frequencies above a specified cut-off threshold. This frequency response characterization is instrumental in understanding how the filter effectively removes higher-frequency components from the input signal. Fig. 4 offers a comparative view by displaying both the input and output spectra of the Low Pass Filter. The input spectrum showcases the frequency components of the original signal, while the output spectrum demonstrates the alterations in signal content after passing through the filter. Notably, these figures collectively underscore the filter's effectiveness in reducing noise and enhancing signal quality by attenuating frequencies beyond the designated cut-off point.
The results affirm the rationale behind employing a Low Pass Filter, as speech signals predominantly reside within lower frequency ranges. By removing high-frequency noise components, this filter contributes significantly to the reduction of background noise, ultimately leading to improved speech intelligibility and signal clarity.
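A low-pass FIR filter with a comparable passband can be sketched with the windowed-sinc method. The design method of the Simulink block is not stated in this paper, so the Hamming window, the 101 taps, and the 9,000 Hz design cut-off (placed slightly above the 8,000 Hz passband edge so that the edge stays inside the passband) are all assumptions, as are the function names.

```python
import math

SAMPLE_RATE_HZ = 44_100
PASSBAND_EDGE_HZ = 8_000.0   # passband edge stated for the low-pass filter
DESIGN_CUTOFF_HZ = 9_000.0   # assumed cut-off, slightly above the passband edge
NUM_TAPS = 101               # assumed filter length (odd, for a symmetric filter)


def lowpass_fir(num_taps, cutoff_hz, sample_rate_hz):
    """Design a Hamming-windowed sinc low-pass filter with unity DC gain."""
    fc = cutoff_hz / sample_rate_hz          # cut-off in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        m = n - mid
        ideal = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [t / dc_gain for t in taps]       # normalize so the gain at 0 Hz is 1


def gain_db(taps, freq_hz, sample_rate_hz):
    """Evaluate the filter's magnitude response in dB at one frequency."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return 20 * math.log10(max(math.hypot(re, im), 1e-12))


lp_taps = lowpass_fir(NUM_TAPS, DESIGN_CUTOFF_HZ, SAMPLE_RATE_HZ)
for freq in (1000.0, PASSBAND_EDGE_HZ, 12000.0, 16000.0):
    print(f"{freq:>7.0f} Hz: {gain_db(lp_taps, freq, SAMPLE_RATE_HZ):+.2f} dB")
```

The printed gains should stay close to 0 dB up to the passband edge and drop sharply above the cut-off, mirroring the behaviour described for Fig. 3. A longer filter would narrow the transition band at the cost of more delay.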
C. High Pass Filter
A high-pass filter (HPF) allows signals with frequencies higher than a specified cut-off frequency to pass through while reducing the amplitudes of signals below this threshold. The extent of attenuation at various frequencies depends on the specific characteristics of the filter design. In audio engineering, a high-pass filter is occasionally referred to as a low-cut filter or bass-cut filter. However, for the current application, which involves voice signals typically spanning the frequency range of 300 Hz to 3400 Hz, a high-pass filter on its own is not practical, because it passes the high-frequency noise above the speech band unattenuated.
D. Band Pass Filter
The band-pass filter, a versatile tool in noise reduction, allows the selection of specific frequency ranges within the signal. In this context, it can be adjusted to target the typical frequency range of human speech, making it a valuable asset in improving intelligibility. Like the low-pass filter, it is an FIR filter and uses the following parameters: Stopband Frequency 1 at 100 Hz, Passband Frequency 1 at 300 Hz, Passband Frequency 2 at 3,400 Hz, Stopband Frequency 2 at 6,000 Hz, and a maximum passband ripple of 1 dB. The input sample rate from the input source remains at 44,100 Hz.
Fig. 6 graphically presents the frequency response of the Band-Pass Filter, providing a visual representation of its selective attenuation of frequencies outside the passband range. This frequency response characterization is crucial for understanding the filter's behaviour in isolating the desired frequency components. Fig. 7 offers a comparative analysis by showcasing the input and output responses of the Band-Pass Filter. The input spectrum demonstrates the frequency components of the original signal, while the output spectrum reveals the signal's transformation after passing through the filter. Notably, these figures collectively illustrate the efficacy of the Band-Pass Filter in reducing noise and enhancing the signal's quality within the specified passband.
The experimental results indicate a notable reduction in noise interference, underscoring the filter's effectiveness in improving speech signal intelligibility and overall quality. This successful noise reduction outcome aligns with the filter's designed purpose.
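A band-pass FIR filter for the stated band edges can be sketched as the difference of two windowed-sinc low-pass filters. As with the low-pass case, the design method of the Simulink block is not stated, so this is an assumed construction: the design cut-offs are placed midway through each transition band (200 Hz and 4,700 Hz), and 729 taps are assumed so that the narrow 100 Hz to 300 Hz transition can be approximated. All names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 44_100
LOW_CUT_HZ = 200.0     # assumed: midpoint of the 100 Hz to 300 Hz transition
HIGH_CUT_HZ = 4_700.0  # assumed: midpoint of the 3,400 Hz to 6,000 Hz transition
NUM_TAPS = 729         # assumed length; the narrow lower transition needs a long filter


def windowed_sinc_lowpass(num_taps, cutoff_hz, sample_rate_hz):
    """Return Hamming-windowed sinc low-pass taps (passband gain close to 1)."""
    fc = cutoff_hz / sample_rate_hz
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        m = n - mid
        ideal = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def bandpass_fir(num_taps, low_cut_hz, high_cut_hz, sample_rate_hz):
    """Build a band-pass filter as (wide low-pass) minus (narrow low-pass)."""
    upper = windowed_sinc_lowpass(num_taps, high_cut_hz, sample_rate_hz)
    lower = windowed_sinc_lowpass(num_taps, low_cut_hz, sample_rate_hz)
    return [u - l for u, l in zip(upper, lower)]


def gain_db(taps, freq_hz, sample_rate_hz):
    """Evaluate the filter's magnitude response in dB at one frequency."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return 20 * math.log10(max(math.hypot(re, im), 1e-12))


bp_taps = bandpass_fir(NUM_TAPS, LOW_CUT_HZ, HIGH_CUT_HZ, SAMPLE_RATE_HZ)
for freq in (50.0, 300.0, 1000.0, 3400.0, 8000.0):
    print(f"{freq:>6.0f} Hz: {gain_db(bp_taps, freq, SAMPLE_RATE_HZ):+.2f} dB")
```

The long filter reflects the cost of the 200 Hz-wide lower transition band at a 44,100 Hz sample rate; relaxing that transition would allow a much shorter filter with less delay.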
IV. RESULT ANALYSIS
Given the specific challenges posed by speech signal noise reduction, low-pass and band-pass filters are typically preferred over high-pass and band-stop filters, as they enable more precise control over the frequency components that need to be retained or attenuated. These filters are better suited for optimizing speech signal quality and intelligibility in noisy environments, making them the more practical choice for this application.
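The comparison in this paper is based on the input and output spectra. One possible way to attach a number to such a comparison is the signal-to-noise ratio (SNR) before and after filtering. The sketch below is an illustration on synthetic data only, not a result from this study: the 1,000 Hz tone standing in for speech, the noise variance of 0.005, the 101-tap Hamming-windowed low-pass filter with an assumed 9,000 Hz cut-off, and all names are assumptions.

```python
import math
import random

SAMPLE_RATE_HZ = 44_100
NUM_SAMPLES = 4_096


def lowpass_fir(num_taps, cutoff_hz, sample_rate_hz):
    """Hamming-windowed sinc low-pass filter with unity DC gain."""
    fc = cutoff_hz / sample_rate_hz
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        m = n - mid
        ideal = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        taps.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))))
    dc_gain = sum(taps)
    return [t / dc_gain for t in taps]


def fir_filter(taps, signal):
    """Apply the FIR filter by direct convolution (zero initial state)."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k in range(min(len(taps), n + 1)):
            acc += taps[k] * signal[n - k]
        out.append(acc)
    return out


def snr_db(signal, noise):
    """Ratio of mean signal power to mean noise power, in dB."""
    p_signal = sum(v * v for v in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE_HZ) for n in range(NUM_SAMPLES)]
noise = [rng.gauss(0.0, math.sqrt(0.005)) for _ in range(NUM_SAMPLES)]

taps = lowpass_fir(101, 9000.0, SAMPLE_RATE_HZ)
# The filter is linear, so the signal and noise parts can be filtered separately.
snr_in = snr_db(tone, noise)
snr_out = snr_db(fir_filter(taps, tone), fir_filter(taps, noise))
print(f"SNR in {snr_in:.1f} dB, SNR out {snr_out:.1f} dB, gain {snr_out - snr_in:.1f} dB")
```

Because white noise is spread evenly across the spectrum, the improvement is governed by the fraction of the band the filter removes, so a narrower passband such as the band-pass filter's would be expected to yield a larger gain on the same synthetic input.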
V. CONCLUSION
This paper has examined noise reduction in speech signals, addressing the fundamental challenges posed by background noise in communication systems. The study systematically explored various filtering techniques, including weighting filters, low-pass filters, and band-pass filters, all implemented to enhance the quality and intelligibility of speech signals in noisy environments. The outcomes reveal the significance of tailored filtering solutions in mitigating the adverse effects of noise interference. Through evaluation of these filters, we observed their varying degrees of effectiveness in reducing noise while preserving essential speech components. Notably, the low-pass and band-pass filters demonstrated their utility in improving speech signal clarity by selectively attenuating noise frequencies beyond their respective cut-off points. Furthermore, this study has shed light on the limitations of certain filtering techniques, such as the band-stop filter, emphasizing the importance of selecting the most appropriate filter for a given noise reduction scenario. The findings hold practical implications for a wide range of applications, including telephone conversations, voice recognition, and voice communication in noisy environments. By tailoring filtering strategies to specific noise profiles and adapting them to speech signal characteristics, we can significantly enhance human communication, ensuring that messages are conveyed with clarity and accuracy. Further research and refinement of noise reduction techniques remain imperative as communication technologies evolve. By continually advancing our understanding and implementation of these filters, we can foster more effective and seamless communication between speakers and listeners, even in challenging noise environments.
Copyright © 2023 Karan Khatavkar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.