Speech Recognition in E-Mail

Authors: Ramya H, Saravanan. R

DOI Link: https://doi.org/10.22214/ijraset.2022.42859

Abstract

Speech recognition is one of the developing areas of AI and deep learning, and it can be considered a predecessor of voice recognition. SRE (Speech Recognition in E-mail) is based on speech recognition and lets users send e-mail by speaking instead of typing. Speech recognition relies on PLP (Perceptual Linear Prediction), MFCC (Mel-Frequency Cepstral Coefficients), and many other algorithms, which help the software distinguish a specific voice from the surrounding environment and represent frequencies on a scale that approximates human hearing. A DNN (deep neural network) uses sound waves as training data, and DNN-based systems combine an acoustic model, a pronunciation model, and a language model for speech-to-text and text-to-speech.
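MFCC features are built on the mel scale, which spaces frequencies roughly the way human hearing perceives them: equal steps in mels sound roughly equally far apart, so low frequencies get finer resolution than high ones. A minimal sketch of one widely used form of the Hz-to-mel conversion (the 2595·log10(1 + f/700) variant; other variants exist). This is illustrative and not taken from the paper:

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (common 2595 * log10(1 + f/700) form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Equal 1000 Hz steps shrink on the mel scale as frequency rises,
# which is why mel filter banks are packed densely at low frequencies.
for f in (1000, 2000, 3000, 4000):
    print(f, round(hz_to_mel(f)))
```

The compression at high frequencies is the property MFCC exploits: the features spend more detail where human hearing is most discriminating.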


I. INTRODUCTION

E-mail is an information and communication technology that delivers a digital message from a sender to a receiver over the internet. Many software platforms and applications are available for sending and receiving e-mail, but they have not made the process itself any easier.

Speech recognition is an AI-powered technology that automates the communication between the sender and the software, not between the sender and the receiver. In layman's terms, it is a computer program that enables a computer to interact with humans through speech. Speech recognition combined with DNNs forms automatic speech recognition (ASR) and drives other developments in speech and voice recognition.

II. METHODOLOGY

When an audio signal is emitted, the system recognizes it as input and the process starts:

A. Getting data

The Python SpeechRecognition module is used to listen to speech and identify words.

PyAudio, the Python bindings for the PortAudio library, is used to acquire audio as data, to store it, and to play audio back from Python.

The Google Speech API is also used to detect speech and recognize the spoken words.
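To illustrate how PyAudio delivers audio as data that can then be stored, here is a minimal sketch. It assumes 16-bit mono audio at 16 kHz and a chunk size of 1024 frames (illustrative choices, not values from the paper); `open_microphone_stream`, `record_chunks`, and `frames_to_wav_bytes` are hypothetical helper names. The stream is passed in as a parameter, so the recording logic can be exercised without a microphone.

```python
import io
import wave

RATE = 16000       # samples per second (assumed)
CHUNK = 1024       # frames read per call (assumed)
SAMPLE_WIDTH = 2   # bytes per sample, i.e. 16-bit audio

def open_microphone_stream(rate=RATE, chunk=CHUNK):
    """Open the default input device with PyAudio (requires the pyaudio package)."""
    import pyaudio  # imported lazily so the rest of the sketch works without it
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=rate,
                     input=True, frames_per_buffer=chunk)
    return pa, stream

def record_chunks(stream, seconds, rate=RATE, chunk=CHUNK):
    """Read raw PCM chunks from an open input stream for roughly `seconds`."""
    n_reads = int(rate / chunk * seconds)
    return [stream.read(chunk) for _ in range(n_reads)]

def frames_to_wav_bytes(frames, rate=RATE, sample_width=SAMPLE_WIDTH, channels=1):
    """Package raw PCM chunks as an in-memory WAV file so they can be stored or replayed."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))
    return buf.getvalue()
```

On a machine with a microphone, `pa, stream = open_microphone_stream()` followed by `frames = record_chunks(stream, 3)` captures about three seconds; `stream.stop_stream()`, `stream.close()`, and `pa.terminate()` release the device afterwards.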

B. Simplifying data

The analog sound wave is broken down into digital data, which is done by the SpeechRecognition module. PyAudio is what gives it access to the microphone that detects the speech. The speech data is then converted into text.
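Breaking an analog wave down into digital data amounts to sampling the continuous signal at a fixed rate and quantizing each sample to an integer. The sketch below does this in software for a synthetic tone and produces 16-bit signed PCM samples, the sample format requested from PyAudio with `paInt16`; on real hardware the sound card's analog-to-digital converter performs this step. The `digitize` and `tone` helpers and the 16 kHz rate are illustrative assumptions.

```python
import math
import struct

def digitize(signal, duration_s, sample_rate=16000):
    """Sample signal(t) (values in [-1, 1]) and quantize to little-endian 16-bit PCM bytes."""
    max_amp = 32767                                    # largest positive 16-bit sample
    n_samples = int(duration_s * sample_rate)
    pcm = bytearray()
    for n in range(n_samples):
        t = n / sample_rate                            # sampling: time of the n-th sample
        x = max(-1.0, min(1.0, signal(t)))             # clip to the representable range
        pcm += struct.pack("<h", round(x * max_amp))   # quantization to a signed 16-bit integer
    return bytes(pcm)

def tone(t):
    """A 440 Hz sine wave at half amplitude, standing in for the analog input."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)

pcm = digitize(tone, 0.5)  # half a second of audio: 8000 samples, 2 bytes each
```

The sample rate fixes the highest frequency that can be represented (half the rate), and the sample width fixes how finely amplitude is resolved.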

C. Speech-to-text and text-to-speech

The digital data is then converted to text by ------ for voice (of the digital data), and the text is printed with a Python command. Speak text is a method used to create audio from text, either from the recognized data or from text written into the program.
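The paper does not name the text-to-speech library behind "Speak text". A common choice in Python is pyttsx3, which drives the operating system's speech engine offline; the sketch below assumes it. `speak_text` is a hypothetical helper name, and an engine object can be passed in so the function can be tried without audio output.

```python
def speak_text(text, engine=None):
    """Read `text` aloud; uses pyttsx3 (assumed backend) unless another engine is supplied."""
    if not text.strip():
        return False            # nothing to say
    if engine is None:
        import pyttsx3          # assumed TTS library: pip install pyttsx3
        engine = pyttsx3.init()
    engine.say(text)            # queue the utterance
    engine.runAndWait()         # block until speaking has finished
    return True
```

For example, `speak_text("Please say the message")` could prompt the user before the microphone starts listening.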

The methods that play the main role are:

r.listen() - method that activates the microphone and captures the speech data from the microphone input.

r.recognize() - method used to separate the speech from the surrounding ambient noise.

r.recognize_google() - method that calls Google's recognizer to recognize the spoken language and convert it into text.
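The calls above can be combined into a single capture-and-transcribe step. This sketch is written against the SpeechRecognition package (imported as `speech_recognition`), in which ambient-noise calibration is done by `Recognizer.adjust_for_ambient_noise()`, `listen()` records one phrase, and `recognize_google()` sends the audio to the Google Speech Recognition API. The helper names `transcribe` and `listen_from_microphone`, the 0.5 s calibration time, and the `en-US` language are assumptions. The recognizer and audio source are parameters, so the logic can be exercised with stand-in objects.

```python
try:
    import speech_recognition as sr
    RECOGNITION_ERRORS = (sr.UnknownValueError, sr.RequestError)
except ImportError:        # lets the sketch load even without the package
    sr = None
    RECOGNITION_ERRORS = ()

def transcribe(recognizer, source, language="en-US"):
    """Capture one phrase from `source` and return the recognized text, or None."""
    recognizer.adjust_for_ambient_noise(source, duration=0.5)  # calibrate the energy threshold
    audio = recognizer.listen(source)                            # record until the speaker pauses
    try:
        return recognizer.recognize_google(audio, language=language)
    except RECOGNITION_ERRORS:  # speech was unintelligible, or the API was unreachable
        return None

def listen_from_microphone(language="en-US"):
    """Real-device usage: requires SpeechRecognition and PyAudio to be installed."""
    if sr is None:
        raise RuntimeError("install the SpeechRecognition package first")
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:  # sr.Microphone uses PyAudio under the hood
        return transcribe(recognizer, source, language)
```

Returning `None` instead of raising keeps the calling code (for example, a button handler) simple: it can ask the user to repeat the message.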

E-mail is sent by the user and delivered to the receiver using the SMTP protocol, while IMAP is used by the receiver's mail client to access the mailbox. SMTP is the protocol used by message transfer agents. Delivering an e-mail to the receiver involves three main activities:

Sending the email from the sender to the email server.

The email server relays the mail to the receiver's mail server.

The receiver downloads the email from the server.

The SMTP server must be given permission to send email, and when a message cannot be delivered, it is returned to the sender.
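On the receiving side, the mail client downloads messages with IMAP. A minimal sketch using Python's standard `imaplib` and `email` modules; the host name, the credentials, and the helper names `fetch_latest` and `summarize` are placeholders, and parsing is kept separate so it can be tried on a raw message without a server.

```python
import email
import imaplib
from email import policy

def summarize(raw_bytes):
    """Parse a raw RFC 822 message and return (sender, subject, plain-text body)."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    return msg["From"], msg["Subject"], text

def fetch_latest(host, user, password, mailbox="INBOX"):
    """Download the newest message in `mailbox` over IMAP with SSL (port 993)."""
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        imap.select(mailbox, readonly=True)   # read-only: do not mark messages as seen
        status, data = imap.search(None, "ALL")
        ids = data[0].split()
        if status != "OK" or not ids:
            return None
        status, msg_data = imap.fetch(ids[-1], "(RFC822)")
        return summarize(msg_data[0][1])      # msg_data[0] is (envelope, raw message bytes)
```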

III. MODELING AND ANALYSIS

IV. RESULTS AND DISCUSSION

Speech recognition is activated by a button click, after which the content of the mail is created through speech recognition (SpeechRecognition, PyAudio, and Python speech-to-text).

When the sender tries to send an email through the SMTP server (which has been given permission to send emails), the server logs into the sender's mail account and sends the mail to the email server. The email server forwards the mail to the receiver's mail address, or returns it to the sender if the receiver's mail ID is not found.
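The sending step in this flow can be sketched with Python's standard `smtplib` and `email.message` modules. The SMTP host, port 465 (implicit TLS), and the helper names `build_message` and `send_message` are assumptions for illustration; the body would be the text produced by the speech-to-text step. Many providers require an app-specific password, rather than the normal account password, for this kind of programmatic login.

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, receiver, subject, spoken_text):
    """Wrap the transcribed speech in a standard e-mail message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = receiver
    msg["Subject"] = subject
    msg.set_content(spoken_text)   # plain-text body
    return msg

def send_message(msg, host, user, password, port=465):
    """Log in to the sender's account and hand the message to the SMTP server."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(user, password)
        # If some (but not all) recipients are refused, they are listed in the
        # returned dict; mail that fails later comes back to the sender as a bounce.
        return server.send_message(msg)
```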

V. ACKNOWLEDGMENT

I would like to express my gratitude to my supervisor, Prof. R. Saravanan, for his patient instruction, support, and counsel throughout my time as a student. I was really fortunate to have a supervisor who was genuinely concerned about my work and who replied to my inquiries and concerns promptly.

Conclusion

Speech recognition has always been a developing field of AI, with limited ways to be applied and to prosper. Combining it with e-mailing, a process so common that it is considered mundane, is a good way of guiding AI with a manual process that people perform every day. Speech recognition has been introduced into DNNs, but its development phase can never be considered finished.


Copyright

Copyright © 2022 Ramya H, Saravanan. R. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET42859

Publish Date : 2022-05-17

ISSN : 2321-9653

Publisher Name : IJRASET
