Abstract
In today’s fast-paced corporate and academic environments, efficient documentation of meetings is critical for effective communication, decision tracking, and accountability. However, manual transcription and summarization of meeting discussions are often time-consuming and error-prone. This paper proposes an automated system for generating meeting minutes using Natural Language Processing (NLP) techniques to enhance efficiency, accuracy, and accessibility. The system processes audio or text transcripts of meetings and employs speech recognition, text summarization, and key information extraction models to generate concise, coherent, and structured minutes. We explore a combination of extractive and abstractive summarization methods, including transformer-based models such as BERT and GPT, to capture salient discussion points, decisions made, and action items. Additionally, named entity recognition (NER) and topic segmentation are used to enhance content relevance and organization. Experimental
Introduction
Overview
In today’s remote work culture, online meetings (via Zoom, Teams, etc.) are essential. However, manually preparing minutes of meeting (MoM) is time-consuming, error-prone, and inefficient. Automating this task with Natural Language Processing (NLP) and Artificial Intelligence (AI) can greatly improve meeting productivity, documentation, and decision tracking.
Problem Context
Key challenges with traditional meeting minutes:
Multiple speakers and lengthy discussions lead to confusion and information loss.
Participants may miss meetings or forget critical decisions.
Manual note-taking is labor-intensive and unreliable.
Proposed Solution
Develop an Automatic Meeting Minutes Generation System that:
Transcribes speech to text using ASR tools such as Whisper or Google Speech-to-Text (see the transcription sketch after this list).
Cleans and processes the text.
Extracts key discussion points, action items, and decisions.
Summarizes the meeting using deep learning models (T5, BART, PEGASUS).
Outputs structured, human-readable MoMs.
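As one possible realization of the transcription step, the sketch below uses the open-source Whisper package; the model size ("base") and the file name meeting.wav are illustrative assumptions rather than fixed design choices.

```python
# Minimal transcription sketch using the open-source Whisper package
# (pip install openai-whisper). Model size and file name are illustrative.
import whisper

model = whisper.load_model("base")         # small checkpoint; larger ones trade speed for accuracy
result = model.transcribe("meeting.wav")   # returns a dict with the full text and timestamped segments

transcript = result["text"]
print(transcript[:500])                    # preview the first 500 characters of the transcript
```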
System Components
Audio Input: Upload live or recorded meeting audio.
ASR (Speech-to-Text): Transcription using Whisper or Google ASR.
Text Preprocessing: Cleaning, tokenization, stop-word removal.
Information Extraction: Using POS tagging, NER, TF-IDF/TextRank.
Summarization: With transformer models such as BART and T5 (see the extraction and summarization sketch after this list).
Action Item Detection: Using fine-tuned classifiers (e.g., BERT).
Output Formatting: Exports in DOCX/PDF with clear sections.
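The extraction and summarization components above can be approximated with standard libraries. The sketch below is a simplified illustration, assuming scikit-learn TF-IDF scores for extractive key-sentence selection and the Hugging Face facebook/bart-large-cnn checkpoint for abstractive summarization; the sentence splitter, thresholds, and character-based truncation are assumptions, not the system's exact configuration.

```python
# Illustrative key-sentence extraction + abstractive summarization sketch.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import pipeline

def key_sentences(transcript: str, top_k: int = 5) -> list[str]:
    """Simple extractive step: rank sentences by mean TF-IDF weight and keep the top_k."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", transcript) if s.strip()]
    if len(sentences) <= top_k:
        return sentences
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    scores = tfidf.mean(axis=1).A1                    # average TF-IDF weight per sentence
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]        # keep sentences in their original order

# Abstractive summary with a pretrained BART checkpoint. The input is truncated because
# BART accepts roughly 1,024 tokens; long meetings would need chunking in practice.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def summarize(transcript: str) -> str:
    return summarizer(transcript[:3500], max_length=150, min_length=40,
                      do_sample=False)[0]["summary_text"]
```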
Literature Survey Highlights
Janin et al. (2003) introduced the ICSI Meeting Corpus of recorded meeting data.
Zechner (2002) and Murray et al. (2005) explored extractive summarization.
Carletta (2005) introduced AMI Corpus for supervised learning.
Galley (2006) applied discriminative models for key sentence extraction.
Sutskever et al. and Bahdanau et al. laid the foundations of abstractive summarization with sequence-to-sequence and attention models.
Vaswani et al. (2017) introduced the Transformer architecture, which became the backbone of modern NLP models.
System Architecture
Frontend UI: Simple and responsive interface to upload audio, view transcripts, summaries, and download MoMs.
Backend (Flask-based): Handles transcription and summarization requests and returns the results (a minimal endpoint sketch follows this list).
Preprocessing Module: Cleans input data.
Model Layer: Deep learning models trained for summarization and classification.
Storage/Output: Stores and formats the output for user-friendly access.
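A minimal version of the Flask backend could expose a single endpoint that accepts an uploaded audio file, runs the ASR and summarization models, and returns JSON. The route name, form field, and model choices below are illustrative assumptions, not the system's actual interface.

```python
# Minimal Flask backend sketch; route and field names are illustrative.
import tempfile
import whisper
from flask import Flask, request, jsonify
from transformers import pipeline

app = Flask(__name__)
asr_model = whisper.load_model("base")                                   # load models once at startup
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

@app.route("/minutes", methods=["POST"])
def generate_minutes():
    """Accept an uploaded audio file, run ASR + summarization, and return the results as JSON."""
    audio = request.files.get("audio")
    if audio is None:
        return jsonify({"error": "no audio file provided"}), 400
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        audio.save(tmp.name)                                             # persist the upload for the ASR model
        transcript = asr_model.transcribe(tmp.name)["text"]
    summary = summarizer(transcript[:3500], max_length=150,
                         min_length=40)[0]["summary_text"]
    return jsonify({"transcript": transcript, "summary": summary})

if __name__ == "__main__":
    app.run(debug=True)
```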
Implementation Workflow
Collect and prepare data (e.g., AMI, ICSI corpora).
Train models (e.g., logistic regression, BERT) for action item extraction (a baseline classifier sketch follows this list).
Use ASR to convert speech to text.
Summarize and classify using transformer models.
Provide results via web app with RESTful APIs.
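The action-item extractor in step 2 can be prototyped as a simple sentence classifier before moving to BERT fine-tuning. The sketch below uses scikit-learn TF-IDF features with logistic regression; the training sentences and labels are invented for illustration, whereas a real system would train on annotated corpora such as AMI.

```python
# Baseline action-item detector: TF-IDF features + logistic regression (toy data for illustration).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative dataset: 1 = sentence contains an action item, 0 = it does not.
sentences = [
    "John will send the revised budget by Friday.",
    "We discussed the quarterly sales figures.",
    "Please schedule a follow-up call with the vendor.",
    "The team agreed the launch went well.",
    "Maria to prepare the slide deck before the next meeting.",
    "There were no further comments on the proposal.",
]
labels = [1, 0, 1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(sentences, labels)

for s in ["Alex will update the documentation by Monday.",
          "The meeting started ten minutes late."]:
    print(s, "->", "action item" if clf.predict([s])[0] else "not an action item")
```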
User Interface (UI) Design
Clean, minimalist layout with essential features:
Upload audio
Generate minutes
Download report
Mobile/desktop responsive
Real-time feedback via spinners, status bars
Easy navigation for non-technical users
Performance Evaluation
Confusion Matrix: Used to assess the accuracy of action item extraction (see the sketch below).
Misclassifications mainly due to ambiguous text.
Robustness Check: Validates the reliability of classification under real-world scenarios.
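The confusion matrix can be computed directly from held-out predictions of the action-item classifier; the sketch below uses scikit-learn, and the label arrays are hypothetical values for illustration only.

```python
# Confusion matrix and summary metrics for action-item classification (illustrative labels).
from sklearn.metrics import confusion_matrix, classification_report

# Hypothetical held-out ground truth vs. model predictions (1 = action item, 0 = not).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))     # rows = actual class, columns = predicted class
print(classification_report(y_true, y_pred, target_names=["no action", "action item"]))
```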
Conclusion
The proposed system automates meeting minutes generation by converting audio to text, extracting key points and action items, and summarizing the discussion using advanced NLP models like BART and T5. It ensures accurate, structured, and readable MoMs with minimal human effort. Future enhancements include multilingual support and improved speaker identification.
References
[1] Zhang, H., Liu, S., Li, Y., Ren, W., Tong, H., Cao, J., & Zhou, Q. (2023). HVCMM: A Hybrid-View Abstractive Model of Automatic Summary Generation for Teaching. IEEE Transactions on Neural Networks and Learning Systems.
[2] Devika, R., Vairavasundaram, S., Mahenthar, C. S. J., Varadarajan, V., & Kotecha, K. (2021). A deep learning model based on BERT and sentence transformer for semantic keyphrase extraction on big social data. IEEE Access, 9, 165252-165261.
[3] Biswas, P. K., & Iakubovich, A. (2022). Extractive summarization of call transcripts. IEEE Access, 10, 119826-119840.
[4] Koay, J. J., Roustai, A., Dai, X., & Liu, F. (2021). A sliding-window approach to automatic creation of meeting minutes. arXiv preprint arXiv:2104.12324
[5] Manuel, M., Menon, A. S., Kallivayalil, A., Isaac, S., & KS, D. L. (2021). Automated generation of meeting minutes using deep learning techniques. International Journal of Computing and Digital System, 109-120.
[6] FM, M. F. A., Pawankumar, S., Guruprasath, M., & Jayaprakash, J. (2022). Automation of Minutes of Meeting (MoM) using NLP. In 2022 International Conference on Communication, Computing, and Internet of Things (IC3IoT) (pp. 1-6). IEEE.
[7] Jung, J., Seo, H., Jung, S., Chung, R., Ryu, H., & Chang, D. S. (2023). Interactive User Interface for Dialogue Summarization. In Proceedings of the 28th International Conference on Intelligent User Interfaces (pp. 934-957).
[8] Liu, H., Liu, H., Wang, X., Shao, W., Wang, X., Du, & Salim, F. D. (2020). Smart meeting: A novel mobile voice meeting minutes generation and analysis system. Mobile Networks and Applications, 25, 521–536.
[9] Gupta, V., & Lehal, G. S. (2010). A survey of text summarization extractive techniques. Journal of Emerging Technologies in Web Intelligence, 2(3), 258–268.
[10] Zhang, J., Zhao, Y., Saleh, M., & Liu, P. J. (2020). PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization. arXiv preprint arXiv:1912.08777
[11] Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., ... & Rush, A. M. (2020). Transformers: State-of-the-art natural language processing. EMNLP 2020 (System Demonstrations).
[12] See, A., Liu, P. J., & Manning, C. D. (2017). Get to the point: Summarization with pointer-generator networks. ACL 2017.
[13] Cohan, A., Dernoncourt, F., Kim, D. S., Bui, T., Kim, S. N., Chang, W., & Goharian, N. (2018). A discourse-aware attention model for abstractive summarization of long documents. NAACL-HLT 2018.
[14] Narayan, S., Cohen, S. B., & Lapata, M. (2018). Don't give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization. EMNLP 2018.