This paper presents NeuroSketch, a framework that automatically converts continuous spoken discourse into structured, hierarchical mind maps without requiring manual intervention from the user. Audio is captured via microphone, transcribed by OpenAI Whisper, and then analyzed by an NLP module that identifies core concepts and the semantic relations among them. A Transformer model performs topic segmentation and keyword identification, while a Graph Attention Network (GAT) infers parent-child relationships to build a well-organized hierarchy. The result is rendered as an animated, interactive mind map in a React.js and D3.js web interface. In evaluation, NeuroSketch achieved 93.7% accuracy and a 92.1% F1-score on concept extraction, outperforming all baseline models, while keeping average end-to-end processing latency below two seconds. A usability study with 30 volunteers yielded high satisfaction scores and a 63% average reduction in the cognitive overhead of manual note-taking.
Introduction
NeuroSketch is an AI-powered system that automatically converts live speech into interactive mind maps in real time. Traditional mind map creation requires users to listen, organize, and structure information simultaneously, which is cognitively demanding and often results in incomplete notes. NeuroSketch addresses this challenge by combining Automatic Speech Recognition (ASR), Natural Language Processing (NLP), Transformers, and Graph Neural Networks (GNNs) to generate structured visual representations directly from spoken input.
The system operates through four stages: speech transcription using OpenAI Whisper, concept extraction and linguistic analysis using NLP techniques, semantic grouping and hierarchy generation through Transformer models and Graph Attention Networks (GATs), and interactive visualization using React.js and D3.js. This creates a dynamic mind map that updates continuously as the speaker talks.
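The four stages can be pictured as a chain of functions applied to each incoming chunk of audio, with the mind map updated in place after every chunk. The following is a minimal Python sketch under stated assumptions, not NeuroSketch's implementation: the names `MindMap`, `transcribe`, `extract_concepts`, `infer_hierarchy`, `render`, and `process_chunk` are hypothetical, and each stage is a trivial stub standing in for Whisper, the NLP module, the Transformer/GAT models, and the React.js/D3.js front end respectively.

```python
from dataclasses import dataclass, field


@dataclass
class MindMap:
    """Hypothetical in-memory mind map storing each concept's parent."""

    root: str
    parent_of: dict[str, str] = field(default_factory=dict)

    def add_edge(self, parent: str, child: str) -> None:
        # Re-adding a known concept moves it under the new parent.
        if child != self.root:
            self.parent_of[child] = parent


def transcribe(audio_chunk: str) -> str:
    # Stage 1 stub (stands in for Whisper): the "audio" is already text here,
    # so the sketch runs without any model.
    return audio_chunk


def extract_concepts(text: str) -> list[str]:
    # Stage 2 stub (stands in for the NLP module): each comma-separated
    # phrase is treated as one concept.
    return [phrase.strip() for phrase in text.split(",") if phrase.strip()]


def infer_hierarchy(concepts: list[str], root: str) -> list[tuple[str, str]]:
    # Stage 3 stub (stands in for the Transformer + GAT models): the first
    # concept becomes a topic under the root, the rest become its children.
    if not concepts:
        return []
    topic, *details = concepts
    return [(root, topic)] + [(topic, d) for d in details]


def render(mind_map: MindMap) -> str:
    # Stage 4 stub (stands in for React.js/D3.js): a plain-text edge list.
    return "\n".join(
        f"{parent} -> {child}" for child, parent in sorted(mind_map.parent_of.items())
    )


def process_chunk(audio_chunk: str, mind_map: MindMap) -> str:
    """Push one audio chunk through all four stages, updating the map in place."""
    text = transcribe(audio_chunk)
    concepts = extract_concepts(text)
    for parent, child in infer_hierarchy(concepts, mind_map.root):
        mind_map.add_edge(parent, child)
    return render(mind_map)


# Usage: two successive chunks grow the same map.
lecture = MindMap(root="Lecture")
process_chunk("graph attention networks, attention weights, node features", lecture)
view = process_chunk("speech recognition, Whisper", lecture)
```

Storing the map as a child-to-parent dictionary keeps each incremental update cheap and lets a later chunk re-parent an existing concept, which loosely mirrors a map that updates continuously as the speaker talks.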
Key contributions include an end-to-end speech-to-mind-map pipeline, a Transformer-based topic segmentation model, a GAT-based hierarchy inference mechanism, an interactive browser interface with editing and export features, and the introduction of MindMap-Bench, a dataset containing 500 speech-to-mind-map examples. Unlike previous systems that focused on static documents, slide decks, or text-based inputs, NeuroSketch uniquely integrates real-time speech processing, concept extraction, hierarchy construction, and visualization within a single framework.
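For reference, a single layer of the standard Graph Attention Network [9] scores each neighbour j of node i, normalizes the scores with a softmax over the neighbourhood N_i, and aggregates the neighbours' transformed features. Here h_i is the feature vector of concept node i (plausibly a sentence embedding, though that is an assumption), W is a shared weight matrix, a is a learned attention vector, ∥ denotes concatenation, and σ is a nonlinearity. How NeuroSketch turns the learned coefficients α_ij into parent-child edges is not specified in this section.

```latex
\begin{aligned}
e_{ij} &= \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\left[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j\right]\right),\\
\alpha_{ij} &= \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})},\\
\mathbf{h}_i' &= \sigma\!\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\, \mathbf{W}\mathbf{h}_j\right).
\end{aligned}
```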
The system was trained and evaluated on datasets including LibriSpeech, CNN/DailyMail, a custom lecture corpus, and MindMap-Bench. Audio preprocessing included noise reduction and normalization, and the NLP module used spaCy, AllenNLP, Sentence-BERT, and KeyBERT, which together support concept extraction, coreference resolution, and semantic similarity analysis.
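KeyBERT [10] selects keyphrases by embedding the document and each candidate phrase, then ranking candidates by their cosine similarity to the document embedding. The sketch below reproduces only that ranking step, assuming embeddings are already computed: the three-dimensional vectors and the phrases are invented stand-ins for Sentence-BERT outputs, and `cosine` and `rank_keyphrases` are hypothetical helpers, not part of KeyBERT's API.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_keyphrases(
    doc_vec: list[float], candidates: dict[str, list[float]], top_n: int = 2
) -> list[str]:
    """Return the top_n candidate phrases most similar to the document vector."""
    scored = [(phrase, cosine(vec, doc_vec)) for phrase, vec in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [phrase for phrase, _ in scored[:top_n]]


# Toy embeddings (hypothetical values, not produced by any real model).
doc_embedding = [0.9, 0.4, 0.1]
candidate_embeddings = {
    "graph attention network": [0.8, 0.5, 0.0],
    "mind map": [0.7, 0.3, 0.2],
    "coffee break": [0.0, 0.1, 0.9],
}
top_phrases = rank_keyphrases(doc_embedding, candidate_embeddings)
```

KeyBERT itself also offers diversification options such as Maximal Marginal Relevance to avoid near-duplicate keyphrases; those are omitted here.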
In experiments, NeuroSketch outperformed all baseline methods. It achieved 93.7% accuracy and a 92.1% F1-score in concept extraction, ahead of TF-IDF, TextRank, BERT-NER, and GPT-2-based approaches, and maintained an average end-to-end latency of 1.9 seconds, meeting real-time requirements. In the user study, participants (students and professionals) reported a 63% reduction in note-taking effort and gave high ratings for usability (4.5/5) and mind map coherence (4.3/5).
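The F1-score reported for concept extraction is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The section does not state how predicted concepts are matched to reference concepts, so the sketch below assumes exact matching after lower-casing and trimming whitespace, which is only one common convention; `concept_prf` is a hypothetical helper and the example sets are invented.

```python
def concept_prf(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Precision, recall and F1 of extracted concepts against a gold set.

    Assumes exact string matching after lower-casing and stripping whitespace.
    """
    pred = {c.strip().lower() for c in predicted}
    ref = {c.strip().lower() for c in gold}
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


predicted = {"Whisper", "speech recognition", "mind map", "latency"}
gold = {"whisper", "speech recognition", "mind map", "graph attention network", "hierarchy"}
p, r, f = concept_prf(predicted, gold)
```

Here 3 of the 4 predicted concepts match, so precision is 3/4 and recall is 3/5.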
An ablation study confirmed that semantic clustering, GNN-based hierarchy inference, and coreference resolution each contribute substantially to performance. The final system provides real-time visualization, node editing, drag-and-drop interaction, and export to PNG, PDF, JSON, and Markdown.
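Of the export formats listed, JSON and Markdown map directly onto the mind-map tree. The sketch below assumes a simple nested-dictionary node schema with `label` and `children` keys; that schema and the helpers `to_markdown` and `to_json` are hypothetical, since the paper does not describe NeuroSketch's actual export format. PNG and PDF export would belong to the rendering layer and are not shown.

```python
import json


def to_markdown(node: dict, depth: int = 0) -> str:
    """Render a mind-map tree as a nested Markdown bullet list (2 spaces per level)."""
    lines = ["  " * depth + "- " + node["label"]]
    for child in node.get("children", []):
        lines.append(to_markdown(child, depth + 1))
    return "\n".join(lines)


def to_json(node: dict) -> str:
    """Serialize the same tree as pretty-printed JSON."""
    return json.dumps(node, indent=2)


mind_map = {
    "label": "Lecture: Graph Neural Networks",
    "children": [
        {"label": "Message passing", "children": [{"label": "Aggregation"}]},
        {
            "label": "Graph Attention Networks",
            "children": [{"label": "Attention coefficients"}],
        },
    ],
}
markdown_export = to_markdown(mind_map)
json_export = to_json(mind_map)
```

Two-space indentation matches the width of the `- ` marker, so each child item nests under its parent when the Markdown is rendered.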
Conclusion
This paper has presented NeuroSketch, a system that bridges the gap between live speech and structured knowledge visualization by converting spoken input into interactive, hierarchical mind maps. The architecture combines OpenAI Whisper for transcription, a BERT-based Transformer for topic segmentation, a Graph Attention Network for hierarchy inference, and a React.js/D3.js front end for rendering. The system achieves 93.7% concept extraction accuracy and a 92.1% F1-score with an average end-to-end processing latency of 1.9 seconds. Participant feedback from the usability study was strongly positive, with all volunteers reporting substantial reductions in note-taking burden. More broadly, NeuroSketch is a meaningful step toward automating the capture and organization of spoken knowledge, a capability with tangible value in education, professional meetings, and collaborative knowledge work.
References
[1] Y. Wen, Z. Wang, and J. Sun, "MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models," arXiv preprint, 2023.
[2] A. Sharma et al., "Structsum: Generation for Faster Text Comprehension," arXiv preprint, 2024.
[3] S. Li, H. Zhang, and W. Chen, "Presentation Mining Framework," in Proceedings of SCDM, 2024.
[4] R. Patel and M. Joshi, "PDF2MindMap: AI-Based Interactive Mind Map Generation," IJSRET, vol. 14, no. 2, 2025.
[5] K. Verma and A. Singh, "Audio/Speech to Interactive Mind Map," in Proceedings of International Conference on Intelligent Systems, 2024.
[6] J. Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in Proceedings of NAACL, 2019.
[7] A. Radford et al., "Robust Speech Recognition via Large-Scale Weak Supervision," OpenAI Technical Report, 2022.
[8] N. Reimers and I. Gurevych, "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks," in Proceedings of EMNLP, 2019.
[9] P. Veličković et al., "Graph Attention Networks," in Proceedings of ICLR, 2018.
[10] M. Grootendorst, "KeyBERT: Minimal Keyword Extraction with BERT," Zenodo, 2020.