From Words to Wonders: AI-Generated Multimedia for Poetry Learning

Authors: Ms. Madhuri Thorat, Neel Kothimbire, Rameshkumar Choudhary, Atharva Jadhav, Priyanshu Kapadnis

DOI Link: https://doi.org/10.22214/ijraset.2025.70946

Abstract

The rise of Generative AI has produced a range of tools that open new opportunities for businesses and professionals engaged in content creation. The education sector is undergoing a significant transformation in how content is developed and delivered. AI models and tools facilitate the creation of customized learning materials and effective visuals that enhance and simplify the educational experience. The advent of Large Language Models (LLMs) such as GPT and text-to-image models such as Stable Diffusion and Flux-Schnell has fundamentally changed and accelerated the content generation process. The ability to generate high-quality visuals from textual descriptions has exceeded what was expected only a few years ago. Nevertheless, current research concentrates predominantly on text-to-text generation, and few studies explore how multimodal generation can address critical challenges in instruction supported by multimodal data. In this paper, we propose a framework for generating situational video content from English poetry, executed in several phases: context analysis, prompt generation, image generation, and video synthesis. This process requires several types of AI models, including text-to-text, text-to-video, text-to-audio, and image-to-image. The project illustrates the potential of combining multiple generative AI models to produce rich multimedia experiences derived from textual content.

Introduction

Overview:

The project "From Words to Wonders" introduces an interactive multimedia educational system that leverages generative AI and other AI techniques to transform poetry learning. It aims to create a dynamic and personalized environment for writing, analyzing, and appreciating English poetry by using rich datasets, AI models, and multimedia tools.

Main Goals:

- Enhance poetry education through AI-generated multimedia content (text, images, audio, video).
- Assist learners in writing original poetry, analyzing poetic structures, and creating custom learning paths.
- Use advanced AI (like OpenAI’s tools and Gemini 2.0 Flash) to support creative engagement and deep interpretation of poetry.

Literature Review Highlights:

- Generative AI (GenAI) tools such as ChatGPT, AI poetry books, and AI dramatizations are already being used in education.
- AI applications like GPT-4 in Khan Academy and Duolingo show promise in personalizing learning experiences.
- Concerns around authorship, creativity, and plagiarism are growing, prompting more research into the ethical use of GenAI in education.

Methodology:

Data Collection:
- Poems are scraped from sources like Poetry Foundation.
- Metadata such as title, genre, author, mood, and themes is preserved with each poem.
Poem Decomposition:
- Poems are broken down into segments using NLP to extract themes, tone, and implied meaning.
- The Gemini 2.0 Flash model is used to analyze structure and semantics.
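
As a rough illustration of the segmentation step, the sketch below splits a poem into stanza-level segments at blank lines. This is an assumed preprocessing step only: the source does not state how segments are delimited, and the thematic analysis attributed to Gemini 2.0 Flash is not reproduced here. The names `PoemSegment` and `split_into_segments` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class PoemSegment:
    """One stanza-level unit of a poem (hypothetical structure)."""
    index: int
    lines: List[str]

    @property
    def text(self) -> str:
        return "\n".join(self.lines)


def split_into_segments(poem: str) -> List[PoemSegment]:
    """Split a poem into segments at blank lines (assumed stanza breaks)."""
    # One or more whitespace-only lines separate consecutive segments.
    blocks = re.split(r"\n\s*\n", poem.strip())
    segments: List[PoemSegment] = []
    for block in blocks:
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if lines:
            segments.append(PoemSegment(index=len(segments), lines=lines))
    return segments


# Two short excerpts from "Daffodils" (opening and closing lines),
# separated by a blank line purely to demonstrate the split.
poem = """I wandered lonely as a cloud
That floats on high o'er vales and hills,

And then my heart with pleasure fills,
And dances with the daffodils."""

segments = split_into_segments(poem)
for seg in segments:
    print(seg.index, len(seg.lines), seg.lines[0])
```

Each resulting segment could then be passed to a language model for theme and tone extraction; that call is left out because the source gives no details of the prompt or API usage.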
Sentiment Analysis:
- The VADER tool is used to classify the emotional tone (positive, negative, or neutral) of each poem segment.
- Example: In Wordsworth’s “Daffodils”, the per-segment sentiment scores trace the shift from initial solitude to joy.
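
The labelling step can be sketched as follows. VADER reports a compound score in the range [-1, 1], and the thresholds conventionally recommended by its authors are 0.05 and above for positive, -0.05 and below for negative, and neutral otherwise. To keep the example self-contained, it accepts any scoring function; `toy_scorer` is a hypothetical stand-in built on a made-up word list and is not VADER. Whether the authors used these exact thresholds is an assumption.

```python
from typing import Callable, List, Tuple

# Conventional cut-offs for a VADER-style compound score in [-1, 1].
POSITIVE_THRESHOLD = 0.05
NEGATIVE_THRESHOLD = -0.05


def label_from_compound(compound: float) -> str:
    """Map a compound score to a positive / negative / neutral label."""
    if compound >= POSITIVE_THRESHOLD:
        return "positive"
    if compound <= NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"


def label_segments(
    segments: List[str], scorer: Callable[[str], float]
) -> List[Tuple[str, float, str]]:
    """Score every segment and attach its sentiment label."""
    results = []
    for text in segments:
        score = scorer(text)
        results.append((text, score, label_from_compound(score)))
    return results


def toy_scorer(text: str) -> float:
    """Hypothetical stand-in for VADER: counts words from tiny made-up lists."""
    positive = {"joy", "pleasure", "dances", "golden"}
    negative = {"lonely", "vacant", "pensive"}
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    raw = sum(w in positive for w in words) - sum(w in negative for w in words)
    # Clamp into [-1, 1] so the value is comparable to a compound-style score.
    return max(-1.0, min(1.0, raw / 2))


segments = [
    "I wandered lonely as a cloud",
    "That floats on high o'er vales and hills,",
    "And then my heart with pleasure fills,",
]
labelled = label_segments(segments, toy_scorer)
for text, score, label in labelled:
    print(f"{label:8s} {score:+.2f}  {text}")
```

With the `vaderSentiment` package installed, the scorer could instead be `lambda t: SentimentIntensityAnalyzer().polarity_scores(t)["compound"]`, leaving the rest of the sketch unchanged.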
Image Prompt Generation:
- The emotional tone and the poem decomposition are combined to create visual prompts.
- Prompts are tailored for both literal and implied meanings using surreal or impressionistic styles.
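
One way to picture this step is a simple template that merges a segment's text, themes, implied meaning, and sentiment into a single text-to-image prompt. The sketch below is illustrative only: the source does not give the actual prompt format, so `SegmentAnalysis`, `build_image_prompt`, the mood descriptors, and the example interpretation are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical mapping from sentiment label to visual mood descriptors.
MOOD_DESCRIPTORS = {
    "positive": "warm light, vivid colours, open composition",
    "negative": "muted palette, long shadows, sparse composition",
    "neutral": "soft daylight, balanced tones, calm composition",
}


@dataclass
class SegmentAnalysis:
    """Output of the decomposition and sentiment steps for one segment."""
    text: str
    sentiment: str
    themes: List[str]
    implied_meaning: str


def build_image_prompt(analysis: SegmentAnalysis, style: str = "impressionistic") -> str:
    """Combine literal text, implied meaning, and mood into one prompt."""
    # Unknown sentiment labels fall back to the neutral descriptors.
    mood = MOOD_DESCRIPTORS.get(analysis.sentiment, MOOD_DESCRIPTORS["neutral"])
    parts = [
        f"{style} painting illustrating: {analysis.text}",
        f"themes: {', '.join(analysis.themes)}",
        f"implied meaning: {analysis.implied_meaning}",
        f"mood: {mood}",
    ]
    return "; ".join(parts)


analysis = SegmentAnalysis(
    text="And then my heart with pleasure fills",
    sentiment="positive",
    themes=["nature", "memory"],
    implied_meaning="a remembered scene becomes a lasting source of joy",
)
prompt = build_image_prompt(analysis)
print(prompt)
print(build_image_prompt(analysis, style="surreal"))
```

Keeping the style as a parameter makes it easy to render the same segment in either of the two styles mentioned above.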
Image Generation:
- Flux-Schnell text-to-image model produces visuals based on the prompts.
- Images visually express the emotions and themes of the poems.
Audio Generation:
- Text-to-speech (TTS) tools such as Coqui-TTS create spoken versions of poems with expressive intonation.
- Background soundscapes are added to reflect mood and tone.
Video Generation:
- Visuals, narration, and the poem text are combined into immersive educational videos.
- The videos are designed to deepen engagement and support varied learning styles.
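
Synchronizing narration with visuals comes down to building a timeline in which each generated image stays on screen for as long as its segment is being read. The sketch below computes such a timeline from per-segment narration durations. It assumes one image per segment and a fixed pause between segments; `Clip`, `build_timeline`, the file names, and the durations are hypothetical, and the actual rendering step (for example with a video editing library) is left out.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Clip:
    """One image shown with its caption for a span of the video."""
    image_path: str
    caption: str
    start: float  # seconds from the beginning of the video
    end: float


def build_timeline(
    image_paths: Sequence[str],
    captions: Sequence[str],
    narration_seconds: Sequence[float],
    pause: float = 0.5,
) -> List[Clip]:
    """Lay clips end to end, each lasting its narration plus a short pause."""
    if not (len(image_paths) == len(captions) == len(narration_seconds)):
        raise ValueError("images, captions, and durations must align one-to-one")
    timeline: List[Clip] = []
    cursor = 0.0
    for image, caption, duration in zip(image_paths, captions, narration_seconds):
        end = cursor + duration + pause
        timeline.append(Clip(image, caption, cursor, end))
        cursor = end  # the next clip starts where this one ends
    return timeline


timeline = build_timeline(
    image_paths=["segment_0.png", "segment_1.png"],
    captions=["I wandered lonely as a cloud", "And dances with the daffodils."],
    narration_seconds=[4.0, 3.5],
)
for clip in timeline:
    print(f"{clip.start:5.1f}s to {clip.end:5.1f}s  {clip.image_path}  {clip.caption}")
```

The same start and end times can drive both the image sequence and the on-screen poem text, which keeps the three channels aligned.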

Key Contributions:

- A multimodal AI system for poetry learning using text, audio, visuals, and video.
- Supports emotional interpretation and aesthetic understanding of poetry.
- Offers adaptive learning paths to meet diverse learner needs.
- Aligns with current AI trends in education while addressing authorship and ethical concerns.

Conclusion

This research effectively deployed a Generative AI-based system for multimedia poetry learning, with Daffodils as a test case. The combination of audio and video generation provided an immersive interpretive experience, deepening poetic appreciation through synchronized narration, AI-generated visuals, and contextual soundscapes. The audio synthesis enriched the spoken-word representation, while the video aspect visualized the poem's themes, emotions, and artistic essence. This strategy showcases the future of AI in making literature an interactive, multisensory learning environment. Continued advancements will center on creating more refined adaptive visualizations, dynamic audioscapes, and customized interactions to further enhance engagement in poetry instruction.

Copyright

Copyright © 2025 Ms. Madhuri Thorat, Neel Kothimbire, Rameshkumar Choudhary, Atharva Jadhav, Priyanshu Kapadnis. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET70946

Publish Date : 2025-05-13

ISSN : 2321-9653

Publisher Name : IJRASET
