Abstract
Artificial Intelligence (AI) has revolutionised the culinary domain by enabling intelligent systems that recommend, retrieve, and generate recipes across multiple modalities. This survey analyses four pivotal research contributions that collectively trace the progress of AI-driven recipe modelling: Video-based Recipe Retrieval by Cao et al. (2020), KitchenScale by Choi et al. (2023), Intelligent Food Planning by Freyne and Berkovsky (2010), and Learning Structural Representations for Recipe Generation and Food Retrieval by Wang et al. (2021). Each study investigates a distinct yet interconnected aspect of intelligent food systems: personalised recommendation, cross-modal video retrieval, ingredient quantity prediction, and structured recipe generation. By comparing their methods, datasets, and results, the survey shows how AI techniques, from classical recommender algorithms to deep learning and transformer-based architectures, have progressively enabled the understanding and automation of recipe-related tasks. The findings underscore the growing need for integrated, multimodal frameworks that combine personalisation, semantic reasoning, and visual comprehension to support next-generation AI-driven culinary applications.
Introduction
This survey reviews the evolution of AI in culinary applications, highlighting how artificial intelligence has transformed recipe discovery, understanding, and generation. Early recipe systems relied on static databases and rule-based search, with no awareness of user preferences, dietary goals, or ingredient interactions. Advances in machine learning, natural language processing (NLP), and computer vision have since enabled AI systems to process complex multimodal data, including text, images, audio, and video, allowing for personalized recommendations, dynamic recipe adjustment, and structured recipe creation from visual inputs.
Key research contributions over the past decade include:
Freyne & Berkovsky (2010): Developed personalized recipe recommendation systems integrating nutritional and health data to promote sustainable eating habits.
Cao et al. (2020): Addressed cross-modal recipe retrieval by aligning textual instructions with cooking video segments using hierarchical attention and reinforcement learning.
Wang et al. (2021): Proposed cross-modal recipe generation, producing structured recipes from single food images using hierarchical structure learning.
Choi et al. (2023), KitchenScale: Focused on predicting ingredient quantities and units using transformer-based language models for fine-grained semantic understanding (a minimal illustration of this idea appears after the list).
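To make the quantity-prediction idea concrete, here is a minimal sketch that frames it as masked-token filling with an off-the-shelf BERT model. This is a deliberate simplification rather than KitchenScale's actual architecture, which predicts amounts and units with dedicated components; the model choice and example sentence are assumptions for illustration.

```python
# Minimal sketch: ingredient-quantity prediction as masked-token filling.
# NOT KitchenScale's actual model; it only illustrates how a pretrained
# language model can rank plausible quantities in a recipe context.
from transformers import pipeline  # pip install transformers

fill = pipeline("fill-mask", model="bert-base-uncased")  # illustrative choice

sentence = "Add [MASK] cups of sugar to the cookie dough."
for candidate in fill(sentence, top_k=5):
    # Each candidate carries the predicted token and its probability.
    print(f"{candidate['token_str']:>8}  p={candidate['score']:.3f}")
```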
Building on these, RecipeScaler (2025) integrates retrieval, generation, scaling, translation, and personalization into a single multimodal system. It extracts recipes from YouTube videos, parses ingredients and steps using NLP, scales ingredient quantities dynamically, translates recipes into multiple languages, and offers generative and interactive guidance. RecipeScaler leverages datasets like Recipe1M+, YouCook2, KitchenScale Corpus, and its own multimodal RecipeScaler dataset, which combines speech, video, and text inputs to provide holistic, real-world recipe understanding.
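As a rough sketch of the scaling step, the snippet below parses the leading quantity of an ingredient line and rescales it for a new serving count. The regular expression, the hypothetical `scale_ingredient` helper, and the ingredient format are assumptions for demonstration; RecipeScaler's actual parsing pipeline handles far messier real-world input.

```python
# Illustrative sketch of dynamic ingredient scaling (not RecipeScaler's
# actual implementation): rescale '2 cups flour' from 4 to 6 servings.
import re
from fractions import Fraction

def scale_ingredient(line: str, base_servings: int, target_servings: int) -> str:
    """Rescale the leading quantity of an ingredient line, e.g. '2 cups flour'."""
    match = re.match(r"\s*(\d+(?:/\d+)?)\s+(.*)", line)
    if not match:
        return line  # no recognizable leading quantity; leave unchanged
    qty = Fraction(match.group(1)) * Fraction(target_servings, base_servings)
    return f"{qty} {match.group(2)}"

for ing in ["2 cups flour", "1/2 tsp salt", "3 eggs"]:
    print(scale_ingredient(ing, base_servings=4, target_servings=6))
# -> 3 cups flour, 3/4 tsp salt, 9/2 eggs
```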
This survey also emphasizes the challenges of evaluating AI-based culinary systems, which stem from their multimodal, cultural, and subjective nature. Traditional metrics such as BLEU, ROUGE, or Precision@K fail to capture logical coherence, ingredient compatibility, and procedural validity. Integrated evaluation frameworks are therefore necessary to assess semantic fidelity, usability, and cultural relevance.
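To see why such metrics fall short, consider Precision@K: it only counts how many of the top-K retrieved recipes appear in a ground-truth relevance set, and is entirely blind to whether a recipe's steps are coherent or its ingredients compatible. A minimal sketch of the metric, with illustrative data:

```python
# Precision@K: the fraction of the top-K retrieved items that are relevant.
# Note the metric never inspects recipe content, only set membership.
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    top_k = retrieved[:k]
    return sum(1 for item in top_k if item in relevant) / k

ranked = ["carbonara", "cacio_e_pepe", "pad_thai", "amatriciana"]
gold = {"carbonara", "amatriciana"}
print(precision_at_k(ranked, gold, k=3))  # 1 relevant hit in top 3 -> 0.333...
```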
Conclusion
This survey has examined the progression of AI-driven recipe recommendation, retrieval, and generation systems, charting their advancement from early personalization models to contemporary multimodal and generative methodologies. A comparative analysis of five seminal works (Freyne and Berkovsky [4], Cao et al. [1], Wang et al. [2], Choi et al. [3], and RecipeScaler [5]) demonstrates that food computing has evolved from fragmented, task-oriented systems into a comprehensive field that integrates machine learning, natural language processing, computer vision, and user modeling.
The earliest systems predominantly employed rule-based personalization, recommending recipes according to user preferences and health goals [4]. Advances in deep learning have since enabled AI models to capture relationships among heterogeneous data types such as text, images, and procedural steps. Models such as those of Cao et al. [1] and Wang et al. [2] propelled the field forward by introducing cross-modal learning, bridging visual and textual modalities for recipe retrieval and generation.
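Conceptually, cross-modal retrieval of this kind embeds food images and recipe text into a shared vector space and ranks candidates by similarity. The sketch below illustrates only that final retrieval step, using cosine similarity over random placeholder embeddings; it is not the architecture of [1] or [2].

```python
# Conceptual cross-modal retrieval: rank recipe-text embeddings by cosine
# similarity to a food-image embedding in a shared space. The embeddings
# here are random stand-ins, not outputs of the cited models.
import numpy as np

rng = np.random.default_rng(0)
image_emb = rng.normal(size=256)           # embedding of a query food image
recipe_embs = rng.normal(size=(5, 256))    # embeddings of 5 candidate recipes

def cosine(query: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    return (candidates @ query) / (
        np.linalg.norm(candidates, axis=-1) * np.linalg.norm(query)
    )

scores = cosine(image_emb, recipe_embs)
ranking = np.argsort(-scores)              # best-matching recipes first
print("retrieval order:", ranking, "scores:", np.round(scores[ranking], 3))
```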
Choi et al.'s KitchenScale [3] introduced a fine-grained layer of ingredient quantity prediction, linking linguistic comprehension with numerical reasoning. RecipeScaler [5] represents the culmination of this trajectory, integrating speech recognition, natural language processing (NLP), and generative AI into a unified framework that spans recommendation, retrieval, and generation. Its ability to extract, scale, translate, and personalize recipes from YouTube videos exemplifies how academic innovation can evolve into a practical, user-centered platform.
This integration aligns with the broader trend in AI research toward adaptive, multimodal systems that emulate human perception and cognition [8], [9]. As the field advances, future research will likely emphasize holistic, cross-disciplinary methodologies connecting AI, nutrition science, and human–computer interaction. Systems inspired by RecipeScaler will extend beyond static data retrieval to include interactive, conversational agents capable of dynamic planning, adaptation, and instruction.
In conclusion, the comparative findings delineate a clear technological evolution: from personalization to retrieval, from retrieval to generation, and finally to integration and interaction. The next generation of intelligent kitchen ecosystems will not merely recommend what to cook but will understand how, why, and for whom a meal is prepared, positioning AI as an active collaborator in the cooking process [1]–[5].
References
[1] D. Cao, Y. Li, and J. Xu, “Video-based recipe retrieval using hierarchical attention and reinforcement learning,” Information Sciences, vol. 514, pp. 302–318, 2020.
[2] X. Wang, Y. Zhang, and T. Mei, “Learning structural representations for recipe generation and food retrieval,” IEEE Transactions on Multimedia, vol. 24, no. 6, pp. 1672–1685, 2021.
[3] D. Choi, H. Kim, and J. Park, “KitchenScale: Predicting ingredient quantities using transformer-based language models,” Expert Systems with Applications, vol. 223, 2023.
[4] J. Freyne and S. Berkovsky, “Intelligent food planning: Personalized recipe recommendation for health and wellbeing,” Proceedings of the 15th International Conference on Intelligent User Interfaces (IUI), pp. 321–324, 2010.
[5] N. M. Navaneeth, N. N. N. S. Nafis, E. A. A. Kumar, D. N. Didhul, and H. A. Ms., “RecipeScaler: An AI-based multimodal system for recipe extraction, scaling, and personalization,” Unpublished Project Report, Ahalia School of Engineering and Technology, Kerala, India, 2025.
[6] J. Marin et al., “Recipe1M+: A dataset for learning cross-modal embeddings for cooking recipes and food images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 1, pp. 187–203, 2021.
[7] L. Zhou, C. Xu, and J. J. Corso, “Towards automatic learning of procedures from web instructional videos” (YouCook2), Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2018.
[8] A. Vaswani et al., “Attention is all you need,” Advances in Neural Information Processing Systems (NeurIPS), vol. 30, pp. 5998–6008, 2017.
[9] T. Brown et al., “Language models are few-shot learners,” Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 1877–1901, 2020.
[10] L. Floridi and J. Cowls, “A unified framework of five principles for AI in society,” Harvard Data Science Review, vol. 1, no. 1, pp. 1–15, 2019.
[11] A. Jobin, M. Ienca, and E. Vayena, “The global landscape of AI ethics guidelines,” Nature Machine Intelligence, vol. 1, pp. 389–399, 2019.
[12] IEEE, Ethically Aligned Design: A Vision for Prioritizing Human Well-being with Autonomous and Intelligent Systems, 1st ed., The IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems, 2019. [Online].
[13] IEEE Standards Association, IEEE 7010–2020: Recommended Practice for Assessing the Impact of Autonomous and Intelligent Systems on Human Well-Being, 2020. [Online].
[14] ISO/IEC JTC 1/SC 42, ISO/IEC 23894:2023—Information Technology—Artificial Intelligence—Risk Management, International Organization for Standardization, Geneva, 2023.
[15] B. Mittelstadt, P. Allo, M. Taddeo, S. Wachter, and L. Floridi, “The ethics of algorithms: Mapping the debate,” Big Data & Society, vol. 3, no. 2, pp. 1–21, 2016.