Modern marketing depends heavily on digital advertising, but producing high-quality advertising content often requires considerable time and skill. This study proposes an AI-Powered Hybrid Framework for Product Advertisement Generation that automatically converts product photos into advertisement visuals and promotional videos. The framework combines image preprocessing, prompt enhancement, Stable Diffusion XL (SDXL) Inpainting, WAN 2.2 image-to-video generation, and a Flask-based web application into a single workflow. SDXL Inpainting produces visually appealing commercial images, which WAN 2.2 then turns into dynamic promotional videos. The Flask-based web application provides an interactive platform where users can upload product images, generate advertisement visuals and promotional videos, and manage the final output. The generated outputs are evaluated using CLIP Score, SSIM, LPIPS, HPS v2, and Temporal Consistency metrics. Experimental findings show that the system produces high-quality ad content while preserving product characteristics and video integrity. Comparative analysis further validates the ability of the proposed technique to produce realistic, marketing-focused advertising content. The proposed system offers a scalable and effective solution for AI-driven advertisement generation in digital marketing and e-commerce applications.
Introduction
This paper presents an AI-based system for automated product advertisement generation using a hybrid image and video synthesis pipeline. It addresses the growing need in digital marketing and e-commerce for high-quality promotional content while reducing the manual effort involved in traditional advertisement creation.
The proposed framework takes a single product image as input and generates both advertisement images and promotional videos. It includes image preprocessing (background removal, masking), prompt enhancement for better text descriptions, SDXL Inpainting for generating realistic and visually appealing ad images, and WAN 2.2 for converting these images into dynamic videos. The system is deployed through a Flask web application for easy user access. Quality is evaluated using metrics such as CLIP Score, SSIM, LPIPS, HPS v2, and temporal consistency.
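The source does not state how temporal consistency is computed. One common formulation, assumed here purely for illustration, is the mean cosine similarity between embeddings of consecutive video frames (for example, CLIP image embeddings). The sketch below takes precomputed embedding vectors as plain Python lists; the function names `cosine_similarity` and `temporal_consistency` are hypothetical and not taken from the framework.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length vector has no direction")
    return dot / (norm_a * norm_b)


def temporal_consistency(frame_embeddings: List[Sequence[float]]) -> float:
    """Mean cosine similarity over consecutive frame pairs.

    Assumed definition (not confirmed by the source): values near 1.0
    mean adjacent frames are semantically similar, i.e. little flicker.
    """
    if len(frame_embeddings) < 2:
        raise ValueError("need at least two frames")
    sims = [
        cosine_similarity(prev, curr)
        for prev, curr in zip(frame_embeddings, frame_embeddings[1:])
    ]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    # Toy 2-D "embeddings": two identical frames, then an orthogonal one.
    frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(temporal_consistency(frames))
```

In practice the embeddings would come from an image encoder applied to each frame of the WAN 2.2 output; that step is omitted because the encoder choice is not specified in the text.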
Existing literature shows that while generative AI is effective for creating images and videos separately, most approaches lack an integrated pipeline for end-to-end advertisement generation, especially combining both image and video creation with product consistency and visual quality.
Conclusion
The main goal of this study was to develop an AI-powered hybrid framework for automated product advertisement generation. The proposed solution combines image preprocessing, Gemini Flash prompt enhancement, SDXL Inpainting for creating advertising images, and WAN 2.2 for promotional video synthesis into a single workflow. According to the results, the framework successfully converts ordinary product photos into eye-catching commercial images and engaging promotional videos while preserving key product attributes. Quantitative evaluation using CLIP Score, SSIM, LPIPS, HPS v2, and Temporal Consistency confirms that the proposed method produces high-quality, semantically relevant advertising content.
Overall, the proposed framework offers an effective, scalable, and practical solution for AI-driven advertisement production, reducing manual design work while supporting digital marketing, social media promotion, and e-commerce applications.