Abstract
Automated user-interface (UI) testing is essential for continuous software delivery, yet it remains fragile in the face of frequent front-end changes. Conventional automation frameworks (Selenium, Cypress, Playwright) depend on static locators and basic assertion primitives that are prone to breakage and produce flaky results. Recent research and commercial developments propose AI-enabled strategies to reduce fragility: self-healing locators, perceptual visual checks, and intelligent reporting. This review synthesizes literature from 2018–2025, identifies key deficiencies in existing techniques, and articulates a unified architecture that integrates (a) hybrid AI-driven locator healing, (b) semantic Visual Proof Validation (VPV) inspired by Vision Transformer architectures, and (c) dual-format AI reporting with automated rerun orchestration. We present a comparative analysis, identify research gaps, and discuss prototype results demonstrating improved healing accuracy and actionable test intelligence.
Introduction
Automated User Interface (UI) testing is a critical component of modern software development, particularly within Agile and CI/CD pipelines, where it enables rapid releases without compromising functional or visual integrity. However, test fragility caused by evolving front-end frameworks, dynamic DOM structures, and CSS changes results in high maintenance costs and unstable test suites. Traditional automation tools such as Selenium and Cypress provide robust scripting but lack adaptability to such dynamic changes.
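To make this fragility concrete, the minimal Selenium sketch below shows the conventional pattern such tools encourage; the URL, the selector, and the expected text are hypothetical placeholders rather than examples drawn from any surveyed system.

```python
# Illustrative only: a conventional Selenium test bound to a static locator.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/checkout")  # hypothetical page

# Static locator: renaming the id (e.g., to "#submitOrder") raises
# NoSuchElementException and fails the test, even though the button
# is still present and functional.
button = driver.find_element(By.CSS_SELECTOR, "#submit-btn")
button.click()

# Basic assertion primitive: an exact-string check that breaks on
# cosmetic copy changes ("Order placed" vs. "Your order was placed").
assert "Order placed" in driver.page_source

driver.quit()
```

Any refactor that renames the element or rewords the confirmation message fails this test even though the underlying behavior is intact, which is precisely the maintenance burden described above.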
Recent advancements in Artificial Intelligence (AI), Deep Learning (DL), and Computer Vision (CV) have introduced self-healing capabilities that automatically repair broken locators and perform semantic-level visual verification. Vision Transformers (ViT) and contrastive learning now enable context-aware visual comparisons, surpassing traditional pixel-diff approaches. Similarly, Large Language Models (LLMs) are being explored for automated test generation and failure explanation, although their integration with runtime healing and visual proofing remains limited.
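As a minimal sketch of the embedding-based comparison described above, the following code encodes two screenshots with a pretrained ViT and compares their [CLS] embeddings by cosine similarity; the checkpoint name, file names, and the 0.95 acceptance threshold are illustrative assumptions, not values prescribed by the surveyed literature.

```python
# Semantic visual comparison with ViT embeddings instead of raw pixel diffing.
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
model = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

def embed(path: str) -> torch.Tensor:
    """Encode a screenshot into the ViT [CLS] embedding."""
    inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0]  # [CLS] token, shape (1, 768)

baseline = embed("baseline.png")    # approved reference screenshot (assumed file)
candidate = embed("candidate.png")  # screenshot from the current run (assumed file)

similarity = torch.nn.functional.cosine_similarity(baseline, candidate).item()
# A pixel diff would flag any anti-aliasing noise or minor layout shift;
# the embedding comparison tolerates such noise and flags semantic change.
print("visual match" if similarity >= 0.95 else "semantic difference detected")
```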
The literature from 2018–2025 shows a transition from rule-based heuristics to hybrid AI-driven frameworks combining deep embeddings, semantic validation, and adaptive learning. Despite this progress, major research gaps persist:
Limited integration of semantic visual verification with DOM-based healing (a sketch of such a combination appears after this list).
Lack of continuous learning pipelines to adapt models as interfaces evolve.
Poor integration between test generation, runtime healing, and visual validation.
Deficient explainability and auditability of AI-driven healing decisions.
Challenges in cross-platform generalization across web and mobile UIs.
High computational costs hindering scalability in CI/CD pipelines.
Unaddressed security and privacy concerns related to captured UI data.
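The first of these gaps can be illustrated with a small sketch that scores DOM candidates by attribute similarity and then gates the healing decision with a semantic visual check; the fingerprint fields, the scoring scheme, and the 0.9 threshold are assumptions made for illustration, not a method defined in the surveyed work.

```python
# DOM-based healing scored by attribute overlap, gated by a visual check.
from difflib import SequenceMatcher

def attribute_score(fingerprint: dict, candidate: dict) -> float:
    """Average string similarity over recorded locator attributes."""
    keys = ("tag", "id", "text", "class")
    ratios = [SequenceMatcher(None, fingerprint.get(k, ""), candidate.get(k, "")).ratio()
              for k in keys]
    return sum(ratios) / len(ratios)

def heal(fingerprint: dict, candidates: list[dict],
         visual_similarity, min_visual: float = 0.9) -> dict | None:
    """Pick the best DOM candidate, then confirm it visually before accepting."""
    best = max(candidates, key=lambda c: attribute_score(fingerprint, c))
    # Gate the DOM decision with a semantic visual check (e.g., the ViT
    # embedding comparison sketched earlier) instead of trusting DOM cues alone.
    if visual_similarity(best["screenshot"], fingerprint["screenshot"]) >= min_visual:
        return best
    return None

# Hypothetical usage with a stubbed visual check:
old = {"tag": "button", "id": "submit-btn", "text": "Place order",
       "class": "btn primary", "screenshot": "old.png"}
new_dom = [{"tag": "button", "id": "submitOrder", "text": "Place order",
            "class": "btn btn-primary", "screenshot": "new.png"},
           {"tag": "a", "id": "cancel", "text": "Cancel",
            "class": "link", "screenshot": "cancel.png"}]
print(heal(old, new_dom, visual_similarity=lambda a, b: 0.96))
```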
Future research directions emphasize hybrid multimodal frameworks that unify DOM, vision, and language cues; online and active learning pipelines; and explainable, resource-efficient, privacy-preserving architectures.
The proposed AI-driven framework aims to integrate locator self-healing with semantic Visual Proof Validation (VPV) through a six-stage loop—covering test execution, healing, visual verification, and reporting. This closed-loop architecture enhances resilience, transparency, and automation reliability, paving the way for intelligent, self-adaptive, and human-comprehensible UI testing systems capable of supporting modern continuous delivery environments.
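The control flow of such a loop can be outlined as follows; every stage function and data shape in this sketch is a stub standing in for the corresponding component, since the architecture described here does not fix a concrete API.

```python
# A schematic, runnable sketch of the proposed six-stage closed loop.
from dataclasses import dataclass

@dataclass
class RunResult:
    passed: bool
    locator_broken: bool
    screenshot: str = "shot.png"

def execute(test: str, healed: bool) -> RunResult:      # Stage 1: test execution
    # Stub: the first run fails on a broken locator; the healed rerun passes.
    return RunResult(passed=healed, locator_broken=not healed)

def heal_locator(test: str) -> str:                     # Stage 2: hybrid AI healing
    return "[data-test=submit]"                         # stub candidate locator

def visual_proof(shot: str, baseline: str) -> float:    # Stage 3: semantic VPV
    return 0.97                                         # stub embedding similarity

def make_report(test: str, result: RunResult, vpv: float) -> str:  # Stage 4: reporting
    return f"{test}: passed={result.passed}, VPV similarity={vpv:.2f}"

def run_cycle(tests: list[str], baseline: str = "baseline.png") -> None:
    for test in tests:
        result = execute(test, healed=False)
        if result.locator_broken:                       # Stage 5: rerun orchestration
            print(f"{test}: healing with {heal_locator(test)} and re-running")
            result = execute(test, healed=True)
        vpv = visual_proof(result.screenshot, baseline)
        print(make_report(test, result, vpv))           # Stage 6: feedback to next cycle

run_cycle(["checkout_flow"])
```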
Conclusion
This review and prototype study argues that the next generation of UI test automation should transcend brittle locators and naive visual diffs by integrating hybrid AI healing, semantic visual proof, and actionable AI reporting. The combined approach yields substantial improvements in healing accuracy and maintenance efficiency while providing the explainability necessary for human trust and governance.