Fast and accurate quality assurance is essential in modern software development, because it ensures that software behaves correctly as system requirements keep changing. Writing test cases manually from technical documents is expensive, error-prone, and time-consuming. This work presents Docling, a system for document processing and automatic test case generation. It transforms software specifications supplied as PDFs into structured artifacts that are ready for testing. Docling extracts text, recognizes metadata and document features, performs embedding-based retrieval, and supports reasoning with Large Language Models (LLMs). It uses LlamaIndex for accurate content lookup and can connect to other models via the Gemini API. The system provides a simple web interface and APIs for uploading files, monitoring document analysis, and exporting test cases. Experimental work indicates substantial savings in human effort, with good applicability and traceability of the generated test cases. Docling improves testing efficiency by testing requirements systematically, and it is an effective step toward automated software quality assurance.
Introduction
The paper presents Docling, an AI-powered automated test case generation system that transforms software requirement documents into structured test cases using document processing, semantic retrieval, and large language models (LLMs). Traditional software testing requires test engineers to manually analyze Software Requirements Specifications (SRS), project documents, and compliance documents to create test cases, a process that is time-consuming, labor-intensive, and prone to inconsistencies. The increasing complexity of modern software systems has created a need for intelligent automation in quality assurance.
Docling addresses this challenge by combining PDF/document extraction, embedding-based semantic retrieval, and LLM-driven reasoning to automatically generate high-quality test cases from documents. The system supports web-based interaction and REST APIs, allowing users to upload documents, monitor processing stages, and download outputs such as feature summaries, markdown reports, and test case datasets.
Key Contributions
Automated generation of software test cases directly from requirement documents.
A hybrid architecture integrating document extraction, semantic search, embeddings, and LLM-based analysis.
Role-optimized prompting strategies for generating structured test artifacts.
An API-enabled interactive interface suitable for software development and QA teams.
Scalable and domain-adaptable architecture requiring minimal re-engineering for new document types.
Literature Review Findings
Previous studies demonstrated the effectiveness of LLMs for automated test generation and requirement analysis. Researchers highlighted benefits such as improved test coverage and contextual understanding, while also identifying challenges including:
Context window limitations.
Hallucination issues in LLM outputs.
Difficulty handling complex and ambiguous requirements.
Lack of traceability between generated test cases and original requirements.
Recent advances in retrieval-augmented generation (RAG), vector databases, and semantic search have shown that combining embeddings with LLMs significantly improves context awareness and reduces hallucinations. Docling builds upon these developments by integrating document understanding, vector retrieval, and test generation into a unified framework.
System Architecture and Workflow
The Docling system consists of four major components:
Frontend Interface – for document upload, monitoring, and result download.
Backend Services – orchestrate document processing and AI workflows.
AI Modules – perform feature extraction and test case generation.
Vector Search Engine – supports semantic retrieval using embeddings.
Document Processing Pipeline
The workflow follows these stages:
Document Upload
Users upload PDF or DOCX files through the web interface or API.
Document Conversion
Documents are converted into structured formats while preserving text, tables, images, and layout.
Text and Image Extraction
Text is extracted and optionally converted into Markdown.
Images are analyzed using AI to generate descriptive metadata.
Feature Extraction
An LLM identifies software requirements, features, and functionalities.
Extracted information is stored in structured JSON format.
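As a rough illustration of the structured JSON step, the sketch below parses a model response into feature records and discards malformed entries. The `Feature` schema, its field names, and the sample response are assumptions made for this example; the paper does not specify the actual schema, and no model API is called here.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Feature:
    """One extracted software feature (hypothetical schema)."""
    feature_id: str
    name: str
    description: str
    source_section: str  # where in the document the feature was found


REQUIRED_KEYS = ("feature_id", "name", "description", "source_section")


def parse_features(llm_response: str) -> List[Feature]:
    """Parse a JSON array returned by the LLM, skipping malformed entries."""
    try:
        raw = json.loads(llm_response)
    except json.JSONDecodeError:
        return []
    if not isinstance(raw, list):
        return []
    features = []
    for item in raw:
        # Keep only objects that carry every required key as a string.
        if isinstance(item, dict) and all(
            isinstance(item.get(key), str) for key in REQUIRED_KEYS
        ):
            features.append(Feature(**{key: item[key] for key in REQUIRED_KEYS}))
    return features


# Stand-in for a real model response; the second entry is deliberately incomplete.
sample_response = json.dumps([
    {
        "feature_id": "F-001",
        "name": "User login",
        "description": "Users authenticate with email and password.",
        "source_section": "3.1 Authentication",
    },
    {"name": "incomplete entry"},
])

features = parse_features(sample_response)
features_json = json.dumps([asdict(f) for f in features], indent=2)
```

Validating the response before storing it is one way to keep hallucinated or truncated output from reaching later stages.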
Semantic Search and Vector Indexing
Embeddings are generated for document sections.
Similarity search retrieves contextually relevant content using cosine similarity.
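The retrieval step can be sketched in a few lines. This is a minimal illustration, not the system's implementation: the `embed` function below is a toy bag-of-words stand-in for a real embedding model, and the `retrieve` helper and sample sections are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: term counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, sections: List[str], top_k: int = 2) -> List[Tuple[float, str]]:
    """Return the top_k sections ranked by similarity to the query."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(s)), s) for s in sections]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


sections = [
    "The user shall log in with email and password",
    "Reports are exported as PDF files",
    "Password reset is sent by email",
]
results = retrieve("user login with password", sections, top_k=2)
for score, section in results:
    print(f"{score:.3f}  {section}")
```

Only the highest-ranked sections are passed on, which is what keeps the context sent to the LLM small.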
Test Case Generation
Retrieved context and prompt templates are used to generate structured test cases.
Generated fields include:
Test Case ID
Title
Description
Preconditions
Test Steps
Expected Results
Priority
Tags
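One possible way to represent the fields listed above, and to assemble a prompt from retrieved context, is sketched below. The class name `GeneratedTestCase`, the template wording, and the default priority are assumptions for illustration; the paper does not publish its prompt templates.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical template; the actual role-optimized prompts are not given in the paper.
PROMPT_TEMPLATE = (
    "You are a senior QA engineer.\n"
    "Feature: {feature}\n"
    "Relevant requirement excerpts:\n{context}\n"
    "Return one test case as JSON with the keys: "
    "test_case_id, title, description, preconditions, "
    "test_steps, expected_results, priority, tags."
)


@dataclass
class GeneratedTestCase:
    """Container mirroring the generated fields listed above."""
    test_case_id: str
    title: str
    description: str
    preconditions: List[str]
    test_steps: List[str]
    expected_results: List[str]
    priority: str = "Medium"
    tags: List[str] = field(default_factory=list)


def build_prompt(feature: str, retrieved_chunks: List[str]) -> str:
    """Fill the template with the feature name and the retrieved excerpts."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return PROMPT_TEMPLATE.format(feature=feature, context=context)


prompt = build_prompt(
    "User login",
    ["The user shall log in with email and password"],
)

# Hand-written example of what a parsed model answer might look like.
example_case = GeneratedTestCase(
    test_case_id="TC-001",
    title="Login with valid credentials",
    description="Verify that a registered user can log in.",
    preconditions=["A registered user account exists"],
    test_steps=["Open the login page", "Enter email and password", "Submit"],
    expected_results=["The user is redirected to the dashboard"],
    priority="High",
    tags=["authentication", "smoke"],
)
```

Keeping the record typed makes it straightforward to reject model output that omits a field before it is exported.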
Result Export
Test cases are compiled into formatted Excel files and made available for download.
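The export step might look roughly like the following. Note the substitution: this sketch writes CSV with the Python standard library rather than a formatted Excel workbook, which would need a third-party spreadsheet library. The column names follow the field list above; the `export_test_cases` helper and the sample row are hypothetical.

```python
import csv
import io

COLUMNS = [
    "Test Case ID", "Title", "Description", "Preconditions",
    "Test Steps", "Expected Results", "Priority", "Tags",
]


def export_test_cases(test_cases, stream):
    """Write test cases (dicts keyed by COLUMNS) as CSV; return the row count."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    for case in test_cases:
        row = {}
        for column in COLUMNS:
            value = case.get(column, "")
            # Flatten list-valued fields so each test case stays on one row.
            row[column] = "; ".join(value) if isinstance(value, list) else value
        writer.writerow(row)
    return len(test_cases)


sample_cases = [
    {
        "Test Case ID": "TC-001",
        "Title": "Login with valid credentials",
        "Description": "Verify that a registered user can log in.",
        "Preconditions": ["A registered user account exists"],
        "Test Steps": ["Open login page", "Enter credentials", "Submit"],
        "Expected Results": ["Dashboard is shown"],
        "Priority": "High",
        "Tags": ["authentication", "smoke"],
    }
]

buffer = io.StringIO()
written = export_test_cases(sample_cases, buffer)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
```

The resulting file opens in spreadsheet tools, although styling such as column widths or colors would require a real workbook format.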
AI and Machine Learning Components
The system primarily utilizes:
Gemini 1.5 Flash for feature extraction and test case generation.
Optional fallback support for OpenAI models.
Embedding models for semantic indexing and retrieval.
Vector databases through LlamaIndex for efficient document search.
Optimization techniques include:
Embedding caching for faster repeated processing.
Token reduction through semantic retrieval.
API rate limiting and scalable document processing.
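Embedding caching can be illustrated with a small in-memory cache keyed by a hash of the text. This is a sketch under assumptions: the `EmbeddingCache` class is hypothetical, and `fake_embed` is a placeholder for the real embedding call, whose cost is what the cache avoids on repeated input.

```python
import hashlib


class EmbeddingCache:
    """In-memory cache that avoids re-embedding text seen before."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, text):
        # Hash the text so the key size does not grow with the input length.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]


def fake_embed(text):
    """Placeholder for a real embedding model call (assumption)."""
    return [float(len(text)), float(text.count(" "))]


cache = EmbeddingCache(fake_embed)
first = cache.get("The user shall log in")
second = cache.get("The user shall log in")   # served from the cache
other = cache.get("Reports are exported as PDF files")
print(f"hits={cache.hits} misses={cache.misses}")
```

A persistent store could replace the dictionary when the same documents are processed across sessions.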
Mathematical Model
The framework is based on three core functions:
Feature Extraction
F = f_{LLM}(D)
where the LLM transforms document content into structured software features.
Embedding Generation
e_i = \phi(t_i)
where text segments are converted into vector representations for semantic indexing.
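Similarity Search
sim(q, e_i) = \frac{q \cdot e_i}{\|q\| \, \|e_i\|}
where cosine similarity measures the relevance between the embedding q of a feature query and the embedding e_i of a document section.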
Conclusion
This paper proposes Docling, a system for intelligent document processing and AI-driven test case generation. It translates PDF and Word documents into structured form, identifies the functional features they describe, and generates test cases from a semantic understanding of the document using Large Language Models and semantic vector search.
Google's Gemini models for feature extraction and image analysis, together with LlamaIndex embeddings for contextual search, make Docling efficient for test case generation.
The modular architecture, with a responsive interface, efficient backend services, and AI components, is designed for scalability and adaptability across document formats.
Future enhancements could include batch processing, multi-language support, analytics, and distributed processing, which would continue to improve performance and broaden applicability in business environments.