The exponential growth of unstructured digital documents across enterprise environments has created an urgent need for intelligent systems capable of extracting actionable insights from complex document repositories. Traditional keyword-based retrieval systems fail to capture semantic relationships, while pure neural approaches suffer from hallucination and lack of factual grounding. This paper presents a novel hybrid Retrieval-Augmented Generation (RAG) framework specifically designed for web-based document analysis applications. Our approach integrates semantic query enhancement, multi-modal retrieval strategies, advanced chunking algorithms, and iterative answer refinement within a scalable web architecture. The system combines sparse retrieval methods (BM25) with dense embedding approaches (Sentence-BERT) through Reciprocal Rank Fusion (RRF), while employing a lightweight local language model (Qwen3-4B) for context-aware answer generation. Extensive evaluation across financial, legal, and technical document corpora demonstrates significant improvements: 27.3% increase in retrieval accuracy (nDCG@10), 31.9% improvement in factual accuracy, and 93.3% reduction in hallucination rates compared to baseline RAG implementations. The web application maintains sub-2-second response times while handling concurrent users, making it suitable for enterprise deployment. This research contributes to the advancement of document intelligence systems by providing a practical, scalable framework that bridges the gap between semantic understanding and factual reliability in web-based environments.
Introduction
Enterprises generate massive volumes of unstructured documents—financial reports, legal contracts, and technical specifications—that contain critical knowledge but are largely inaccessible through traditional keyword-based retrieval systems. Large Language Models (LLMs) offer advanced natural language understanding but struggle with domain-specific knowledge and hallucinations. Retrieval-Augmented Generation (RAG) systems partially address these issues but face key challenges: ambiguous queries, fragmented retrieval from sparse or dense methods, and factual inconsistencies in generated responses. These limitations pose significant business risks in decision-making and compliance.
Proposed Solution: Hybrid RAG Architecture
The paper introduces a four-stage hybrid RAG pipeline to enhance enterprise document analysis:
Semantic Query Enhancement Module
Classifies queries by specificity, domain, and type (factual, analytical, procedural, exploratory).
Performs semantic paraphrasing, terminology enrichment, and temporal context integration to improve retrieval coverage.
Hybrid Retrieval Engine
Combines sparse (BM25), dense (Sentence-BERT embeddings), and statistical retrieval via reciprocal rank fusion.
Dynamically selects retrieval strategy based on query complexity to maximize coverage and relevance.
Context Optimization Framework
Performs semantic chunking with overlap preservation to maintain coherence within LLM context windows.
Ranks and selects information-dense chunks while preserving entities and semantic continuity.
Iterative Answer Refinement Pipeline
Ensures factual consistency through entailment checking, citation verification, and contradiction detection.
Detects hallucinations and evaluates completeness against query requirements.
Iteratively refines responses via retrieval and selective regeneration to maximize accuracy and reliability.
Queries: Expert-generated and GPT-augmented queries, categorized by complexity (factual, analytical, multi-step).
Results: The hybrid RAG approach significantly outperforms baseline RAG systems in retrieval accuracy, answer faithfulness, and hallucination reduction, while maintaining sub-2-second response times.
Contributions
Resolves query understanding issues in domain-specific contexts.
Achieves comprehensive retrieval by fusing multiple modalities.
Maintains context coherence for LLM processing.
Provides iterative verification to ensure trustworthy, factually consistent responses.
Conclusion
This research establishes a new paradigm for enterprise document analysis through the systematic integration of semantic query enhancement, hybrid retrieval fusion, and iterative answer refinement. Our experimental validation across diverse domains demonstrates substantial improvements in both retrieval accuracy and answer quality while maintaining the scalability requirements for enterprise deployment.
The practical impact extends beyond technical performance improvements to enable new categories of AI applications in mission-critical enterprise scenarios previously unsuitable for automated processing. The modular architecture design facilitates adoption across diverse organizational contexts while providing a foundation for future research and development.
While current limitations constrain applicability in certain scenarios, the identified future research directions provide clear pathways for addressing these constraints and extending system capabilities. The continued evolution of this research direction promises to transform how organizations interact with their knowledge assets, enabling more efficient, accurate, and trustworthy information access.
The convergence of advanced retrieval techniques, powerful language models, and sophisticated verification mechanisms represents a significant step toward truly intelligent document analysis systems that can serve as reliable partners in human decision-making processes. As these technologies mature, they will play increasingly important roles in knowledge work across industries, fundamentally changing how organizations leverage their information assets for competitive advantage and operational excellence.
References
[1] C. Raffel et al., \"Exploring the limits of transfer learning with a unified text-to-text transformer,\" Journal of Machine Learning Research, vol. 21, no. 140, pp. 1-67, 2020.
[2] G. V. Cormack, C. L. Clarke, and S. Buettcher, \"Reciprocal rank fusion outperforms Condorcet and individual rank learning methods,\" in Proc. 32nd Int. ACM SIGIR Conf. Research and Development Information Retrieval, 2009, pp. 758-759.
[3] S. Siriwardhana et al., \"Improving the domain adaptation of retrieval augmented generation (RAG) models for open domain question answering,\" Trans. Association for Computational Linguistics, vol. 10, pp. 276-287, 2022.
[4] Z. Ji et al., \"Survey of hallucination in natural language generation,\" ACM Computing Surveys, vol. 55, no. 12, pp. 1-38, 2023.
[5] Y. Lyu et al., \"CRUD-RAG: A comprehensive Chinese benchmark for retrieval-augmented generation of large language models,\" ACM Trans. Information Systems, vol. 42, no. 4, pp. 1-36, 2024.
[6] H. Zamani and M. Bendersky, \"Stochastic RAG: End-to-end retrieval-augmented generation through expected utility maximization,\" in Proc. 47th Int. ACM SIGIR Conf. Research and Development Information Retrieval, 2024, pp. 1472-1482.
[7] D. Yang et al., \"IM-RAG: Multi-round retrieval-augmented generation through learning inner monologues,\" in Proc. 47th Int. ACM SIGIR Conf. Research and Development Information Retrieval, 2024, pp. 1483-1493.
[8] G. Agrawal et al., \"Mindful-RAG: A study of points of failure in retrieval augmented generation,\" in 2024 2nd Int. Conf. Foundation and Large Language Models, 2024, pp. 1-8.