A Comparative Study of Traditional MongoDB Search and Vector-Based Semantic Retrieval

Authors: Akshay Lokare, Indrajeet Kedari, Smita Patil

DOI Link: https://doi.org/10.22214/ijraset.2025.75892

Abstract

With the exponential growth of unstructured textual data, the limitations of traditional keyword-based database querying have become increasingly evident. MongoDB, a widely used document-oriented database, primarily depends on lexical search operators such as $text and $regex, which often fail to capture semantic meaning or contextual relevance. This study aims to empirically evaluate the effectiveness of MongoDB Atlas Vector Search, a vector-based semantic retrieval system, in overcoming these limitations. The research conducts a comparative experimental analysis between traditional MongoDB keyword search and MongoDB Atlas Vector Search to assess improvements in semantic relevance, latency performance, and hybrid filtering efficiency. Using datasets ranging from 100,000 to one million documents embedded through OpenAI and Sentence Transformer models, the experiments demonstrate a three- to fourfold increase in semantic retrieval accuracy while maintaining sub-100 millisecond latency suitable for real-time applications. Although vector indexing introduces moderate storage and computational overhead, these trade-offs are offset by significant gains in contextual understanding and intelligent retrieval capability. The study’s findings confirm that MongoDB Atlas Vector Search effectively bridges the gap between traditional keyword-based querying and AI-driven semantic search, marking a substantial advancement in modern database technology. Overall, the research contributes quantitative evidence supporting the transition toward meaning-aware, context-driven data retrieval systems for scalable enterprise applications.

Introduction

This paper examines the fundamental limitations of traditional MongoDB keyword-based search, which relies on lexical pattern matching, and contrasts it with the capabilities of MongoDB Atlas Vector Search. Traditional queries ($text, $regex) efficiently support exact-term and pattern lookups but have no semantic understanding, so they miss conceptually related documents whose vocabulary differs from the query. Vector databases and embedding-based retrieval systems instead represent data as high-dimensional vectors that capture contextual meaning, enabling semantically aware search.
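To make the contrast concrete, the following minimal sketch builds the two query shapes as plain Python dictionaries. The operators $text, $regex, and the $vectorSearch aggregation stage are MongoDB's own; the helper names, the index name "vector_index", and the field names "embedding" and "title" are assumptions chosen for illustration and do not come from the study.

```python
from typing import Any, Dict, List, Sequence


def text_filter(terms: str) -> Dict[str, Any]:
    """Lexical filter for a text index: matches indexed terms, not meaning."""
    return {"$text": {"$search": terms}}


def regex_filter(field: str, pattern: str) -> Dict[str, Any]:
    """Case-insensitive pattern match on a single field."""
    return {field: {"$regex": pattern, "$options": "i"}}


def vector_pipeline(
    query_vector: Sequence[float],
    *,
    index: str = "vector_index",   # hypothetical index name
    path: str = "embedding",       # hypothetical field holding embeddings
    limit: int = 10,
    num_candidates: int = 100,
) -> List[Dict[str, Any]]:
    """Aggregation pipeline that ranks documents by embedding similarity."""
    return [
        {
            "$vectorSearch": {
                "index": index,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Expose the similarity score next to a hypothetical title field.
        {"$project": {"title": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
```

With a PyMongo collection, the filters would be passed to collection.find(...) and the pipeline to collection.aggregate(...). The query vector must come from the same embedding model that produced the stored vectors.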

This study conducts a controlled experimental comparison between classical MongoDB search and its vector-enabled counterpart. Using large-scale datasets (100k–1M documents), embeddings generated from modern models, and both lexical and vector indexes, the methodology evaluates semantic relevance, recall, latency, scalability, and hybrid filtering performance. The research design involves running repeated keyword, vector, and hybrid queries and logging performance metrics such as latency, Recall@K, throughput, and index overhead.
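The measurement loop described above can be sketched as follows. The paper does not give its exact metric definitions, so this assumes the common form of Recall@K (the fraction of relevant documents that appear in the top K results) and wall-clock latency in milliseconds; the function names recall_at_k and time_query are hypothetical.

```python
import time
from typing import Callable, Hashable, List, Sequence, Set


def recall_at_k(retrieved: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the relevant documents found among the first k results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def time_query(run_query: Callable[[], object], repeats: int = 5) -> List[float]:
    """Run a query callable several times and return each latency in ms."""
    latencies_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


# Usage with made-up document ids: one of three relevant documents is in the top 3.
score = recall_at_k(["a", "b", "c", "d"], {"a", "d", "x"}, k=3)
```

In a real run, run_query would wrap a collection.find(...) or collection.aggregate(...) call and fully consume the cursor, since cursors fetch results lazily.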

Results show that vector search offers 3–4× higher semantic accuracy and markedly better conceptual retrieval than traditional keyword search. Vector queries are slower (approximately 80–100 ms vs. 21–28 ms for keyword queries), but they remain within practical real-time thresholds. Hybrid search, which combines structured filtering with semantic ranking, delivers the strongest balance of precision, contextual relevance, and operational control. Vector indexing increases storage requirements because embeddings must be stored alongside the documents, and scaling to one million documents requires tuning of the approximate nearest neighbor (ANN) search parameters.
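One way to express the hybrid pattern is to pass a structured filter to the $vectorSearch stage so that only matching documents are ranked by similarity. This is a sketch under assumptions, not the authors' exact query: the index name, the embedding path, and the filter fields "category" and "year" are hypothetical, and any field used in the filter must be declared as a filter field in the vector index definition.

```python
from typing import Any, Dict, List, Sequence


def hybrid_pipeline(
    query_vector: Sequence[float],
    metadata_filter: Dict[str, Any],
    *,
    index: str = "vector_index",   # hypothetical index name
    path: str = "embedding",       # hypothetical embedding field
    limit: int = 10,
    num_candidates: int = 150,
) -> List[Dict[str, Any]]:
    """Semantic ranking restricted to documents that pass a structured filter."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least limit")
    return [
        {
            "$vectorSearch": {
                "index": index,
                "path": path,
                "queryVector": list(query_vector),
                "filter": metadata_filter,        # structured pre-filter
                "numCandidates": num_candidates,  # ANN breadth: recall vs. latency
                "limit": limit,
            }
        },
        {"$project": {"score": {"$meta": "vectorSearchScore"}}},
    ]


# Usage with hypothetical fields: support articles from 2023 onward.
example = hybrid_pipeline(
    [0.12, -0.03, 0.58],
    {"$and": [{"category": {"$eq": "support"}}, {"year": {"$gte": 2023}}]},
    limit=5,
    num_candidates=50,
)
```

Raising numCandidates widens the approximate search and generally improves recall at the cost of latency, which is the kind of ANN tuning the results refer to at the million-document scale.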

The study concludes that MongoDB Atlas Vector Search marks a significant shift from syntax-based retrieval toward meaning-based search architectures. By unifying traditional document querying with semantic intelligence, it provides a practical and scalable solution for modern applications requiring context-aware information retrieval.

Conclusion

This study confirms that MongoDB Atlas Vector Search substantially improves semantic retrieval over traditional MongoDB keyword queries, achieving a three- to fourfold increase in Recall@10 while keeping response times under 100 milliseconds on large, real-world datasets. These results indicate that Atlas Vector Search is not just a research experiment but a reliable, production-ready solution for semantic retrieval. The findings also show that hybrid querying, which combines structured filters with semantic similarity, offers the best balance between precision, relevance, and scalability. Although vector indexing requires more storage and increases latency, these trade-offs are outweighed by the improvement in contextual understanding and retrieval quality. The study reflects the broader shift from traditional keyword-based systems to semantic, context-aware search methods built on vector representations. Keyword searches work well for exact matches but cannot interpret natural language or complex meanings. Vector search, in contrast, captures contextual nuances through high-dimensional embeddings, resulting in better recall, accuracy, and user experience. Overall, MongoDB Atlas Vector Search bridges the gap between traditional querying and AI-powered semantic retrieval. Future research should explore hybrid indexing strategies, multimodal embeddings spanning text, image, and audio, and real-time re-ranking with feedback from large language models, with the aim of improving scalability, precision, and adaptability in enterprise-scale semantic search.

Copyright

Copyright © 2025 Akshay Lokare, Indrajeet Kedari, Smita Patil. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id: IJRASET75892

Publish Date: 2025-11-28

ISSN: 2321-9653

Publisher Name: IJRASET
