Data is everywhere now, yet most teams struggle to analyse it quickly without a data engineer on speed dial. That frustration led us to build Vizzy, a governed analytics platform that lets anyone ask questions in plain English and get charts back in seconds. Under the hood it runs a self-healing NL2SQL pipeline on Groq’s inference API, a DuckDB columnar engine, and an approval-gated cleaning workflow that keeps raw data untouched. We measured sub-105 ms p95 query latency on one-million-row tables on a commodity Intel i5, and our NL2SQL hit rate reached 96.3 % after a single LLM retry pass, numbers we honestly did not expect from a student project. The stack is React 19 + FastAPI + SQLModel + DuckDB, with role-based access control (RBAC) and append-only audit logs baked in from day one. Practically speaking, this paper shows that undergrad teams can ship enterprise-grade governed analytics without cloud warehouses or big budgets.
Introduction
Data is growing rapidly, but most small organizations still struggle to extract insights from it because SQL tools are complex, BI licences are expensive, and analysis is often done by hand in Excel. Although large language models (LLMs) have improved natural-language-to-SQL (NL2SQL) conversion, real-world deployments remain unreliable because of hallucinated column names, schema changes, and a lack of governance. To address this, this paper introduces Vizzy, a secure and governed analytics system designed and implemented by students.
Vizzy makes four contributions: a self-healing NL2SQL pipeline with validation and fallback mechanisms, a human-in-the-loop data cleaning framework with approval-based execution, a high-performance analytics engine built on DuckDB that achieves sub-105 ms p95 query latency on one-million-row tables, and an automatic dashboard generator that creates domain-aware visualizations without configuration.
The system builds on prior work in NL2SQL (ATIS [3], Spider [2], PICARD [5]), data cleaning (Deequ [6], HoloClean [7]), and in-process analytics (DuckDB [8]), but extends it with production-oriented features such as schema versioning, audit logging, and sandboxed execution to handle real-world data challenges.
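One way to realise schema versioning is to fingerprint each dataset's column layout and treat a changed fingerprint as a new version. The following is a minimal sketch under that assumption. The helper name `schema_hash` and the choice of SHA-256 over sorted (column, type) pairs are illustrative and are not taken from Vizzy's code.

```python
import hashlib
import json


def schema_hash(columns: dict) -> str:
    """Return a stable fingerprint for a mapping of column name -> type name.

    Hypothetical helper: names are lower-cased and pairs are sorted so that
    column order and letter case do not change the hash.
    """
    normalised = sorted((name.lower(), dtype.upper()) for name, dtype in columns.items())
    payload = json.dumps(normalised, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


v1 = schema_hash({"order_id": "INTEGER", "amount": "DOUBLE"})
v1_reordered = schema_hash({"amount": "DOUBLE", "order_id": "INTEGER"})
v2 = schema_hash({"order_id": "INTEGER", "amount": "DOUBLE", "region": "VARCHAR"})

same_version = v1 == v1_reordered  # column order alone does not create a version
schema_changed = v1 != v2          # an added column does
```

Comparing the stored hash with the hash of a freshly ingested file gives a cheap check for whether previously generated SQL can still be trusted against the new data.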
Architecturally, Vizzy uses a React frontend and FastAPI backend with secure JWT authentication, role-based access control, and full audit trails. Its ingestion layer supports multiple data formats and external database connections while maintaining immutable dataset versions. A key feature is the inspection and cleaning pipeline, where LLM-generated cleaning plans must be approved by users before execution.
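The approval gate described above can be sketched in a few lines. This is an illustration under stated assumptions, not Vizzy's implementation: the `CleaningPlan` class, the `STEPS` registry, and `apply_plan` are hypothetical names, and plain lists of dictionaries stand in for real dataset versions.

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Optional


@dataclass
class CleaningPlan:
    """A proposed list of cleaning steps awaiting human review (hypothetical shape)."""
    steps: list
    approved_by: Optional[str] = None  # stays None until a reviewer signs off

    def approve(self, reviewer: str) -> None:
        self.approved_by = reviewer


# Hypothetical step registry: each step maps one row to a cleaned row.
STEPS = {
    "strip_names": lambda row: {**row, "name": row["name"].strip()},
    "fill_missing_amount": lambda row: {
        **row, "amount": 0.0 if row["amount"] is None else row["amount"]
    },
}


def apply_plan(raw_rows: list, plan: CleaningPlan) -> list:
    """Return a new cleaned version; refuse to run an unapproved plan."""
    if plan.approved_by is None:
        raise PermissionError("cleaning plan has not been approved")
    cleaned = deepcopy(raw_rows)  # the raw version is never mutated
    for step in plan.steps:
        cleaned = [STEPS[step](row) for row in cleaned]
    return cleaned


raw = [{"name": "  Asha ", "amount": None}, {"name": "Ravi", "amount": 12.5}]
plan = CleaningPlan(steps=["strip_names", "fill_missing_amount"])

try:
    apply_plan(raw, plan)
    blocked = False
except PermissionError:
    blocked = True  # execution is gated until approval

plan.approve("analyst@example.com")
cleaned = apply_plan(raw, plan)
```

Because `apply_plan` works on a copy and returns a new list, the raw rows stay available as the immutable base version, and the reviewer identity on the plan is the kind of field an audit log entry could record.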
The NL2SQL pipeline includes intent classification, schema injection, LLM-based SQL generation, SQL validation using SQLGlot, and execution in DuckDB with fallback diagnostics. The dashboard engine maps datasets to business domains (e.g., sales or HR) and automatically generates KPIs and visualizations using predefined templates and governance rules.
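The generate, validate, and retry part of this pipeline can be sketched as follows. To keep the example dependency-free, Python's built-in `sqlite3` stands in for both SQLGlot validation and the DuckDB sandbox, and `fake_llm` is a stub in place of the real LLM call. Every name here is an assumption made for illustration, and intent classification and schema injection are omitted.

```python
import sqlite3
from typing import Callable, Optional

# Stand-in for the DuckDB sandbox: an in-memory SQLite database with one table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.5), ("north", 2.5)],
)


def validate(sql: str) -> Optional[str]:
    """Return None if the SQL is a plannable SELECT, else an error message."""
    if not sql.lstrip().lower().startswith("select"):
        return "only SELECT statements are allowed"
    try:
        con.execute("EXPLAIN " + sql)  # plans the query without running it
    except sqlite3.Error as exc:
        return str(exc)
    return None


def answer(question: str,
           generate: Callable[[str, Optional[str]], str],
           max_retries: int = 1):
    """Generate SQL, validate it, and retry with the error text as feedback."""
    feedback = None
    for attempt in range(max_retries + 1):
        sql = generate(question, feedback)
        feedback = validate(sql)
        if feedback is None:
            return sql, con.execute(sql).fetchall(), attempt
    raise ValueError(f"could not produce valid SQL: {feedback}")


def fake_llm(question: str, feedback: Optional[str]) -> str:
    """Stub generator: hallucinates a column first, repairs it when given feedback."""
    column = "amount" if feedback else "revenue"
    return f"SELECT region, SUM({column}) FROM sales GROUP BY region ORDER BY region"


sql, rows, retries = answer("total sales by region", fake_llm)
```

In this run the first attempt references a non-existent `revenue` column, validation reports the error, and the single retry produces a query that executes. Passing the validator's message back to the generator is the design choice that makes the loop self-healing instead of a blind re-roll.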
Conclusion
We set out to build something a domain expert could actually use—no SQL, no BI license, no data engineer required. Vizzy does that. It handles the full governed analytics lifecycle: ingest, inspect, clean, query, visualise, and audit—with LLM-powered natural language at the centre.
The numbers backed us up. Sub-105 ms p95 on a million rows. 96.3 % NL2SQL accuracy. 94 % cleaning recall. For a final-year B.Tech project built on commodity hardware in Tiruchirappalli, we think that is a genuinely solid result.
What surprised us most was how much the governance layer mattered. We expected NL2SQL accuracy to be the hard part. It turned out that AnalysisContract enforcement, the cleaning approval workflow, and schema-hash versioning were what made users actually trust the output. Accuracy alone is not enough—traceability is what closes the deal.
If this work is useful to other undergrad teams building LLM-powered data tools, that is the best outcome we could hope for. The architecture patterns documented here—especially the hybrid DuckDB/Pandas fallback, the approval-gated cleaning pipeline, and per-version DuckDB sandboxing—are reusable without cloud infrastructure or paid APIs beyond the Groq free tier.
References
[1] R. Florian et al., “Evaluating LLMs for NL2SQL: A Benchmark on Realistic Queries,” Proc. ACL, 2023.
[2] T. Yu et al., “Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task,” EMNLP, 2018.
[3] C. T. Hemphill, J. J. Godfrey, and G. R. Doddington, “The ATIS Spoken Language Systems Pilot Corpus,” HLT, 1990.
[4] B. Wang et al., “RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers,” ACL, 2020.
[5] T. Scholak, N. Schucher, and D. Bahdanau, “PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models,” EMNLP, 2021.
[6] S. Schelter et al., “Automating Large-Scale Data Quality Verification,” VLDB, 2018.
[7] T. Rekatsinas et al., “HoloClean: Holistic Data Repairs with Probabilistic Inference,” VLDB, 2017.
[8] M. Raasveldt and H. Muehleisen, “DuckDB: an Embeddable Analytical Database,” SIGMOD, 2019.
[9] T. Wolf et al., “HuggingFace’s Transformers: State-of-the-Art Natural Language Processing,” EMNLP Findings, 2020.