This project focuses on building a smart, easy-to-use Grammatical Error Correction (GEC) system for a native Indic language, specifically Marathi. We're using powerful, modern AI models called transformer-based models (like IndicBERT and mBART), which we'll fine-tune with local language data. The main job of the system is to spot and fix grammar mistakes in sentences in real time. The finished product will be a simple web tool where users can type their text, see suggested fixes with quick explanations, and choose which changes to accept or reject. This work is essential because it fills a big gap, making writing more accurate and accessible in local languages that currently don't have good grammar tools.
Introduction
The project focuses on developing an advanced Grammatical Error Correction (GEC) system for Marathi, addressing the lack of intelligent language-support tools for regional Indian languages. Existing grammar checkers are optimized mainly for English and fail to handle the complex morphology, syntax, and context-dependent grammar of languages like Marathi. This gap in digital language support limits accurate written communication and slows down digital adoption among native speakers.
Motivation & Problem
Marathi suffers from:
No modern grammar correction tools
Low-resource datasets
Complex linguistic structures (gender, tense, morphology)
Existing rule-based systems and manual proofreading are slow, inconsistent, and ineffective for deeper grammatical issues. There is an urgent need for an accessible, AI-driven Marathi grammar correction solution.
Key Contributions
The proposed work offers:
A dedicated Marathi GEC system using transformer models (IndicBERT/mBART/mT5).
Synthetic data generation to overcome the scarcity of annotated corpora.
A Hybrid Model that integrates:
Rule-based correction (for deterministic and simple errors)
Transformer Seq2Seq MT model (for contextual and complex errors)
A real-time, user-friendly web interface to make grammar correction accessible to the Marathi-speaking community.
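The synthetic data generation listed above is not described in detail here; the conclusion only names noise injection as the technique. The following is a minimal sketch under that assumption: it corrupts clean sentences with word-level noise to produce (noisy, clean) training pairs. The function names `inject_noise` and `make_pairs`, the three noise types, and the probabilities are illustrative choices, not the project's actual settings. Realistic Marathi errors would also need morphological noise (for example gender or tense agreement), which this sketch does not attempt.

```python
import random


def inject_noise(sentence, rng, p_drop=0.1, p_swap=0.1, p_dup=0.05):
    """Return a corrupted copy of a whitespace-tokenized sentence.

    Assumed noise types (illustrative only): dropped word, repeated word,
    and swapped adjacent words.
    """
    tokens = sentence.split()
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p_drop and len(tokens) > 1:
            continue                      # simulate a missing word
        noisy.append(tok)
        if r > 1.0 - p_dup:
            noisy.append(tok)             # simulate a repeated word
    # Swap adjacent tokens to simulate word-order errors.
    i = 0
    while i < len(noisy) - 1:
        if rng.random() < p_swap:
            noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
            i += 2                        # do not re-swap the same pair
        else:
            i += 1
    if not noisy:                         # never emit an empty source sentence
        noisy = tokens[:1]
    return " ".join(noisy)


def make_pairs(clean_sentences, n_per_sentence=2, seed=0):
    """Build (noisy_source, clean_target) pairs for seq2seq training."""
    rng = random.Random(seed)             # fixed seed keeps the corpus reproducible
    pairs = []
    for sentence in clean_sentences:
        for _ in range(n_per_sentence):
            pairs.append((inject_noise(sentence, rng), sentence))
    return pairs


if __name__ == "__main__":
    for noisy, clean in make_pairs(["मी शाळेत जातो", "तो पुस्तक वाचतो"]):
        print(noisy, "->", clean)
```

Each pair treats the corrupted sentence as the source and the original as the target, which matches the framing of GEC as a translation task.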
Related Work
Studies on Chinese, Portuguese, Turkish, and Zarma languages show that:
Transformer and MT-based GEC systems outperform rule-based approaches.
Hybrid approaches and synthetic datasets help low-resource languages.
GPT-based systems require caution due to over-correction.
Human evaluation is essential because BLEU/GLEU metrics are unreliable for non-English languages.
These findings validate the proposed hybrid Transformer-based approach for Marathi.
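The hybrid design listed under Key Contributions can be sketched as a two-stage pipeline: deterministic rules run first, and their output is passed to a seq2seq model. This is only an illustration under stated assumptions. The three rules are generic placeholders (not Marathi grammar rules from the project), `hybrid_correct` and `rule_based_pass` are hypothetical names, and the model is represented by any callable from string to string so that the sketch runs without a trained checkpoint.

```python
import re
from typing import Callable, List, Tuple

# (regex pattern, replacement, explanation). Placeholder rules for illustration;
# real deterministic Marathi rules would have to be written by language experts.
# Word boundaries use whitespace lookarounds instead of \b, because Python's \w
# does not match Devanagari vowel signs and \b would split words incorrectly.
RULES: List[Tuple[str, str, str]] = [
    (r"\s+([,.!?।])", r"\1", "removed space before punctuation"),
    (r"(?<!\S)(\S+)(?:\s+\1)+(?!\S)", r"\1", "removed repeated word"),
    (r"\s{2,}", " ", "collapsed extra spaces"),
]


def rule_based_pass(text: str) -> Tuple[str, List[str]]:
    """Apply each rule in order and record which ones changed the text."""
    applied = []
    for pattern, replacement, why in RULES:
        text, count = re.subn(pattern, replacement, text)
        if count:
            applied.append(why)
    return text.strip(), applied


def hybrid_correct(text: str, seq2seq: Callable[[str], str]) -> Tuple[str, List[str]]:
    """Stage 1: deterministic rules. Stage 2: contextual seq2seq rewrite."""
    staged, notes = rule_based_pass(text)
    final = seq2seq(staged)
    if final != staged:
        notes.append("contextual rewrite by seq2seq model")
    return final, notes


if __name__ == "__main__":
    # Identity function stands in for a fine-tuned model (assumption).
    corrected, notes = hybrid_correct("तो  तो पुस्तक वाचतो .", lambda s: s)
    print(corrected, notes)
```

Running the rules first keeps simple fixes cheap and explainable, and leaves the model to handle only what the rules cannot. In practice the callable would wrap a fine-tuned model such as mT5 or mBART.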
Research Gap
Major gaps identified:
Lack of comprehensive Marathi GEC tools.
Limited annotated data.
No hybrid GEC approaches for Marathi.
Poor evaluation standards and limited transition of research into usable applications.
To address the evaluation gap, the project uses human evaluation of grammaticality, fluency, and meaning preservation.
This allows comparison of the rule-based, MT, and hybrid models.
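As a rough illustration of how such human ratings could be aggregated for comparison, the sketch below averages scores per model and criterion. The 1 to 5 rating scale, the tuple layout, and the function name `summarize` are assumptions made for this example; the project's actual evaluation protocol is not described here, and no real ratings are shown.

```python
from collections import defaultdict
from statistics import mean

# The three criteria named in the text; the identifiers are illustrative.
CRITERIA = ("grammaticality", "fluency", "meaning_preservation")


def summarize(ratings):
    """Average human ratings per (system, criterion).

    ratings: iterable of (system, criterion, score) tuples, where score is
    assumed to be on a 1-5 scale given by a human annotator.
    """
    buckets = defaultdict(list)
    for system, criterion, score in ratings:
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        buckets[(system, criterion)].append(score)
    return {key: mean(values) for key, values in buckets.items()}
```

Calling `summarize` on the collected annotations yields one mean score per model and criterion, which can then be laid out side by side for the rule-based, MT, and hybrid systems.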
Practical Benefits
The system will:
Improve writing quality for students, content creators, and professionals.
Provide real-time corrections with explanations.
Encourage digital adoption in Marathi-speaking communities.
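The accept-or-reject workflow with explanations could be backed by a small data structure like the one below. This is a sketch, not the project's implementation: the `Suggestion` fields, the character-offset convention, and `apply_accepted` are hypothetical, and suggestions are assumed not to overlap.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Suggestion:
    start: int          # inclusive character offset in the original text
    end: int            # exclusive character offset in the original text
    replacement: str    # text to put in place of text[start:end]
    explanation: str    # short reason shown to the user


def apply_accepted(text: str, suggestions: List[Suggestion], accepted: Iterable[int]) -> str:
    """Apply only the suggestions whose indices the user accepted.

    Edits are applied from right to left so that earlier offsets stay valid.
    Assumes the suggestions do not overlap.
    """
    chosen = sorted(
        (suggestions[i] for i in set(accepted)),
        key=lambda s: s.start,
        reverse=True,
    )
    for s in chosen:
        text = text[:s.start] + s.replacement + text[s.end:]
    return text


if __name__ == "__main__":
    original = "तो तो पुस्तक वाचतो"
    fixes = [Suggestion(0, 3, "", "repeated word removed")]
    print(apply_accepted(original, fixes, accepted=[0]))   # user accepts the fix
    print(apply_accepted(original, fixes, accepted=[]))    # user rejects it
```

Keeping offsets relative to the original text means the interface can show every suggestion at once and apply any subset the user chooses.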
Limitations & Challenges
Challenges include:
Risk of over-correction by LLMs
Marathi’s linguistic complexity
Lack of annotated datasets
Integration issues between backend and UI
High cloud GPU costs
User acceptance and trust issues
Future Work
Short-Term:
Integrate MahaGPT for next-word prediction
Build standardized Marathi GEC benchmark datasets
Long-Term:
Add Explainable AI (XAI) to provide reasons for corrections
Expand the system to other Indic languages using multilingual transformer models
Conclusion
The AI-Based Grammatical Error Correction (GEC) system for Marathi represents a significant step forward in addressing the technological gap for low-resource native languages. The project successfully established the viability of a hybrid architecture that strategically combines a high-precision rule-based checker with the contextual processing power of a Transformer Seq2Seq MT model (e.g., mT5/MarianMT). This approach effectively handles both simple and complex, context-dependent errors. By framing GEC as a translation task and utilizing techniques like noise injection for synthetic data generation, the system overcomes the challenge of data scarcity specific to Marathi. The deployment as a real-time web tool solves the practical usability gap, providing native speakers with an accessible means to improve writing accuracy and confidence. Ultimately, this project delivers a functional, tested, and scalable solution that can serve as a template for GEC efforts in other underserved Indic languages.
References
[1] I. Keita, A. W. Maiga, A. Sounaye, et al. (2025), "GEC for Low-Resource Languages: Case of Zarma."
[2] Y. Jin, B. Zhang, and Y. He (2023), "Research and Analysis of GEC Technology for Chinese Documents."
[3] M. Juri?i? and F. Sari? (2024), "Evaluation of AI-based Grammar Correction for Portuguese."
[4] C. Bryant, M. Felice, and T. Briscoe (2023), "Grammatical Error Correction: A Survey of the State of the Art," Computational Linguistics, MIT Press.
[5] S. Kobayashi, S. Flachs, and M. Rei (2024), "Revisiting Meta-evaluation for Grammatical Error Correction," Transactions of the Association for Computational Linguistics (TACL).
[6] A. Ersoy and E. Yıldız (2024), "Organic Data-Driven Approach for Turkish GEC and LLMs," Workshop Proceedings / arXiv.
[7] J. Latouche, et al. (2024), "Zero-shot Cross-Lingual Transfer for Synthetic Data in Grammatical Error Correction," EMNLP / arXiv.
[8] [ACL/LREC Papers] (2024), "GEC for Code-switched and Multilingual Contexts," Proceedings of ACL & LREC.
[9] S. Chollampatt, D. T. Hoang, and H. T. Ng (2016), "Adapting Grammatical Error Correction Based on the Native Language of Writers with Neural Network Joint Models," in Proc. EMNLP, pp. 1901–1911.
[10] J. Park (2019), "An AI-based English Grammar Checker vs. Human Raters in Evaluating EFL Learners’ Writing," Multimedia-Assisted Language Learning, vol. 22, no. 1, pp. 112–131, doi:10.15702/mall.2019.22.1.112.