Handwritten Marathi documents are still widely used, but converting them into digital text is difficult because of the complexity of the Devanagari script and variations in handwriting. This project introduces a fully transformer-based system that automatically reads handwritten Marathi text and converts it into accurate, well-structured digital output.
The system replaces the CNN- and RNN-based models used in older OCR methods with transformer models. A Vision Transformer (ViT) analyzes handwritten strokes, recognizes character shapes, and handles complex features such as matras and conjunct letters. After the text is extracted, a Marathi language transformer model corrects grammar, spelling, and sentence flow so that the final output is meaningful and easy to read. A Flask web interface lets users upload handwritten images. Once an image is submitted, the system performs preprocessing steps such as noise reduction, contrast improvement, text region detection, and segmentation. These steps improve recognition accuracy, especially for unclear or low-quality images. The recognized text is then refined by the grammar-correction module and returned to the user in clean digital form. By combining vision transformers with Marathi language transformers, the system achieves higher accuracy across different handwriting styles and varying document quality. The proposed solution is suitable for real-world applications such as academic paper checking, historical document digitization, record-keeping in government offices, and any organization that needs fast and accurate digitization of handwritten Marathi content.
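The preprocessing steps named above (noise reduction, contrast improvement, and text region detection) can be illustrated with a small sketch. It is not the project's implementation: the 3x3 median filter, the linear contrast stretch, the row-wise ink scan, and the names `median_filter`, `stretch_contrast`, and `find_text_rows` are all assumptions chosen for illustration, and the image is assumed to be a grayscale grid of 0-255 intensities.

```python
from statistics import median

# Assumption: a grayscale image is a list of rows of 0-255 intensities.
Image = list[list[int]]


def median_filter(img: Image) -> Image:
    """Reduce salt-and-pepper noise with a 3x3 median filter (edges clamped)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the 3x3 neighbourhood, repeating border pixels at the edges.
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(int(median(window)))
        out.append(row)
    return out


def stretch_contrast(img: Image) -> Image:
    """Linearly rescale intensities so the darkest pixel is 0 and the brightest 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def find_text_rows(img: Image, ink_below: int = 128) -> list[tuple[int, int]]:
    """Return (start, end) row ranges, end exclusive, that contain dark pixels."""
    ranges = []
    start = None
    for y, row in enumerate(img):
        has_ink = any(p < ink_below for p in row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            ranges.append((start, y))
            start = None
    if start is not None:
        ranges.append((start, len(img)))
    return ranges
```

A production system would more likely rely on an image library for these operations; the pure-Python version only makes each step explicit.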
Introduction
This work presents a transformer-based system for recognizing handwritten Marathi (Devanagari) text and converting it into clean, grammatically correct digital output. It addresses the limitations of traditional OCR approaches based on CNNs, RNNs, and template matching, which struggle with complex Marathi handwriting, conjunct characters, and limited training data.
The proposed solution uses Vision Transformers (ViT) to analyze entire images of handwritten text and extract characters more accurately by capturing global stroke relationships. A dedicated Marathi language model then performs grammar correction, handling linguistic features such as verb forms, sandhi rules, and gender–number agreement. The system is implemented as a Flask-based web application where users upload images and receive processed text.
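To make the ViT input stage concrete, the sketch below shows how an image is cut into fixed-size patches that the transformer then treats as a sequence of tokens. Attention over that sequence is what lets the model relate strokes across the whole image. The function name `split_into_patches` is hypothetical, and the patch size is an assumption, since the patch size and input resolution used by the system are not stated here.

```python
def split_into_patches(img: list[list[int]], patch: int) -> list[list[int]]:
    """Split a grayscale image into non-overlapping patch x patch tiles.

    Each tile is flattened row by row into one vector; tiles are returned in
    reading order (left to right, top to bottom), as a ViT would receive them
    before the linear patch embedding.
    """
    h, w = len(img), len(img[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of the patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            flat = [
                img[y][x]
                for y in range(top, top + patch)
                for x in range(left, left + patch)
            ]
            patches.append(flat)
    return patches


# Usage: a 4x4 image with a patch size of 2 yields 4 tokens of length 4.
tiny = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = split_into_patches(tiny, 2)
```

In a full model each flattened patch would be projected to an embedding and combined with a position encoding; those learned parts are omitted here.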
The methodology includes preprocessing (noise removal and image enhancement), ViT-based text recognition, text normalization, and transformer-based grammar correction. Compared to earlier approaches, this system is more robust, is less constrained by limited training data, and produces more accurate and meaningful Marathi text.
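The text normalization step is not detailed here, so the following is only a sketch of what such a step might do between recognition and grammar correction. It assumes that normalization means Unicode NFC normalization plus whitespace cleanup; the function name `normalize_marathi` is hypothetical.

```python
import re
import unicodedata


def normalize_marathi(text: str) -> str:
    """Bring recognized Devanagari text into a consistent form.

    Assumed steps: Unicode NFC normalization, so that equivalent code point
    sequences compare equal, then collapsing runs of spaces and tabs and
    dropping empty lines.
    """
    text = unicodedata.normalize("NFC", text)
    text = re.sub(r"[ \t]+", " ", text)
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

Zero-width joiners are deliberately left untouched, because they can affect how Devanagari conjuncts are rendered.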
Conclusion
This work introduces a modern, efficient, and scalable OCR system designed specifically for digitizing handwritten Marathi text. By using a transformer-based recognition pipeline along with strong preprocessing and linguistic post-correction modules, the proposed framework overcomes many of the challenges found in traditional CNN- and RNN-based OCR approaches. These older models often struggle with the complex nature of the Devanagari script, especially with conjunct letters, varying stroke shapes, and wide differences in personal handwriting styles. In contrast, the transformer-based system presented in this project handles these complexities more effectively and delivers significantly higher accuracy.
The experimental results clearly show that preprocessing plays an essential role in improving recognition quality, raising character-level accuracy to approximately 87–92% across different types of handwritten inputs. Comparative studies with widely used OCR tools such as Tesseract and Google Vision further highlight the advantages of the proposed method, as these systems show noticeable performance drops when dealing with handwritten Marathi text. The superior performance of the transformer-based model demonstrates its suitability for real-world applications where handwriting quality and style may vary greatly.
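How the character-level accuracy was computed is not specified. One common convention, assumed in the sketch below, is one minus the character error rate, where the error rate is the Levenshtein edit distance between the reference and the recognized text divided by the reference length. The names `edit_distance` and `char_accuracy` are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # delete a reference character
                cur[j - 1] + 1,          # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def char_accuracy(ref: str, hyp: str) -> float:
    """Character-level accuracy as 1 - CER, clipped at 0 (assumed definition)."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))
```

Python strings are sequences of Unicode code points, so a consonant and its matra count as separate characters here; an evaluation based on grapheme clusters would give different numbers.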
By combining attention-driven text recognition with context-aware grammar correction, the system generates output that is both structurally correct and linguistically meaningful. This makes the resulting text highly suitable for various downstream tasks including digital archiving, document search, sentiment analysis, information extraction, and classification.