Hindi, one of the most widely spoken languages globally, poses distinctive challenges for automated character recognition because of its structurally complex Devanagari script, which combines vowel signs (matras), conjunct consonants, and other diacritical marks. This paper proposes a hybrid deep learning framework that unifies Deep Convolutional Neural Networks (DCNN) with Transformer architectures for accurate Hindi character recognition. The model exploits the spatial feature extraction power of convolutional layers alongside the long-range dependency modeling capacity of Transformer self-attention. Evaluation on the IIIT-HW-Dev and Devanagari Character Dataset (DCD) datasets demonstrates a state-of-the-art accuracy of 98.37%, exceeding existing methods such as standalone CNNs (92.14%), ResNet-50 (94.63%), and Vision Transformer (95.81%). Detailed ablation studies validate the contribution of each architectural component, and robustness evaluations confirm the model's resilience across both handwritten and printed Hindi text.
Introduction
This study presents a hybrid Deep Convolutional Neural Network (DCNN) and Transformer-based model for Hindi character recognition (HCR) in the Devanagari script. Hindi is widely spoken, but its script is challenging for OCR because of features such as matras (vowel signs), conjunct consonants (samyuktakshar), and the shirorekha (the horizontal headline joining the characters of a word). Traditional OCR methods that rely on handcrafted features have limited performance, while pure CNNs struggle with long-range spatial dependencies and pure Transformers require large datasets.
The proposed system combines the strengths of both approaches: a CNN backbone extracts local stroke and shape features, while a Transformer encoder captures global contextual relationships through self-attention. A comprehensive preprocessing pipeline—including binarization, noise removal, shirorekha removal, resizing, normalization, and extensive data augmentation—improves robustness and generalization.
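The division of labor described above can be illustrated with a minimal PyTorch sketch. It is not the authors' implementation: the class name HybridCNNTransformer, the 1x32x32 grayscale input size, the layer widths, the number of heads and encoder layers, and the mean-pooled classification head are all assumptions chosen for illustration. Only the overall pattern (a CNN backbone, then positional embeddings, then a Transformer encoder with multi-head attention) and the 46 output classes come from the text.

```python
import torch
import torch.nn as nn


class HybridCNNTransformer(nn.Module):
    """Illustrative CNN + Transformer encoder classifier (all sizes are assumptions)."""

    def __init__(self, num_classes=46, embed_dim=128, num_heads=4, num_layers=2):
        super().__init__()
        # CNN backbone: extracts local stroke and shape features.
        # Two 2x2 poolings turn a 1x32x32 input into an embed_dim x 8 x 8 map.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, embed_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        # Learned positional embeddings, one per spatial token (8 * 8 = 64).
        # This fixes the expected input size to 32x32 in this sketch.
        self.pos_embed = nn.Parameter(torch.zeros(1, 64, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim,
            nhead=num_heads,
            dim_feedforward=256,
            dropout=0.1,
            batch_first=True,
        )
        # Transformer encoder: self-attention relates every location to every other.
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        feats = self.backbone(x)                    # (B, C, 8, 8)
        tokens = feats.flatten(2).transpose(1, 2)   # (B, 64, C): one token per location
        tokens = tokens + self.pos_embed            # inject spatial position
        encoded = self.encoder(tokens)              # (B, 64, C)
        return self.head(encoded.mean(dim=1))       # (B, num_classes) logits


if __name__ == "__main__":
    model = HybridCNNTransformer()
    logits = model(torch.randn(8, 1, 32, 32))
    print(logits.shape)  # torch.Size([8, 46])
```

In this sketch each of the 64 spatial positions of the CNN feature map becomes one token, so self-attention can relate, for example, a matra above the headline to the consonant body below it. Whether the original model uses a class token, a different pooling strategy, or a deeper backbone is not stated here.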
The model was trained on the Devanagari Character Dataset (92,000 images, 46 classes) using PyTorch and evaluated against CNN, ResNet-50, VGG-16, and Vision Transformer baselines. The proposed hybrid architecture achieved 98.37% accuracy, outperforming all comparison models. Ablation studies confirmed the importance of the Transformer encoder, positional embeddings, and multi-head attention mechanisms. Robustness testing under noise, blur, and compression also showed superior performance compared to existing models.
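As a rough illustration of what robustness testing under noise involves, the following pure-Python sketch measures top-1 accuracy at several noise levels. It is not the evaluation protocol used in this work: the function names, the additive Gaussian noise model, the noise levels, and the toy predict callable are hypothetical, and blur and compression would need analogous perturbation functions.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Return a noisy copy of a 2D grayscale image with values clipped to [0, 1]."""
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


def accuracy(predict, images, labels):
    """Top-1 accuracy of a predict(image) -> label callable."""
    correct = sum(1 for img, lab in zip(images, labels) if predict(img) == lab)
    return correct / len(labels)


def robustness_curve(predict, images, labels, sigmas, seed=0):
    """Accuracy at each noise level; sigma = 0.0 gives the clean baseline."""
    rng = random.Random(seed)
    curve = {}
    for sigma in sigmas:
        noisy = [add_gaussian_noise(img, sigma, rng) for img in images]
        curve[sigma] = accuracy(predict, noisy, labels)
    return curve


if __name__ == "__main__":
    # Toy stand-in for a trained model: label 1 if the image is mostly bright.
    def toy_predict(image):
        pixels = [px for row in image for px in row]
        return int(sum(pixels) / len(pixels) > 0.5)

    dark = [[0.0] * 4 for _ in range(4)]
    bright = [[1.0] * 4 for _ in range(4)]
    print(robustness_curve(toy_predict, [dark, bright], [0, 1], [0.0, 0.1, 0.3]))
```

Comparing models on the same curve, with the same seed and the same noise levels, is one way to make a statement such as "superior performance under noise" concrete.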
The results demonstrate that combining CNN-based local feature extraction with Transformer-based global context modeling significantly improves Devanagari character recognition accuracy and robustness, making the approach suitable for applications such as document digitization, language translation, and accessibility tools. Future work includes model compression, word-level recognition, self-supervised learning, and deployment on mobile devices.
Conclusion
This paper presented a novel hybrid deep learning architecture integrating Deep Convolutional Neural Networks with Transformer self-attention for automated Hindi character recognition. The proposed model captures both local stroke-level features and global structural relationships within Devanagari characters, achieving state-of-the-art accuracy of 98.37% on the Devanagari Character Dataset, substantially outperforming all evaluated baseline methods. Ablation studies confirmed the significance of each architectural component, and robustness evaluations demonstrated the model's resilience under degraded real-world conditions. This work advances the state of the art in Hindi OCR and lays a foundation for high-performance recognition systems applicable to other complex Indic scripts.