Breaking CAPTCHA Using Transformer-Based OCR Models: A Deep Learning Approach

Authors: Abhinav Chaturvedi

DOI Link: https://doi.org/10.22214/ijraset.2025.75241

Abstract

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) systems are widely employed to secure web applications against automated bots. However, recent advances in deep learning and pattern recognition, particularly transformer architectures, pose significant challenges to the robustness of CAPTCHA security measures. The objective of this research is to analyze security flaws even in highly distorted and complex character CAPTCHAs. We propose a novel approach using Optical Character Recognition (OCR) models based on Convolutional Recurrent Neural Networks (CRNN) and hybrid CNN-Transformer architectures, and evaluate their efficacy in breaking CAPTCHA images. Two distinct OCR models were trained: a baseline CRNN model and an advanced hybrid CNN-Transformer model. Extensive experiments showed that the hybrid CNN-Transformer model significantly outperformed the baseline CRNN model, achieving an accuracy of 95.1% compared with 79.3%. These findings highlight the susceptibility of current CAPTCHA systems to transformer-based OCR techniques and suggest that integrating transformer components into OCR models markedly enhances their ability to recognize distorted text, with critical implications for future CAPTCHA design and cybersecurity.

Introduction

CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) is a security mechanism designed to prevent automated bots from accessing online services. Traditional text-based CAPTCHAs rely on visual distortions, noisy backgrounds, and character deformations to differentiate humans from machines. However, advancements in deep learning, especially CNNs and LSTMs, have greatly improved automated CAPTCHA-solving, making traditional methods increasingly vulnerable.

The introduction of Transformer architectures further enhanced CAPTCHA recognition by capturing long-range dependencies in sequential data. Hybrid models combining CNN feature extraction with Transformer encoders have shown superior performance in OCR tasks, especially for heavily distorted CAPTCHA images. This study compares a conventional Convolutional Recurrent Neural Network (CRNN) with a hybrid CNN-Transformer model, demonstrating that the Transformer-enhanced model significantly outperforms CRNN in accuracy and robustness. The work highlights the pressing need for more secure CAPTCHA designs to counteract increasingly capable automated solvers.

Related Work

Early CAPTCHA recognition relied on traditional OCR methods, character segmentation, pixel analysis, and statistical modeling, which were limited in handling distortions and noise.
CNNs enabled effective recognition of complex text and distorted characters.
RNNs/LSTMs improved sequential modeling, enhancing CRNN-based CAPTCHA solvers.
Transformers, with self-attention mechanisms, surpass RNNs in capturing long-range dependencies, and hybrid CNN-Transformer models have emerged as state-of-the-art in CAPTCHA recognition.
Surveys and adversarial research underscore the ongoing security vulnerabilities in traditional CAPTCHA systems.
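
To make the point about long-range dependencies concrete, the following is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is an illustration only, not the model used in this paper: the learned query, key, and value projections are omitted (queries, keys, and values are all the input sequence), and the names `softmax`, `self_attention`, and `features` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Single-head scaled dot-product self-attention with Q = K = V = sequence.

    Learned projection matrices are omitted (an illustrative simplification).
    Returns (outputs, weights); weights[i][j] is how much position i attends to j.
    """
    dim = len(sequence[0])
    weights = []
    for query in sequence:
        # Score every position against the query, scaled by sqrt(dim)
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in sequence]
        weights.append(softmax(scores))
    # Each output is a weighted average of all positions in the sequence
    outputs = [[sum(w * value[d] for w, value in zip(row, sequence))
                for d in range(dim)]
               for row in weights]
    return outputs, weights


# Toy feature sequence: positions 0 and 5 share the same feature vector
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0],
            [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(features)
```

In this toy sequence, position 0 gives position 5 the same weight it gives itself, and more than it gives its immediate neighbor, even though they sit at opposite ends. Every pair of positions is compared directly in one step, whereas a recurrent model has to carry that information through each intermediate position.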

Methodology

Dataset: Real and synthetic CAPTCHAs (4–6 characters, digits 0–9 and uppercase A–Z), stored as images with corresponding CSV labels. Dataset split: 100,000 training, 20,000 validation, 10,000 test samples.
Preprocessing & Augmentation: Grayscale conversion, resizing to 50×200 pixels, normalization to [-1,1], and label indexing. Augmentation techniques include random rotation, Gaussian noise, color jittering, elastic distortion, and random occlusion to improve generalization and robustness.
Model Training: Experiments were conducted on an NVIDIA RTX 3070 GPU, comparing a conventional CRNN with a hybrid CNN-Transformer architecture.
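
The label indexing, normalization, and two of the augmentation steps described above can be sketched in plain Python as follows. This is a hedged illustration rather than the paper's pipeline: the function names, the noise level `sigma`, the occlusion box size, and the reading of "50×200" as 50 rows by 200 columns are all assumptions, and images are represented as nested lists instead of tensors.

```python
import random

# Character set stated in the dataset description: digits 0-9, then uppercase A-Z
CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(CHARSET)}


def encode_label(text):
    """Map a CAPTCHA string to a list of class indices (label indexing)."""
    return [CHAR_TO_INDEX[ch] for ch in text]


def decode_label(indices):
    """Inverse of encode_label."""
    return "".join(CHARSET[i] for i in indices)


def normalize(gray_image):
    """Scale 8-bit grayscale pixels from [0, 255] to [-1, 1]."""
    return [[pixel / 127.5 - 1.0 for pixel in row] for row in gray_image]


def add_gaussian_noise(image, sigma=0.1, rng=random):
    """Add zero-mean Gaussian noise, clipping the result back into [-1, 1]."""
    return [[max(-1.0, min(1.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]


def random_occlusion(image, box_h=10, box_w=20, fill=-1.0, rng=random):
    """Overwrite one randomly placed box_h x box_w rectangle with `fill`."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - box_h + 1)
    left = rng.randrange(width - box_w + 1)
    out = [row[:] for row in image]  # copy so the input is left unchanged
    for r in range(top, top + box_h):
        for c in range(left, left + box_w):
            out[r][c] = fill
    return out


# A blank white image, assumed to be 50 rows (height) by 200 columns (width)
raw = [[255] * 200 for _ in range(50)]
image = normalize(raw)
rng = random.Random(0)  # seeded so the augmentation is reproducible
augmented = random_occlusion(add_gaussian_noise(image, rng=rng), rng=rng)
```

Augmentation is applied after normalization here so that the noise scale is expressed in the same [-1, 1] range the model sees. Rotation, color jittering, and elastic distortion are left out because they need an image library.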

Conclusion

In this paper, we presented Transformer-based OCR models for CAPTCHA recognition and demonstrated that the hybrid CNN-Transformer model outperforms the baseline CRNN. While these models achieve lower error rates, their success also emphasizes the growing vulnerability of CAPTCHA systems. Future work will involve (1) conducting extensive ablation studies to further optimize the model architecture; (2) integrating adversarial training methods to improve model robustness; (3) exploring multi-modal CAPTCHA systems that incorporate additional security layers; and (4) performing large-scale experiments on diverse CAPTCHA datasets. Our findings underscore the need for more sophisticated CAPTCHA designs to keep pace with advances in deep learning.

Copyright

Copyright © 2025 Abhinav Chaturvedi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET75241

Publish Date : 2025-11-09

ISSN : 2321-9653

Publisher Name : IJRASET
