The rapid advancement of large language models (LLMs) has made AI-generated text increasingly fluent and indistinguishable from human writing.
However, the malicious use of AI text for misinformation or plagiarism raises the need for reliable detectors. Simple detectors often fail when AI-generated text is paraphrased by an adversary. In this work, we propose a robust detection framework based on deep recurrent neural networks (RNNs) that is resilient to paraphrasing. We compile a large dataset of AI-generated and human-written text (e.g., a 500K-sample Kaggle corpus) and simulate paraphrase attacks using state-of-the-art paraphrasing models. Our model employs a multi-layer Long Short-Term Memory (LSTM) network to capture sequential patterns and is trained on both original and paraphrased samples. In experiments, the proposed RNN classifier achieves high accuracy on unaltered AI text and retains strong performance on paraphrased adversarial examples, far exceeding the robustness of baseline detectors, whose accuracy drops sharply under paraphrasing. These results demonstrate that deep recurrent models, when properly trained, can detect AI-generated content even under paraphrasing attacks. We discuss implications for academic integrity and outline future enhancements such as multilingual extensions.
Introduction
The rise of AI-generated text (e.g., from GPT-4) has brought both useful applications and serious risks, such as misinformation and academic dishonesty, making reliable AI-text detection crucial. However, standard detectors are vulnerable to paraphrasing attacks, which can drastically reduce their accuracy.
This work proposes a robust LSTM-based RNN detector trained on a mixture of original and paraphrased AI texts alongside human-written texts. By including paraphrase-augmented data, the model learns features that remain invariant under surface edits, allowing it to identify underlying AI-generated structures. Experiments on large datasets (e.g., Kaggle 500K essays) show the model achieves ∼96–97% accuracy on original texts and maintains over 92% on paraphrased AI texts, significantly outperforming naive detectors.
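The multi-layer LSTM classifier described above can be sketched as follows. This is a minimal illustration, not the paper's exact configuration: the vocabulary size, embedding and hidden dimensions, and layer count shown here are assumed values for demonstration.

```python
# Minimal sketch of a multi-layer LSTM text classifier (PyTorch).
# Hyperparameters are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn

class LSTMDetector(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128,
                 hidden_dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # one logit: AI (1) vs. human (0)

    def forward(self, token_ids):
        x = self.embed(token_ids)              # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)             # h_n: (num_layers, batch, hidden_dim)
        return self.head(h_n[-1]).squeeze(-1)  # final layer's last hidden state -> (batch,)

model = LSTMDetector()
logits = model(torch.randint(0, 30000, (4, 50)))  # 4 texts, 50 tokens each
```

The classifier reads the whole token sequence and scores it from the final hidden state, which is what lets it pick up sequential patterns rather than isolated surface words.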
The study demonstrates that adversarially trained RNNs can detect AI-generated text effectively even under paraphrasing attacks, emphasizing the importance of data augmentation and sequential linguistic modeling for robust AI-text detection.
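The paraphrase-augmentation step can be made concrete with a short sketch: paraphrased variants of AI-generated texts keep the "AI" label, so the classifier is trained to see through surface edits. The `paraphrase` function here is a stand-in; the actual attacks use state-of-the-art paraphrasing models.

```python
# Sketch of building the paraphrase-augmented training set.
# `paraphrase` is a placeholder for an external paraphrasing model.
import random

def paraphrase(text):
    # Placeholder surface edit; the real setup uses a paraphrasing model.
    return text.replace("utilize", "use")

def build_training_set(human_texts, ai_texts, seed=0):
    data = [(t, 0) for t in human_texts]            # human-written: label 0
    data += [(t, 1) for t in ai_texts]              # AI-generated: label 1
    data += [(paraphrase(t), 1) for t in ai_texts]  # paraphrased AI keeps label 1
    random.Random(seed).shuffle(data)
    return data

train = build_training_set(["I walked home."], ["We utilize synergies."])
```

Keeping the AI label on paraphrased copies is what forces the model toward features invariant under surface edits, rather than memorizing exact wording.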
Conclusion
We have demonstrated that robust detection of paraphrased AI-generated text is feasible using deep recurrent neural networks. By training an LSTM-based classifier on both original and paraphrased examples, our model captures linguistic patterns that remain stable under adversarial editing. Empirical results show high detection accuracy on standard benchmarks and greatly improved resilience to paraphrasing compared to baseline models.
In future work, we plan several extensions. First, we will incorporate multilingual training data to assess generalization beyond English. Second, we aim to explore hybrid models that combine recurrent layers with self-attention (e.g. Transformer-LSTM hybrids) to capture even richer features. Third, we will investigate dynamic paraphrasing defenses, such as online adaptation where the detector continually fine-tunes on new paraphrased examples (analogous to adversarial training in image recognition). Finally, integrating metadata (document provenance) or watermarking signals (where available) could further enhance reliability. Overall, as AI-generated content continues to evolve, maintaining robust detection will require continual adaptation of models and datasets, but our work provides evidence that RNN-based approaches remain a powerful tool in this space.