As AI technologies become pervasive in daily life, new attack surfaces are emerging from vulnerabilities in AI text-processing systems. One of them is text input manipulation through Unicode-based data obfuscation (e.g. the use of emojis, zero-width characters, and homoglyph replacements). Such techniques enable attackers to bypass filters, evade content moderation, and even insert adversarial prompts into large language models (LLMs). This survey reviews the current landscape of emoji- and Unicode-based data masking attacks, categorizes the techniques used, identifies legitimate uses, and discusses the difficulties of defending against these attacks. It also summarizes possible defence schemes and suggests directions for future work. The paper assembles recent research and field-based case studies to form a concrete basis for further study.
Introduction
Artificial Intelligence (AI) systems, especially those using Natural Language Processing (NLP) like large language models (LLMs), are increasingly targeted by sophisticated attacks that exploit Unicode and emoji-based text masking techniques. Attackers use emojis, zero-width characters, homoglyph substitutions, and combining marks to obfuscate malicious inputs, enabling them to bypass AI-powered content moderation, spam filters, and input validation. This includes adversarial prompt injections that manipulate AI behavior, posing risks of misinformation, security breaches, and policy violations.
Despite the growing threat, systematic research on Unicode-based attacks remains limited. This survey categorizes the techniques (emoji insertion, zero-width character exploitation, homoglyph substitution, combining marks, and mixed-script attacks) and illustrates how they confuse tokenization and evade detection.
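Two of the categories above, zero-width character exploitation and homoglyph substitution, can be illustrated with a minimal sketch. The strings and mapping below are illustrative assumptions, not drawn from any specific attack in the literature:

```python
# Sketch: how zero-width characters and homoglyphs mask a word so it
# renders unchanged to a human but no longer matches a naive filter.

ZWSP = "\u200b"  # ZERO WIDTH SPACE, invisible when rendered

def insert_zero_width(word: str) -> str:
    """Interleave zero-width spaces between characters; the word looks
    identical on screen but substring matching no longer fires."""
    return ZWSP.join(word)

# Homoglyph substitution: Latin letters swapped for visually similar
# Cyrillic code points (a -> U+0430, o -> U+043E, e -> U+0435).
HOMOGLYPHS = {"a": "\u0430", "o": "\u043e", "e": "\u0435"}

def homoglyph_swap(word: str) -> str:
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in word)

masked = insert_zero_width("attack")
swapped = homoglyph_swap("password")

print("attack" in masked)     # False: the substring filter is bypassed
print(swapped == "password")  # False: visually identical, different code points
```

Both transforms preserve the rendered appearance while changing the underlying code-point sequence, which is exactly what defeats filters and perturbs tokenization.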
Real-world cases, like Google Cloud’s “Emoji Jailbreaks” and social engineering campaigns using Unicode masking, highlight the practical impact. These attacks exploit weaknesses in tokenization algorithms and variability in Unicode normalization, allowing them to evade traditional security controls.
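The normalization-variability point can be made concrete: the same visible text has different code-point identities under different Unicode normalization forms, so two components of a pipeline that normalize differently (or not at all) can disagree about what input they received. A minimal illustration using the standard library:

```python
import unicodedata

# "ﬁle" starts with U+FB01 LATIN SMALL LIGATURE FI, a compatibility
# character. NFC leaves it intact; NFKC expands it to "fi". A filter
# applying one form and a model applying the other see different strings.
s = "\ufb01le"

nfc = unicodedata.normalize("NFC", s)
nfkc = unicodedata.normalize("NFKC", s)

print(nfc == nfkc)        # False: the two forms disagree
print(nfkc == "file")     # True: only NFKC folds the ligature away
```

Combining marks behave analogously under NFC versus NFD, which is why inconsistent normalization across a moderation stack is itself an attack surface.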
Mitigating these attacks requires a layered defense approach including:
Unicode normalization during preprocessing,
Detection and removal of invisible characters,
Homoglyph detection algorithms,
Contextual input sanitization to prevent prompt injections,
AI-driven anomaly detection for unusual Unicode patterns,
User awareness combined with cybersecurity policies.
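The first three layers above can be sketched with the standard library alone. This is a minimal illustration, not a production sanitizer; real systems would use the confusable tables from Unicode Technical Standard #39 [5] rather than the crude script check assumed here:

```python
import unicodedata

def sanitize(text: str) -> str:
    """Layers 1-2: normalize, then strip invisible characters."""
    # Unicode normalization (NFKC) collapses compatibility variants
    # such as fullwidth or ligature forms to canonical characters.
    text = unicodedata.normalize("NFKC", text)
    # Drop invisible format characters: category Cf covers zero-width
    # spaces/joiners, directional overrides, and similar code points.
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

def is_suspicious(token: str) -> bool:
    """Layer 3: flag tokens mixing scripts, a common homoglyph signal.
    Script bucketing here is a rough heuristic based on character names."""
    scripts = set()
    for ch in token:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])  # e.g. "LATIN", "CYRILLIC"
    return len(scripts) > 1

print(sanitize("a\u200bb"))                # zero-width space removed
print(is_suspicious("p\u0430ssword"))      # True: Latin + Cyrillic mix
```

The remaining layers (contextual sanitization against prompt injection and anomaly detection over Unicode patterns) are model- and deployment-specific and do not reduce to a short snippet.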
The paper emphasizes the urgent need for further research and improved defense mechanisms to enhance AI robustness against these emerging Unicode-based adversarial threats.
References
[1] Google Cloud, “Emoji Jailbreaks: Breaking AI Models with Unicode,” Medium, 2024. [Online]. Available: https://medium.com/google-cloud/emoji-jailbreaks-b3b5b295f38b
[2] “How emojis are becoming AI's weakest link in cybersecurity,” The Economic Times, 2024. [Online]. Available: https://economictimes.indiatimes.com/magazines/panache/how-emojis-are-becoming-ais-weakest-link-in-cybersecurity/articleshow/120253502.cms
[3] Repello.ai, “Prompt Injection Using Emojis,” 2024. [Online]. Available: https://repello.ai/blog/prompt-injection-using-emojis
[4] X et al., “Prompt Injection Attacks Using Unicode Manipulation,” arXiv, 2024. [Online]. Available: https://arxiv.org/pdf/2411.01077
[5] Unicode Consortium, “Unicode Technical Standard #39: Unicode Security Mechanisms,” 2023. [Online]. Available: https://www.unicode.org/reports/tr39/
[6] N. Carlini, A. Mishra, et al., “Poisoning Web-Scale Training Datasets Is Practical,” arXiv, 2023. [Online]. Available: https://arxiv.org/abs/2306.04634
[7] R. Jia and P. Liang, “Adversarial Examples for Evaluating Reading Comprehension Systems,” in Proc. of EMNLP, 2017, pp. 2021–2031.
[8] M. Zhang et al., “Adversarial Attack and Defence on AI-Based Content Moderation Systems,” IEEE Access, vol. 10, pp. 70312–70324, 2022.