
Exploring Emoji-Based Prompt Manipulation in LLMs
Researchers tested 50 emoji-augmented prompts across four open-source LLMs (Mistral 7B, Qwen 2 7B, Gemma 2 9B, Llama 3 8B) and report model-dependent vulnerabilities: some models yielded restricted outputs for a fraction of prompts, while others resisted the attacks entirely. The paper shows that emoji sequences can alter token and representation boundaries and sometimes bypass prompt-level safety checks.

Why this matters for practitioners:
• Emoji sequences are ubiquitous and often treated as harmless; adversaries can exploit that trust to craft covert jailbreaks.
• Vulnerability is model-specific: defensive choices (safety judges, filtering logic, tokenization strategy) materially affect resilience.
• Attacks that blend non-textual tokens with natural language can evade keyword filters and some judge-based systems, and may require multimodal or representation-aware defenses.

Practical short checklist:
• Treat emojis and other non-alphanumeric tokens as a potential attack surface in red-team exercises.
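To make the filter-evasion point concrete, here is a minimal, hypothetical sketch (not the paper's method) of how interleaving an emoji inside a flagged word defeats a naive substring filter, and how a simple normalization pass that strips non-alphanumeric characters restores detection. The keyword list and helper names are illustrative assumptions.

```python
# Hypothetical demo: emoji characters split a flagged keyword so a naive
# substring filter no longer matches it.
BANNED = {"exploit", "bypass"}  # illustrative keyword list, not from the paper

def naive_filter(prompt: str) -> bool:
    """Return True if any banned keyword appears verbatim in the prompt."""
    lowered = prompt.lower()
    return any(word in lowered for word in BANNED)

def strip_non_alnum(prompt: str) -> str:
    """Normalization pass: drop every character that is not alphanumeric or whitespace."""
    return "".join(ch for ch in prompt if ch.isalnum() or ch.isspace())

clean = "please exploit the model"
obfuscated = "please ex🙂ploit the model"  # emoji inserted mid-keyword

print(naive_filter(clean))                       # True  - caught
print(naive_filter(obfuscated))                  # False - filter evaded
print(naive_filter(strip_non_alnum(obfuscated))) # True  - normalization restores detection
```

This illustrates why the checklist item matters: filters that operate on raw text see `ex🙂ploit` as a different string, while a representation-aware or normalizing defense does not.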
Continue reading on Dev.to

