AI Researchers Got Chatbots to Share Cocaine Recipes Using This One Wild Trick

Summary

Researchers found that leading AI models can be manipulated into treating attacker-controlled text as their own trusted reasoning, exposing a structural weakness in how LLMs separate instructions from untrusted content. In “Prompt Injection as Role Confusion,” they describe Chain-of-Thought Forgery, which inserts fake reasoning that mimics internal thought patterns. Models that normally reject harmful requests then accepted the fabricated reasoning and produced cocaine synthesis instructions. The attack raised jailbreak success from near zero to about 60% across several tested models, including GPT-5 variants and other major systems. The same role-confusion flaw also let researchers trick an AI coding agent into uploading a secrets file by hiding malicious commands in a webpage. The core problem is that models often infer trust from style and formatting rather than from reliable role boundaries, so injected text can be mistaken for user input or even the model’s own reasoning.