Inaudible Audio Attacks Can Hijack AI Voice Models, Study Finds

Summary

Researchers at Zhejiang University developed a method to attack AI voice models by embedding inaudible, hidden commands in audio clips, with up to a 96% success rate. The technique, called AudioHijack, manipulates the audio signal’s digital waveform in a way humans cannot hear, causing targeted voice models to interpret and execute unauthorized commands. Unlike traditional prompt injection, this method does not alter spoken user prompts, making it harder to detect and defend against. Testing on 13 open-source and commercial AI voice models showed that AudioHijack can override intended actions, refuse legitimate requests, spread false information, change the model’s personality, or trigger undesired operations like web searches or data exfiltration. The manipulated commands can be delivered through common media such as online videos, music, or voice messages, and attacks are possible even in live AI conversations. Defenses based on monitoring internal model mechanisms had limited success, as attackers could adapt and maintain the attack’s effectiveness, highlighting the challenge of distinguishing between genuine and malicious audio input.