Can AI Agents Boost Ethereum Security? OpenAI and Paradigm Build a Testing Ground
OpenAI and Paradigm have launched EVMbench, a benchmark that measures how well AI agents can detect, patch, and exploit high-severity vulnerabilities in Ethereum Virtual Machine (EVM) smart contracts, the code that underpins Ethereum's decentralized applications. EVMbench draws on 120 curated vulnerabilities from 40 audits, sourced from audit competitions and from the security audit of Tempo, Stripe's layer-1 blockchain. This grounds its test scenarios in real-world code that secures real economic value.

The benchmark evaluates AI agents in three areas: detecting vulnerabilities, patching them without breaking the contract's intended functionality, and exploiting them through simulated attacks in a sandboxed environment.

In testing, OpenAI's GPT-5.3-Codex scored significantly higher at exploiting vulnerabilities than its predecessor. Its detection and patching performance was weaker, however, indicating that these tasks remain difficult for current models. The researchers stress that although EVMbench improves how AI is measured on blockchain security tasks, it does not capture the full complexity of live systems. As AI takes on a larger role in smart contract security, they argue, continued evaluation in realistic environments will be essential.

