AI Hacking Agents Reach 69.3% in New Test, Exposing a Growing Security Automation Risk

A new academic study found AI-powered cybersecurity agents achieved penetration-test success rates between 10.7% and 69.3% when tested against 300 target servers without prior knowledge of vulnerabilities. The research highlights growing concerns about autonomous offensive-security capabilities, even if current AI models fall short of fully replicating human expert-level attacks.
A June 11 academic paper evaluated 19 open-weight and proprietary large language models to measure their autonomous offensive-security capabilities. The test covered 300 target servers, and the AI agents received only general-purpose cybersecurity tools and no prior knowledge of the vulnerabilities. Success rates ranged from 10.7% to 69.3%, indicating an improved but still limited ability to complete multi-step attack workflows that span reconnaissance, exploitation, and adaptation.
The study distinguished simplified tests, such as writing exploit code, from real-world scenarios that require continuous action. Researchers designed environments in two tiers: Tier 1 included one secure and one vulnerable service, while Tier 2 added three secure services to increase complexity. Agents had to identify attack surfaces, interpret results, and attempt exploitation independently, mirroring phases documented in the MITRE ATT&CK framework.
Although the highest success rate was 69.3%, the study makes clear that AI agents cannot yet fully compromise arbitrary real-world networks. Earlier research such as CAIBench showed that models scoring around 70% on security knowledge tests dropped to 20–40% in multi-step attack-and-defense scenarios. The new benchmark focuses on how automation lowers operational barriers, since AI agents can test multiple targets continuously without human intervention.
The paper warns that cybersecurity risks may rise even before AI reaches expert-level capability: attackers could deploy large numbers of imperfect agents to exploit poorly secured systems opportunistically. Challenges such as tool-use correctness, long-horizon reasoning, and safeguards remain unresolved, suggesting that capability and reliability will improve unevenly.
The study's methodology contrasts with prior evaluations such as HonestCyberEval, which used synthetic vulnerabilities in an Nginx web-server repository. By testing full attack workflows, the research aims to reflect real-world conditions more accurately, though it stops short of proving agents can execute every stage of a full attack chain.
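To put the reported range in concrete terms, the short Python sketch below converts the 10.7% and 69.3% success rates into approximate target counts and illustrates why many imperfect attempts can still add up. It is a rough illustration, not the paper's methodology: it assumes each rate was measured over all 300 targets, and the "at least one success" figure assumes independent attempts with the same per-attempt rate, which real attacks need not satisfy. The function names are hypothetical.

```python
def implied_successes(rate_percent: float, targets: int = 300) -> int:
    """Nearest whole number of targets implied by a success rate.

    Assumption: the rate was computed over all `targets` servers.
    """
    return round(targets * rate_percent / 100)


def chance_of_any_success(rate_percent: float, attempts: int) -> float:
    """Probability that at least one of `attempts` tries succeeds.

    Assumption: attempts are independent and share the same success rate,
    a simplification that the study itself does not claim.
    """
    p = rate_percent / 100
    return 1 - (1 - p) ** attempts


if __name__ == "__main__":
    low, high = 10.7, 69.3  # range reported in the article
    print("Implied successes out of 300:",
          implied_successes(low), "to", implied_successes(high))
    # Even the weakest rate compounds quickly across repeated attempts.
    for n in (1, 5, 10):
        print(f"{n} attempts at {low}%: {chance_of_any_success(low, n):.2f}")
```

Under these assumptions, the weakest agent's rate corresponds to roughly 32 of 300 targets and the strongest to roughly 208, which is the sense in which even a low per-target rate becomes significant at scale.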