The UK AI Security Institute has released a landmark evaluation comparing OpenAI’s GPT-5.5 with Anthropic’s Mythos Preview, offering new insight into how cutting-edge AI models perform in real-world cybersecurity scenarios. The results highlight a rapidly evolving threat landscape, where AI systems are now capable of executing complex cyberattack simulations at near-human—or even superhuman—levels.
GPT-5.5 vs Mythos Preview: Near-Parity in Cyber Capabilities
According to the institute’s findings, GPT-5.5 performs broadly on par with Claude Mythos Preview across a range of offensive cybersecurity tests. Both models demonstrated the ability to carry out multi-step attack chains, marking a significant leap from earlier AI systems.
Notably, GPT-5.5 became only the second AI model ever to complete a full enterprise-scale cyberattack simulation, following Mythos Preview. These simulations involve dozens of steps, including reconnaissance, exploitation, lateral movement, and system takeover, tasks that would typically take a human expert many hours to execute.
Performance Metrics: GPT-5.5 Edges Ahead
While overall performance between the two models is comparable, GPT-5.5 showed a slight advantage in certain high-level tasks. On expert-tier cybersecurity benchmarks, GPT-5.5 achieved an average pass rate of 71.4%, compared to 68.6% for Mythos Preview.
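As a back-of-the-envelope sketch of how such an average is derived, the snippet below computes a mean pass rate across a set of benchmark tasks. The per-task scores are hypothetical placeholders (the institute's per-benchmark breakdown is not given here); they are chosen only so that the resulting averages line up with the reported 71.4% and 68.6% figures.

```python
from statistics import fmean

# Hypothetical per-task pass rates (in percent) on expert-tier
# benchmarks. These individual numbers are illustrative placeholders,
# not the institute's actual data.
gpt_results = {"benchmark_a": 80.0, "benchmark_b": 65.0, "benchmark_c": 69.2}
mythos_results = {"benchmark_a": 75.0, "benchmark_b": 63.0, "benchmark_c": 67.8}

# Average pass rate = arithmetic mean over the individual task scores.
gpt_avg = fmean(gpt_results.values())        # ≈ 71.4
mythos_avg = fmean(mythos_results.values())  # ≈ 68.6
```

Under this (assumed) aggregation, a roughly three-point spread in the average can coexist with the two models trading wins on individual tasks, which is consistent with the "broadly on par" characterization above.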
This marginal lead suggests that OpenAI’s latest model may have stronger reasoning or coding capabilities in certain task categories, even if both systems are equally capable in long-horizon attack simulations. Researchers also noted that these improvements reflect a broader trend rather than a breakthrough unique to a single model.
Breakthrough in Multi-Step Cyberattack Simulations
One of the most striking findings is the ability of both models to complete “end-to-end” cyberattack scenarios. Mythos Preview was the first to achieve this milestone, successfully navigating a 32-step corporate network attack simulation. GPT-5.5 has now matched this capability, confirming that such advanced performance is becoming the new standard among frontier AI systems.
These simulations replicate real-world enterprise environments, making the results particularly relevant for cybersecurity professionals and policymakers. The institute estimates that completing such tasks manually could take around 20 hours, underscoring the efficiency gains—and risks—introduced by AI.
Implications for Cybersecurity and AI Safety
The evaluation underscores a critical shift: AI models are no longer just tools for assisting cybersecurity—they are increasingly capable of autonomously executing offensive operations. This raises urgent questions about misuse, regulation, and access control.
Experts warn that as these capabilities become more widespread, the balance between attackers and defenders could shift dramatically. AI-powered systems can identify and exploit vulnerabilities at machine speed, potentially overwhelming traditional defense mechanisms.
At the same time, these models also hold promise for defensive applications, such as automated vulnerability detection and faster incident response. The challenge lies in ensuring that access to such powerful tools is carefully managed.
Conclusion
The UK AI Security Institute’s evaluation confirms that GPT-5.5 and Mythos Preview are now at the forefront of AI-driven cybersecurity capabilities. With near-equal performance and the ability to execute complex cyberattacks, these models signal a new era in both digital defense and risk. As AI continues to evolve, balancing innovation with security will be crucial for governments, enterprises, and the global tech ecosystem.