The Defender’s Dilemma: Why AI on Both Sides Makes Independent Security Non-Negotiable

In June 2026, Anthropic published the findings from a year-long analysis of 832 accounts banned from its platform for malicious cyber activity. The report mapped those cases against MITRE ATT&CK, the security community’s most referenced framework for understanding how attackers operate. The conclusions were significant, and not only for security teams.

For any organisation building and shipping software, the findings reframe a question that is easy to underestimate. It is not just that AI is changing how code gets written. It is that AI is simultaneously changing how that code, and the systems running it, get attacked. Both sides of that equation are moving fast, and the consequences of falling behind on either side extend beyond the engineering team. A security gap that surfaces during a client audit, a compliance review, or after an incident carries costs in contracts, reputation, and recovery time that are difficult to absorb quickly. The question is whether your security programme is keeping pace with both.

What the research actually found

The Anthropic report identified three conclusions that matter for how organisations think about application security, risk governance, and software assurance.

First, AI is making attackers meaningfully more dangerous. Not uniformly across the attack lifecycle, but specifically in the later, more complex stages, the parts that previously required genuine technical expertise. Activities like lateral movement inside a compromised network, account discovery, and privilege escalation, techniques that used to separate sophisticated actors from less capable ones, are now being performed with AI assistance by actors who would not previously have been capable of them. In the first six months of the study period, 33% of actors were classified as medium risk or higher. By the second six months, that figure had risen to 56%.

Second, the signals security teams have traditionally used to assess an attacker’s risk level are becoming unreliable. The number of techniques an actor uses, historically a reasonable proxy for sophistication, no longer correlates with how dangerous they are. AI performs technical tasks on an actor’s behalf, so a less skilled attacker can now employ as many techniques as a skilled one. The more meaningful differentiators are now where in the attack chain AI is being applied, and whether the attacker has built the scaffolding to let the model operate with minimal human input.

Third, the MITRE ATT&CK framework, which underpins much of how the industry classifies and responds to threats, does not yet capture the behaviours that make AI-enabled attackers most dangerous. Agentic orchestration, where a model chains together discrete stages of an attack and executes them sequentially with minimal human intervention, has no ATT&CK identifier. Yet that is precisely the pattern observed in a state-sponsored cyber espionage operation that Anthropic disrupted in November 2025, where Claude Code was manipulated into attempting to infiltrate targets globally with little human oversight.

The threat landscape is not waiting for security frameworks to catch up. Attackers are already operating in territory that the tools defenders rely on were not designed to map.

The same dynamic applies to the defensive side

There is a parallel conversation happening on the defensive side of application security that this research makes more urgent. AI coding tools and AI-powered security features are proliferating rapidly, and the instinct to let them carry more of the security function is understandable. They are capable, they are fast, and they are already embedded in how many development teams work.

But the Anthropic research illustrates something that applies equally to both sides of the AI-in-security story: capability and independence are different things. On the attack side, AI gives less capable actors access to techniques they could not previously execute. On the defensive side, AI gives development teams a sense of security coverage that may not always reflect their actual posture.

The specific risk is structural. An AI tool that assists in writing code and an AI tool that reviews that code for vulnerabilities may share foundational training assumptions. What looks normal to one can look normal to the other. Systematic weaknesses that emerge consistently from AI-assisted development, the patterns the model has learned to produce, may also be the patterns the model is least likely to flag as suspicious. The reviewer and the writer are not truly independent of each other.

This is also a separation of duties problem. Regulations and governance frameworks increasingly require that the entity verifying code is independent of the entity that produced it. Relying solely on an AI tool to review what an AI tool generated creates a structural conflict of interest, regardless of how capable either tool is. Independence is not a nice-to-have. For organisations operating under formal compliance frameworks, it is a documented requirement.
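As an illustration only, the separation of duties idea can be expressed as a simple policy check. The sketch below is a hypothetical model, not any vendor’s implementation: the Actor type, the "kind" labels, and the "model_family" field are all assumptions introduced here to show how a pipeline might refuse to count an AI review as independent when the reviewer shares a lineage with the tool that wrote the code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Actor:
    """A party that writes or reviews code. All fields are hypothetical labels."""
    name: str                            # e.g. "assistant-a"
    kind: str                            # "human", "ai", or "tool"
    model_family: Optional[str] = None   # assumed lineage label for AI tools


def is_independent(author: Actor, reviewer: Actor) -> bool:
    """Return True only if the reviewer can be treated as independent of the author."""
    # The same party reviewing its own work is never independent.
    if author.name == reviewer.name:
        return False
    if author.kind == "ai" and reviewer.kind == "ai":
        # Unknown lineage is treated conservatively as shared lineage.
        if author.model_family is None or reviewer.model_family is None:
            return False
        return author.model_family != reviewer.model_family
    return True


writer = Actor("assistant-a", "ai", "family-x")
same_family_reviewer = Actor("reviewer-b", "ai", "family-x")
external_scanner = Actor("scanner-c", "tool")

print(is_independent(writer, same_family_reviewer))  # False
print(is_independent(writer, external_scanner))      # True
```

The conservative default (unknown lineage counts as shared) is a design choice made for this sketch; a real governance policy would define independence in whatever terms its compliance framework requires.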

Meanwhile, the attack surface that matters most is increasingly not at the code level alone. The Anthropic report highlights that advanced attackers are focusing on post-compromise techniques: lateral movement, account discovery, and privilege escalation. These are not simply vulnerabilities in a function or a library. They are gaps in how systems are architected, how access is governed, how environments are monitored, and how quickly anomalous behaviour is detected and contained. No code-level AI tool, however capable, is designed to address all of that on its own.

Why this makes independent application security more important, not less

It would be easy to read the Anthropic research as an argument for more AI on the defensive side: if attackers are using AI, defenders should too. That conclusion is not wrong, but it is incomplete.

What the research actually demonstrates is that the gap between what AI can do and what a structured security programme needs to do is growing on both sides simultaneously. Attackers are using AI to reach further into systems, operate more autonomously, and execute techniques that used to require real expertise. That makes the core tasks of a structured security programme more consequential, not less: detecting anomalous behaviour deep in the attack chain, monitoring the full dependency and infrastructure surface, producing auditable evidence for compliance, and enforcing governance at portfolio scale.

This is not a gap that developer-facing AI security tools are designed to close. They operate at the code layer, at the moment of writing. The attack patterns the Anthropic research identifies (lateral movement, privilege escalation, and agentic orchestration across a compromised environment) sit largely outside that scope. Catching them requires a different kind of programme: one that tests how applications behave at runtime, monitors the full dependency surface continuously, enforces risk policy across every project in a portfolio, and produces findings structured and repeatable enough to hold up under scrutiny.

This is where independent application security platforms become critical: not as a replacement for development tools, but as a governance and assurance layer above them. They sit outside the AI-assisted development toolchain entirely, which means they do not carry its assumptions, its blind spots, or its limitations. They test what the system actually does, and they produce the kind of structured, repeatable evidence that compliance frameworks, enterprise procurement teams, and security leadership all require.

That is the kind of programme that platforms like Veracode are built to support. Veracode’s Application Risk Management platform approaches security as a continuous, governed discipline rather than a point-in-time check. Static analysis runs on every commit and maps findings to structured vulnerability taxonomies. Dynamic analysis tests applications under real-world conditions, surfacing runtime behaviours that no code scanner sees. Software composition analysis monitors the open-source dependency surface continuously, flagging new vulnerabilities against components already running in production. And external attack surface management extends visibility beyond the application itself to the broader environment it operates in.

What makes this relevant in the context of the Anthropic research is not just the breadth of coverage. It is the independence. A platform operating outside the development toolchain does not share the assumptions of the AI tools producing the code. It does not know or care how the code was written. It tests what the system does, and it flags what it finds regardless of how normal that behaviour might appear from inside the codebase. That independence is precisely what the Anthropic findings point toward: in an environment where both code generation and code review are increasingly AI-assisted, having a layer that sits outside that ecosystem is not a nice-to-have. It is the assurance layer the rest of the programme rests on.

The questions worth asking about your own programme

The Anthropic research is a useful prompt for a harder internal conversation. If AI is enabling attackers to operate deeper in systems, more autonomously, and with less technical overhead than before, the relevant question is not whether your team is using AI security tools. Most are. The question is what sits underneath them, and whether leadership has clear visibility into the business risk those tools do and do not cover.

Code-level review, even with AI assistance, was designed to catch vulnerabilities before they ship. That is a valuable function, and it is a narrow one. The attack surface the research describes extends well beyond the code layer into runtime behaviour, infrastructure configuration, access governance, and the dependency surface that modern applications are substantially built on. A programme that covers only the code layer may still leave material risks untested, undocumented, or invisible until they become business issues.

An honest assessment asks a few direct questions. Does your testing surface extend beyond code to cover runtime behaviour, infrastructure, and open-source risk? Do your governance workflows give leadership visibility across the full portfolio rather than individual project findings? Critically, is the evidence your programme produces governance-sufficient: not just logs retained by an AI tool, but signed, policy-driven, independently reviewable records that satisfy frameworks like PCI-DSS or ISO 27001? And is there a layer in your programme that operates independently of your development toolchain, with the scope to catch what AI-assisted development is most likely to miss?
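To make the difference between a retained log and a signed, independently reviewable record concrete, here is a minimal sketch using only the Python standard library. The record fields, the key, and the function names are hypothetical, and HMAC-SHA256 is used purely for brevity. Neither PCI-DSS nor ISO 27001 prescribes this mechanism, and a production programme would more plausibly use asymmetric signatures so that a reviewer can verify a record without holding the signing key.

```python
import hashlib
import hmac
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialise the record deterministically so equal records sign identically."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_record(record: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the canonical form of the record."""
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify_record(record: dict, signature: str, key: bytes) -> bool:
    """Check the tag in constant time; any change to the record invalidates it."""
    return hmac.compare_digest(sign_record(record, key), signature)


# Hypothetical finding record; field names are illustrative assumptions.
finding = {
    "project": "example-app",
    "policy": "example-policy-v1",
    "scan_type": "dynamic",
    "result": "fail",
    "finding_count": 3,
}
key = b"demo-key-not-for-production"
sig = sign_record(finding, key)

tampered = dict(finding, result="pass")
print(verify_record(finding, sig, key))   # True
print(verify_record(tampered, sig, key))  # False
```

The point of the sketch is that the evidence carries its own integrity check: a reviewer who can verify the signature does not have to trust the tool that stored the record.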

These are the questions a mature application security programme is built to answer. For organisations working through them, Veracode offers a starting point: a structured conversation about where current coverage ends and what it would take to extend it in a way that reflects how the threat landscape is actually evolving.

AI is accelerating both sides of the security equation. The organisations that treat independent, structured security as the foundation for AI tools, rather than something AI tools replace, are the ones building on ground that holds.

References

Anthropic. “What we learned mapping a year’s worth of AI-enabled cyber threats.” Anthropic News, 2026. https://www.anthropic.com/news/AI-enabled-cyber-threats-mitre-attack