Agentic AI Broke Your Security Model

Domi

11 min

AI Security

What does "agentic AI" actually mean for security?

An AI agent is an autonomous software entity that pursues a goal across multiple systems, makes decisions, and only involves humans when necessary. Practically, in 2026, this means coding agents (Claude Code, Cursor, Codex) that read repos and execute tools, ops agents (Azure SRE Agent, Microsoft Copilot Studio agents) that manage cloud infrastructure, and a long tail of business agents that summarise emails, file tickets, update CRM records, and draft replies.

The shift from "AI assistant" to "AI agent" is the shift from "the model says something, the user does the thing" to "the model decides what to do, the model does the thing". The user is no longer the action point.

That breaks four assumptions every traditional security tool was built on.

  1. Identity is human. IAM systems were designed for users who authenticate once and act under one identity. Agents authenticate continuously, often under delegated human credentials, and act faster than any human review can keep up with.

  2. Actions are deliberate. DLP and audit assume that when data moves, someone meant to move it. An agent moves data because a piece of content told it to, possibly without the principal user's knowledge.

  3. Privilege is static. RBAC assigns roles and reviews them quarterly. An agent with the same role as a human user has dramatically more reach because it operates 24/7 across every connected system simultaneously.

  4. Speed is human. SOC alerting thresholds, rate-limit policies, and review queues were tuned for human-pace activity. Anthropic's analysis of the GTG-1002 attack documented an autonomous agent making thousands of requests, often several per second, a pace no human operator could sustain and one those baselines were never designed around.

The four assumptions break together. That is why "agentic AI security" is not a feature you bolt onto an existing tool. It is a different control surface.


Four 2025/2026 incidents that already happened

Not hypothetical. Each of these has a public disclosure and, in most cases, a CVE.

GTG-1002 (Anthropic, September 2025)

Anthropic detected a Chinese state-sponsored group that hijacked Claude Code instances to conduct autonomous cyber espionage against roughly 30 targets in defence, energy, and technology. The AI handled 80 to 90% of tactical operations independently, discovering and exploiting vulnerabilities while making thousands of requests, often several per second. The operators bypassed Claude's safety guardrails by social-engineering the model itself, claiming to be employees of legitimate cybersecurity firms conducting authorised testing.

This was the first publicly documented case of a cyberattack largely run without human intervention at scale.

CVE-2026-32173 — Azure SRE Agent

A flaw in Microsoft's Azure SRE Agent exposed live command streams over an unauthenticated WebSocket endpoint. Any Entra ID account holder could observe what the agent was doing in real time. CVSS 8.6.

The lesson is architectural: when an agent runs continuously and exposes its operations as an event stream, the stream itself becomes a target. Traditional pen-test methodology does not cover "the agent's heartbeat is a credential".

Mexican Government Breach (December 2025 to February 2026)

A single attacker used Claude Code and GPT-4.1 to breach nine Mexican government agencies, including the federal tax authority, the civil registry of Mexico City, and the electoral institute. The agents accelerated reconnaissance, vulnerability discovery, and lateral movement to a pace defenders could not match.

This is the operational case for "agents change the speed equation". The attacker did not need a team. The agent was the team.

McKinsey "Lilli" Red Team Compromise

In a controlled red-team exercise, McKinsey's internal AI platform Lilli was compromised by an autonomous agent that gained broad system access in under two hours. Bessemer Venture Partners documented it as the canonical demonstration of how quickly agentic threats can outpace human response times.

Two hours from "agent goes live" to "agent has the keys". The traditional incident-response playbook assumes you have days. You do not.


The four attack patterns that matter

Distilled from OWASP's Top 10 for Agentic Applications and the late-2026 incident data:

1. Tool misuse and privilege escalation

Agents are typically over-privileged. The single most predictive risk factor is over-privileged access — Teleport's 2026 research found a 4.5x higher incident rate in organisations with over-privileged AI systems. An agent that "needs to read files" is given write access "just in case". An agent that "needs to query the database" is given the admin credentials. When the agent is hijacked, the blast radius is whatever the credentials cover.

520 incidents in 2026 were classified as Tool Misuse or Privilege Escalation, the largest single category in the Stellar Cyber breakdown.

2. Goal hijacking through indirect prompt injection

The agent's goal is set by a system prompt and a user prompt. Anything the agent reads after that (a Jira ticket, a README, a customer email, a PDF) can contain instructions that override the goal. The CVE record for 2025 and 2026 includes EchoLeak (CVE-2025-32711), CurXecute (CVE-2025-54135), and CVE-2025-53773, all variants of the same pattern: untrusted content lands in the model's context and is treated as if it were instructions.

For an agent with tool access, goal hijacking is not "the agent gives a bad answer". It is "the agent calls the destructive tool, with the attacker's parameters".
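
One way to make the pattern concrete is to tag every piece of context with where it came from and refuse high-risk tools whenever untrusted content is present. The sketch below is illustrative only: `ContextItem`, `HIGH_RISK_TOOLS`, `may_call`, and the tool names are hypothetical, and a real control would need finer-grained rules than "any untrusted item blocks".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    source: str    # "system", "user", or where the content was fetched from
    trusted: bool  # only the system prompt and user prompt count as trusted
    text: str


# Hypothetical tool names, chosen for illustration.
HIGH_RISK_TOOLS = {"delete_repo", "send_email"}


def may_call(tool, context):
    """Allow low-risk tools; allow high-risk tools only on fully trusted context."""
    if tool not in HIGH_RISK_TOOLS:
        return True
    return all(item.trusted for item in context)
```

With a context that holds the user's prompt plus a fetched README, `may_call("send_email", context)` returns False, while a read-only tool is still allowed.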

3. Memory poisoning and supply-chain compromise

Agents often have persistent memory or access to long-running context (RAG indexes, vector databases, MCP servers). Memory Poisoning and Supply Chain attacks, though less frequent than tool misuse, carry disproportionate severity and persistence risk. Once an agent's memory is poisoned, every subsequent decision is influenced. The compromise survives session boundaries.

The early-2026 ClawHavoc supply-chain campaign showed the variant: malicious packages disguised as agent skills, distributed through community repositories. Once installed, they exfiltrated data through the agent's normal operating channel.

4. Identity confusion and confused deputy

Agents act under delegated credentials. When agent A calls agent B to do something, the audit log may show agent B acting under A's credentials, under a human's credentials, or under nobody's. The NIST AI Agent Standards Initiative specifically acknowledges that existing identity frameworks were never designed for this type of entity. Only 21.9% of organisations currently treat AI agents as independent, identity-bearing entities with their own access controls.

Without per-agent identity, accountability collapses. Every action is "the user did it" because the agent borrowed the user's credential. The user did not.
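
A minimal sketch of what an attributable audit entry could look like, assuming each hop in a delegation is recorded explicitly. `Principal`, `audit_entry`, and the field names are hypothetical, not taken from any standard or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    kind: str  # "human" or "agent"
    name: str


def audit_entry(action, delegation_chain):
    """Record who acted and on whose behalf.

    delegation_chain runs from the original requester to the final actor.
    """
    if not delegation_chain:
        raise ValueError("an action with no principal cannot be attributed")
    *delegators, actor = delegation_chain
    return {
        "action": action,
        "actor": f"{actor.kind}:{actor.name}",
        "on_behalf_of": [f"{p.kind}:{p.name}" for p in delegators],
    }
```

The point of the design is that the actor is always the entity that made the call, and the human appears only as the origin of the delegation, never as the one who "did it".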


What real agent-aware security has to do

Five properties. None of them are nice-to-haves at this point.

Per-agent identity and ephemeral credentials

Agents need their own identities, not delegated human ones. Credentials need to be short-lived, scoped to a single task, and revocable in real time. NIST is publishing on this. AWS, GCP, and Azure are slowly shipping primitives. The transition to agent identity will take years; the security gap is now.
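
The three properties (short-lived, task-scoped, revocable) can be sketched in a few lines. This is an in-memory illustration under stated assumptions: `CredentialBroker`, `AgentCredential`, and the scope strings are hypothetical, and a production broker would sit behind your identity provider rather than in process memory.

```python
import secrets
import time
from dataclasses import dataclass, field


@dataclass
class AgentCredential:
    agent_id: str       # the agent's own identity, not a borrowed human one
    task_id: str        # the single task this credential is bound to
    scopes: frozenset   # e.g. {"repo:read"}
    expires_at: float   # absolute expiry, in epoch seconds
    token: str = field(default_factory=lambda: secrets.token_urlsafe(32))


class CredentialBroker:
    """Hypothetical broker that issues, checks, and revokes agent credentials."""

    def __init__(self):
        self._revoked = set()

    def issue(self, agent_id, task_id, scopes, ttl_seconds=300, now=None):
        now = time.time() if now is None else now
        return AgentCredential(agent_id, task_id, frozenset(scopes), now + ttl_seconds)

    def revoke(self, cred):
        # Takes effect on the next is_allowed check.
        self._revoked.add(cred.token)

    def is_allowed(self, cred, scope, now=None):
        now = time.time() if now is None else now
        if cred.token in self._revoked or now >= cred.expires_at:
            return False
        return scope in cred.scopes
```

Every tool call would check `is_allowed` first, so an expired or revoked credential stops working without waiting for a quarterly review.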

Tool-call inspection at the source

The unit of agent action is the tool call. Every tool call has to be inspectable: which tool, with what arguments, in response to what context. A SOC that logs "agent X did things" is useless. A SOC that logs "agent X called destructive_action(target=Y) at timestamp Z, in response to content fetched from source W" can actually investigate.

This is the architectural reason MCP-aware AI firewalls matter. MCP is the de-facto tool-call protocol for the major LLM clients in 2026. Inspecting MCP traffic at the endpoint, before the call executes, is how tool calls become a real audit primitive.
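
As a rough illustration of the "agent X called destructive_action(target=Y) at timestamp Z, in response to content from source W" record, the sketch below builds one JSON line per tool call. The function and field names are assumptions for illustration, not a schema defined by MCP or by any product.

```python
import json
from datetime import datetime, timezone


def audit_record(agent_id, tool, arguments, triggered_by, timestamp=None):
    """Build one JSON line describing a single tool call."""
    ts = timestamp or datetime.now(timezone.utc)
    return json.dumps(
        {
            "timestamp": ts.isoformat(),
            "agent_id": agent_id,
            "tool": tool,
            "arguments": arguments,
            "triggered_by": triggered_by,  # source of the content that led to the call
        },
        sort_keys=True,
    )


def append_audit(path, line):
    """Append the record to a local JSON Lines file."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```

One line per call keeps the log easy to grep and easy to ship to a SIEM later for forensics.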

Real-time enforcement, not after-the-fact alerting

By the time the SIEM correlates the alert, the agent has already done what it was going to do. The only effective control is in-line: inspect the tool call, decide allow/modify/block, then execute. Sub-second latency is required because agents operate at machine speed and any delay breaks the workflow.
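
The inspect, decide, execute sequence might look like the following sketch. The policy here is deliberately trivial and the names (`decide`, `guarded_call`, `BLOCKED_TOOLS`, `MAX_ROWS`, the tool names) are hypothetical; the point is only that the decision happens before the call runs, not after.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    MODIFY = "modify"
    BLOCK = "block"


# Hypothetical policy values, chosen for illustration.
BLOCKED_TOOLS = {"drop_table"}
MAX_ROWS = 1000


def decide(tool, arguments):
    """Return a verdict plus the arguments that should actually be used."""
    if tool in BLOCKED_TOOLS:
        return Verdict.BLOCK, arguments
    # A query with no limit, or a limit above the cap, is rewritten.
    if tool == "query_db" and arguments.get("limit", MAX_ROWS + 1) > MAX_ROWS:
        return Verdict.MODIFY, {**arguments, "limit": MAX_ROWS}
    return Verdict.ALLOW, arguments


def guarded_call(tool, arguments, execute):
    """Run the tool only after the in-line decision; blocked calls never execute."""
    verdict, final_args = decide(tool, arguments)
    if verdict is Verdict.BLOCK:
        raise PermissionError(f"tool call blocked: {tool}")
    return execute(tool, final_args)
```

Because `decide` is a pure in-process function, it adds negligible latency; a real policy engine would need to stay within the same sub-second budget.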

Behavioural baselining for agents specifically

Human-user behavioural detection (UEBA) does not transfer to agents. Agents operate continuously, in patterns no human would. The baseline has to be agent-specific: this agent normally calls these tools, in this order, at this rate. Deviations are the signal. CVE-2026-32173 (the Azure SRE Agent flaw) was caught by exactly this kind of behavioural anomaly, not by a static signature.
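
A toy version of "this agent normally calls these tools, in this order, at this rate" is sketched below. `AgentBaseline` and its thresholds are hypothetical, and a real baseline would be statistical rather than a simple set lookup.

```python
from collections import Counter


class AgentBaseline:
    """Learns which tools an agent uses and in what order, then flags deviations."""

    def __init__(self, max_calls_per_minute):
        self.tool_counts = Counter()
        self.known_transitions = set()
        self.max_calls_per_minute = max_calls_per_minute

    def learn(self, tool_sequence):
        self.tool_counts.update(tool_sequence)
        # Record each adjacent pair as a known ordering.
        self.known_transitions.update(zip(tool_sequence, tool_sequence[1:]))

    def anomalies(self, tool_sequence, calls_per_minute):
        findings = []
        for tool in tool_sequence:
            if tool not in self.tool_counts:
                findings.append(f"unseen tool: {tool}")
        for pair in zip(tool_sequence, tool_sequence[1:]):
            if pair not in self.known_transitions:
                findings.append(f"unseen order: {pair[0]} -> {pair[1]}")
        if calls_per_minute > self.max_calls_per_minute:
            findings.append(f"rate {calls_per_minute}/min exceeds baseline")
        return findings
```

An empty list means the session matches what this agent has done before; anything else is a signal worth routing to the in-line control or to a human.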

Human-in-the-loop checkpoints for irreversible actions

Some actions are recoverable. Some are not. An agent should never transfer funds, delete data, or change access control without explicit human approval. The category boundary matters more than the volume: ten thousand routine reads per day are fine, one DROP TABLE is the end of the conversation. The control is to treat the irreversible category specially.
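
Treating the irreversible category specially can be as simple as the gate sketched here. The tool names and the `approver` callback are assumptions for illustration; in practice the approver would be a ticket, a chat prompt, or whatever approval hook your agent framework exposes.

```python
# Hypothetical names for the irreversible category described above.
IRREVERSIBLE_TOOLS = {"transfer_funds", "delete_data", "change_access_control"}


class ApprovalRequired(Exception):
    """Raised when an irreversible action has no explicit human approval."""


def run_tool(tool, arguments, execute, approver=None):
    """Execute routine tools directly; irreversible ones only after approval."""
    if tool in IRREVERSIBLE_TOOLS:
        if approver is None or not approver(tool, arguments):
            raise ApprovalRequired(f"{tool} needs explicit human approval")
    return execute(tool, arguments)
```

Routine reads pass straight through, so the friction lands only on the small set of actions that cannot be undone.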


Why the existing security stack does not deliver this

Short version, control by control:

  • IAM and PAM: Built for humans. Static credentials, quarterly reviews. The 21.9% adoption number for agent identity tells you everything about how little this has shifted.

  • CASB: Sees SaaS connections. Does not see what happens inside an agent's tool-call sequence. The agent's MCP traffic is local to the device.

  • DLP: Sees data movement over known channels. Misses agent-initiated data flows that move via tool calls between SaaS applications the user has already approved.

  • EDR: Watches processes for malicious behaviour. An agent doing legitimate-looking things on behalf of a user does not match any known signature.

  • SIEM: Correlates alerts after the fact. Agents act faster than the correlation window. Useful for forensics, not for prevention.

  • Cloud AI gateways: See prompts and responses on the network path to the LLM provider. Do not see local MCP traffic, do not see tool calls inside requests, do not see agent-to-agent traffic on the same device.

The structural gap is the same as for Shadow AI more broadly: agents act at the application layer, on the endpoint, in real time. Visibility and control have to live there.


How to start in the next 30 days, agent-specifically

If you have agents in production and no agent-aware control:

  1. Inventory deployed agents. Across managed laptops and servers, identify every running agent (Claude Code, Cursor, Codex, Copilot Studio, internal agents). Most organisations underestimate by 3x.

  2. Map per-agent privileges. For each agent, list every credential, MCP server, and tool it has access to. The list will be longer than expected. Trim it.

  3. Force-add a human-in-the-loop checkpoint for irreversible actions. Most agent frameworks support this; few enable it by default. Turn it on, even if it costs friction.

  4. Log every tool call, locally. Even without a real agent firewall, a local log of agent tool calls is better than nothing. Cursor and Claude Code can emit this to a file. Do it.

  5. Pilot an agent-aware control on one team. Specifically look for: tool-call inspection, MCP visibility, real-time enforcement, on-device operation. This is where Patronus Protect's Q3 2026 roadmap (full agentic protection) lands.
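
Steps 2 and 4 feed each other: once tool calls are logged, the log shows which granted privileges are never used. The sketch below assumes a hypothetical inventory (`granted`) and a list of logged calls with `agent_id` and `tool` fields; neither format comes from a specific product.

```python
from collections import defaultdict


def trim_candidates(granted, observed_calls):
    """Return, per agent, the granted tools that never appear in the call log."""
    used = defaultdict(set)
    for call in observed_calls:
        used[call["agent_id"]].add(call["tool"])
    return {
        agent: sorted(set(tools) - used[agent])
        for agent, tools in granted.items()
    }
```

Anything in the result is a candidate for removal, subject to a sanity check that the observation window was long enough to be representative.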



Frequently asked questions

Is agentic AI just a more capable assistant, or genuinely a new category? Genuinely new. The defining property is autonomy: the agent decides which tools to call, in what order, based on content it reads. An assistant says "you should run X". An agent runs X. The threat model differs accordingly.

Can I just turn off agent autonomy and require approval for every action? You can, and the result is a slower assistant. The productivity case for agents collapses if every tool call requires approval. The realistic target is "approval for irreversible actions, observability for everything else".

Does Zero Trust for humans solve agentic AI? Partially. Zero Trust principles (no implicit trust, continuous verification, least privilege) apply. The implementation is different. ZT for agents requires per-agent identity, ephemeral credentials, and behavioural baselining at agent speed.

How does this interact with the EU AI Act? Article 26 deployer obligations apply to high-risk AI systems, which include many enterprise agentic deployments. The deployer log requirement (six-month retention, sufficient to reconstruct what the system did and why) maps directly to "log every tool call, with context". Agent-aware logging is also EU AI Act-aware logging.

What about agent-to-agent attacks? Open research. The general pattern (agent A's output becomes agent B's input, B is hijacked, B influences A) is documented but not yet widely exploited in production. Mitigations require provenance tagging across agent boundaries, which is not standardised yet.

Patronus Protect - on-device AI Security

© 2026 Casdo Labs · All rights reserved.
