Cloud vs. On-Device AI Security

Bene & Domi

8 min

AI Architecture

What "cloud-based AI security" actually means

When a vendor sells "AI security" or an "AI gateway", the architecture is almost always the same. Your traffic to api.openai.com is routed first to the vendor's cloud. There, the prompt is parsed, scanned for sensitive data and prompt injections, and either passed through or blocked. The response from OpenAI takes the same path back: through the vendor, then to your user.

The vendor cloud needs to see prompt content to inspect it. That is a hard requirement of the architecture, not an optional setting. If the prompt is encrypted to the vendor, the vendor cannot scan it. So the prompt is in the clear at the inspection point. So the prompt is in the vendor's logs and memory, at least for the duration of the request, and often longer for forensic and retraining purposes.

This is the design pattern across the cloud-based segment of the market. Lakera, Prompt Security, Robust Intelligence, and others operate variations of it. The trade-off is not hidden — it is in the architecture diagrams. It is just not always made explicit in procurement.



Four costs that come with the cloud-routing architecture

1. A second copy of your data, in a place you do not own

Every prompt now exists on at least three systems: the user's device, the LLM provider, and the security vendor. Two of those are outside your perimeter. If the security vendor is breached, every inspected prompt your organisation generated is exposed. The August 2025 incidents at multiple SaaS security vendors (no naming, but several have published post-mortems) made this concrete: customers found that the firms holding the keys to their security were themselves compromised.

A common counter-argument is "the vendor is SOC2 certified". SOC2 attests to controls, not to outcomes. Two SOC2-certified security vendors were breached in 2025. Certification is necessary, not sufficient.

2. Latency the user pays in real time

The added round-trip to the vendor cloud is typically 30 to 200 milliseconds, depending on geography. For a chat interaction it is annoying. For an interactive coding tool — Cursor, Copilot, Continue — it is the difference between adoption and silent disabling. Engineering teams that hit the latency tax routinely turn the security tool off, file a ticket against the security team, and revert to direct provider connections within a week.

3. A new single point of failure

Cloud-routed AI security creates a hard dependency. If the vendor is down, AI is down. For a tool category that is becoming load-bearing inside engineering, support, sales, and analyst teams, that is a non-trivial business-continuity risk. The standard mitigation — bypass the gateway when it is unreachable — defeats the security purpose.

4. GDPR and works council friction in the EU

Per-user prompt logging at a US-based cloud vendor is a long conversation with the DPO and the works council in any German enterprise. The conversation is not unwinnable, but the data-processing addendum, the international transfer assessment, the deletion guarantees, and the audit rights are real procurement work. Many cloud-based AI security deployments stall here. The deals close and the tools are bought, but the rollout to engineering takes nine months because the legal review is not done.



On-device AI security: the architectural alternative

The other architectural posture is to run the entire inspection pipeline on the endpoint. The agent intercepts the AI request before it leaves the device, classifies it, applies policy, and either lets it through, modifies it, or blocks it. The LLM provider is the only external party the prompt is ever sent to.

The trade-off matrix flips:

Property

Cloud-routed AI security

On-device AI security

Where prompt content lives

Endpoint + provider + vendor

Endpoint + provider

Works offline / air-gapped

No

Yes

Single point of failure

Yes (vendor cloud)

No

GDPR posture

Configuration-driven

Architectural

Coverage of local LLMs

None

Full

Coverage of MCP / tool calls

None (process-local)

Full

Per-user prompt logging

Default

Optional, off by default

The honest cost of on-device security is different: you need an agent on every device, which is a real endpoint-management decision. But that is a one-time deployment problem, solved by MDM, and unlike cloud-routing it is recoverable if you change vendors.



Why "we have a cloud-only approach" is not always a deal-breaker — but is when it is

There are real workloads where cloud-routing is fine: pure chatbot interactions, low-sensitivity SaaS, jurisdictions without strict data-residency requirements. For those, the architecture works.

The deal-breakers are concrete:

  • Air-gapped or sovereign deployments. Defence, critical infrastructure, intelligence agencies. The agent is outside the network; there is no cloud to route through. On-device is the only option.

  • Data with statutory localisation requirements. Healthcare records under national equivalents of HIPAA, certain financial records, certain legal work product. Routing them through a US vendor cloud is either prohibited or requires more legal work than the security gain is worth.

  • Real-time interactive workflows. Coding assistants, sales tools that draft replies live, analyst tools running over confidential spreadsheets. The latency budget is too tight for a cloud round-trip.

  • MCP and agentic AI workflows. MCP traffic happens between local processes, before any network. A cloud gateway cannot see it at all, regardless of how good its inspection is.

  • Self-hosted local LLMs. A growing share of enterprise AI runs locally — Ollama, llama.cpp, on-prem deployments. A cloud gateway has nothing to inspect. On-device has the full picture.

If any of these apply, cloud-routed is not just a sub-optimal choice. It is structurally insufficient.



What to ask vendors during procurement

Five questions that surface the architectural differences quickly:

  1. Where is prompt content inspected, and on whose infrastructure? "On the endpoint" and "in our cloud" are the only two valid answers. Hybrid claims usually mean cloud with a thin client.

  2. What happens to my prompts if your service is down? The honest answer is either "your AI stops working" or "the gateway fails open and you have no security during the outage". Both are fine to know up front.

  3. Can the system inspect MCP server tool calls? If the vendor does not know what MCP is or claims it is "out of scope", you have already learned something important about their roadmap.

  4. Can the system inspect a self-hosted local LLM? Cloud gateways cannot. On-device tools can. Useful filter.

  5. What is the per-request latency you contractually commit to? The number tells you what their architecture is. Sub-millisecond means on-device. 50 ms or higher means cloud-routed, regardless of marketing claims.



How Patronus Protect implements the on-device pattern

Patronus is an AI firewall that runs on the endpoint.

The pipeline is local: intercept → analyse locally → enforce policy.

AI traffic is identified in single-digit milliseconds.

Inspection covers app traffic, browser sessions, IDE plugins, MCP servers, and tool calls inside requests.

PII is redacted locally. Policies are enforced before the request leaves the device.

No prompt content is routed to a vendor cloud, ever. Compliance metadata that does leave the device is minimal, structured, and exportable to existing SIEMs (Splunk, Elastic, Datadog) for audit.

The architectural commitment matters because it determines what is possible. A cloud-routed product cannot become air-gapped without rebuilding. An on-device product can choose to add cloud features (dashboards, fleet management) without compromising the data path.



Frequently asked questions

Is on-device AI security less effective than cloud-based? The detection quality depends on the models used, not on where they run. On-device deployments with state-of-the-art classifiers (BERT, LightGBM variants, small fine-tuned LLMs) match the detection performance of cloud-based systems for the threat categories that matter: prompt injection, PII, jailbreak attempts, policy violations.

Does on-device mean I lose visibility into what is happening across my fleet? No, but the data flow is different. Each agent maintains a local audit log. Compliance metadata (categories of events, counts, policy decisions) flows to a central dashboard (if requested). Prompt content stays local. The compliance team gets the same answer to "what AI happened in the org last week" without the prompt corpus leaving the endpoints.

What about the agent itself — does it phone home? Well-designed on-device tools phone home only for licensing, model updates, and the metadata stream above. The data plane (prompts, responses, tool calls, MCP traffic) stays local.

Is cloud-based AI security ever the right choice? Yes, when the workload is genuinely cloud-only, the data is genuinely low-sensitivity, and the team has no air-gapped or local-LLM exposure. The architecture is fine when the assumptions hold. The problem is buying the architecture for workloads where the assumptions do not hold.

How does this interact with my existing CASB and DLP? The on-device AI firewall complements them rather than replaces them. CASB sees the SaaS landscape. DLP sees structured data movements. The AI firewall sees AI-specific traffic at the granularity those two miss. In practice the three coexist.

Patronus Protect - on-device AI Security

Patronus Protect - On-device AI firewall — see and control AI traffic, locally | Product Hunt

© 2026 Casdo Labs · All rights reserved.

Patronus Protect - on-device AI Security

Patronus Protect - On-device AI firewall — see and control AI traffic, locally | Product Hunt

© 2026 Casdo Labs · All rights reserved.

Patronus Protect - on-device AI Security

Patronus Protect - On-device AI firewall — see and control AI traffic, locally | Product Hunt

© 2026 Casdo Labs · All rights reserved.