What is Anthropic's Zero Trust for AI Agents guide?

A 36-page framework Anthropic published in 2026 that applies Zero Trust principles to agentic AI deployments. Three capability tiers (Foundation, Enterprise, Advanced), a Part III on security controls (identity, access, observability, behavioral monitoring, input and output controls, integrity and recovery, governance), and a Part IV with eight implementation phases. Read it. It is the right framework. Most of this post is about what it leaves to you.

What's the 'impossible vs tedious' design test?

Anthropic's framing: 'does this make the attack impossible, or just tedious?' Prefer controls that remove a capability over controls that throttle it. Friction-only controls (rate limits, non-standard ports, SMS MFA) fail against an attacker willing to grind. Applied to agent identity: an ephemeral credential is impossible to steal because it doesn't persist; a long-lived credential with rotation is just tedious to misuse.

What's the deployer gap in the Anthropic framework?

Anthropic describes the controls. The framework, by design, doesn't tell you who owns the policy per agent, how to enforce intent at SaaS endpoints with coarse OAuth scopes, what runtime evaluation catches when input sanitization misses, or how the policy survives the engineer who built the agent leaving. Those are deployer questions and they land on the security team's desk during rollout.

What does Anthropic say about MCP servers?

Treat them as supply chain risks. Run the MCP server yourself on immutable infrastructure after verifying the code. Cryptographically sign it. The first documented in-the-wild malicious MCP server impersonated a legitimate email service and silently copied every sent message. The guide is clear that tool poisoning, rug pulls, and tool chaining attacks are now real threat classes.

What's left for the deployer in Phase 3 (define agent boundaries)?

Anthropic says document approved and prohibited actions. The deployer question: how to enforce those at the API when the SaaS provider has coarse OAuth scopes. Gmail doesn't have a 'send only to addresses in your CRM' scope. The boundary the guide tells you to define has to be enforced somewhere other than the credential, because the credential can't carry that level of nuance. That somewhere is the runtime layer above the credential.

What's left for the deployer in Phase 8 (measure what matters)?

Anthropic recommends dwell time, coverage, detection speed, explainability, and behavioral conformance. All right. The leftover question is what you measure agent behavior against. The guide says establish baselines. Baselines compared to what? Without a declared intent per session, the baseline is whatever the agent has done historically, which makes it a record rather than a standard.

How does intent-bound governance fit Anthropic's framework?

It's the operational layer above the controls. Anthropic's framework gives you identity verification, scope enforcement, observability, and monitoring. Intent is what tells those controls what 'in scope' means for a given session. The credential carries identity; the session carries why the credential is being used. Runtime evaluation against declared intent catches actions that satisfy the framework's individual controls but drift away from what the session was supposed to do.

Should we wait to deploy agents until we've implemented the full framework?

Mostly no. Agents are already in production at most companies. Walking one production agent through the eight phases as if you were writing your one-pager is faster than waiting for full coverage. The gaps that surface in that exercise are the prioritized work.

Where does Iden fit relative to the framework?

Iden is the runtime governance layer above identity, scopes, and credential brokering. We register agents as first-class identities, evaluate sessions against declared intent, scope credentials to the session, and capture an audit trail that closes the attribution gap. We don't replace the framework; we resolve the deployer questions the framework leaves open. SPIFFE answers who the agent is. The IdP issues the token. Iden answers whether what the agent is doing right now aligns with what its session was supposed to do.

Anthropic Zero Trust for AI Agents: the deployer gaps the framework leaves open

You finished the Anthropic Zero Trust for AI Agents guide on a Tuesday. Thirty-six pages, three tiers, eight implementation phases. The CISO emailed three people including you with the subject "we should align on this." Now it's Wednesday and you have to write the one-page version that says how your team will actually deploy it against the four agents engineering already shipped.

The framework is right. The Foundation, Enterprise, and Advanced tier model is right. The "impossible vs tedious" design test is the most useful single line in any agent security document we've read this year. None of that is the problem.

The problem is that the framework, by design, is about controls. It tells you what good looks like and where to invest. It doesn't tell you who decides, when intent is wrong, what enforces a policy at the SaaS endpoint, or how the policy survives the engineer who built the agent leaving the company. Those questions are deployer questions. They land on the security team's desk during rollout, not before it.

Here's what the eight implementation phases each leave you to figure out.

Phase 1. Identify requirements

The guide says align the security, legal, compliance, and business teams on goals and constraints up front. What it doesn't say: who owns the intent policy per agent, on an ongoing basis, after the kickoff meeting. The IT lead who deployed it? The engineering manager whose team built it? The compliance officer whose name is on the SOC 2? In practice the policy gets written once during deployment and updates rarely until an incident forces it. Pick an owner before the first deployment, not after the first scope-expansion request.

Phase 2. Manage supply chain risks

AI-BOM and OpenSSF Scorecard are the right primitives. Anthropic is also direct that frontier models are very good at recognizing signatures of unpatched dependencies, so the attacker's economics here are bad for defenders. What the guide leaves open: what happens between deployments when MCP servers update their tool config and your AI-BOM goes stale. The supply chain isn't a CI step. It's a continuous surface. Treat new MCP tools the way you treat new third-party APIs in your network. They get a review before they go live, not a Slack post announcing they already did.

Phase 3. Define agent boundaries

This is where the framework's abstraction becomes the deployer's burden. The guide says document approved and prohibited actions. Good. Now enforce that at the API. The OAuth scope gmail.send doesn't have a "send only to addresses your CRM contains" variant. The SaaS providers haven't shipped the granular scopes the framework assumes. So the boundary the guide tells you to define has to be enforced somewhere other than the credential, because the credential can't carry that level of nuance.

This is the deployer's call. A gateway layer. A per-action approval. An intent-bound session that constrains the credential's effective scope at runtime. The framework is silent on which.

Phase 4. Defend against prompt injection

Input isolation, constitutional classifiers, spotlighting. All correct on the model side. Anthropic cites Microsoft's research showing spotlighting reduces indirect injection success from over 50% to under 2%. That's real progress.

The leftover deployer question: what happens when the prompt got through, the agent is now doing something, and the action looks well-formed to the API but is wildly out of scope for the original task. Boundary defenses are necessary. They're not sufficient. Runtime evaluation against the original declared intent is the second line. Anthropic gestures at this with continuous authorization at the Advanced tier, but doesn't operationalize it.

Phase 5. Secure tool access

Tool allow-listing, parameter validation, sandboxed execution. Strong recommendations, and Claude Code natively supports most of them via settings.json and hooks. The deployer gap shows up when the tool is a third-party SaaS API the agent calls directly.

Your sandbox doesn't extend to Salesforce. Your allowlist doesn't constrain what the agent does inside the trading platform once the request lands. The guide's tool-access controls assume you own the runtime. For the majority of agents in production today, the most valuable tools aren't sandboxed and never will be by you.

Phase 6. Protect agent credentials

Short-lived identity-provider-issued credentials as baseline. Hardware-bound credentials for production. Per-agent credentials, never shared. STS-style brokering. All right. Anthropic puts it bluntly: static API keys and shared service-account passwords are no longer a legitimate entry point, not even at Foundation.

The unanswered question is the issuance policy. Who decides what scope this agent can request, at what frequency, for this task. The mechanism (STS, certificate authority, OAuth client credentials) is the easy part. The policy is what makes the issuance an act of governance rather than a credential vending machine. Most teams ship the mechanism and skip the policy, which is how the credential becomes a long-lived secret in a shorter wrapper.

Phase 7. Safeguard agent memory

Memory isolation, context integrity validation, retention policies. Sound. The framework assumes the agent's memory is yours to control. For most teams it isn't. The MCP server's context is owned by whoever runs the MCP server. The vector store backing RAG is often a managed service. Your retention policy stops at the boundary of what you actually operate. Past that, you're trusting a vendor's posture.

Map this out before you assume the policy applies end to end. The agents whose memory you can't reach are the agents whose blast radius extends past what you control.

Phase 8. Measure what matters

Dwell time, coverage, detection speed, explainability, behavior. The metrics are right.

The leftover question: what you measure agent behavior against. The guide says establish behavioral baselines. Baselines compared to what? If your agents don't have a declared intent per session, the baseline is whatever they've done historically, which makes the baseline a record of behavior rather than a standard. Without intent as a primitive, behavioral monitoring is descriptive, not normative. You can tell when the agent does something it hasn't done before. You can't tell when it does something it was never supposed to.

What the gaps share

Eight phases, eight different shapes of unanswered question. The shared shape is the gap between what the agent is permitted to do and what the agent is doing right now, for what reason. The framework gives you tools to enforce permission. The runtime question, the why-this-action-right-now question, is what most of the gaps reduce to.

This is the layer Iden builds. Intent declared at session start. Runtime evaluation against it. Scoped credentials issued per session. An audit trail that captures the chain. Not as a replacement for the framework; as the operational answer to the framework's open questions. Anthropic gives you the controls. The controls don't deploy themselves.

What to do this week

Pick one agent already in production. Walk it through the eight phases as if you were filling in your own one-pager. The gaps will be obvious within an hour. The first three will be questions of ownership and policy. The next three will be questions of enforcement at a boundary you don't fully control. The last two will be questions of what you measure against. Write down the answers. That's the brief you take to your CISO on Friday.

Anthropic shipped the framework. Here's what it leaves you to figure out.