How to design a dangerous operation: the…

When an agent does something an agent shouldn't, the failure is mundane. The model found a path where the locally optimal next step was a destructive call, took it, and now the cost is real money, lost data, or someone removed from an org they should still be in. You cannot fix this by prompting the agent harder. The fix is structural: a gate the handler refuses to cross without a fresh, human-bound token. This is the design guide for that gate.

The pattern in one diagram

agent ── propose ──▶ handler ── {token, summary} ──▶ agent
                                                      │
                                                      ▼
                                                    human
                                                      │  approve
                                                      ▼
agent ── commit(token) ──▶ handler ── validate ──▶ execute

Two calls. One human checkpoint. One side effect. The handler is a single chokepoint with no side doors. Everything dangerous routes through the same module, the same token shape, the same audit log. The lineage is older than agents: it is the separation the OAuth 2.0 authorization code grant makes between requesting authority and exercising it. The client cannot turn a request into a side effect without a token minted by the party that holds the authority. Same shape, applied to agent actions.
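The two-call flow in the diagram can be sketched as follows. This is a minimal in-memory illustration, not the essay's implementation: the names `propose`, `approve`, `commit`, and the `pending` map are hypothetical, the TTL check is left out for brevity, and a real handler would persist state and expose `approve` only to the human-facing channel.

```typescript
import { randomBytes } from "node:crypto";

type Params = Record<string, string | number | boolean>;

interface Pending {
  principalId: string;
  operation: string;
  params: Params;
  approved: boolean;
  consumed: boolean;
}

// Hypothetical in-memory store; a real handler would use a database.
const pending = new Map<string, Pending>();

// Call 1: record the request, return a token and a summary for the human.
function propose(principalId: string, operation: string, params: Params) {
  const token = randomBytes(32).toString("hex"); // 256 bits of randomness
  const summary = `${principalId} is requesting ${operation} with ${JSON.stringify(params)}`;
  pending.set(token, { principalId, operation, params, approved: false, consumed: false });
  return { token, summary };
}

// Human checkpoint: assumed to be reachable only from the approval UI, never by the agent.
function approve(token: string): void {
  const p = pending.get(token);
  if (!p) throw new Error("unknown_token");
  p.approved = true;
}

// Call 2: validate the token, then run the side effect once,
// using the params that were proposed rather than anything re-supplied at commit time.
function commit(
  token: string,
  principalId: string,
  execute: (operation: string, params: Params) => void,
): void {
  const p = pending.get(token);
  if (!p) throw new Error("unknown_token");
  if (p.principalId !== principalId) throw new Error("wrong_principal");
  if (!p.approved) throw new Error("not_approved");
  if (p.consumed) throw new Error("already_consumed");
  p.consumed = true;
  execute(p.operation, p.params);
}
```

In this sketch the agent never holds anything that can produce a side effect on its own: the token is inert until the human approves it, and it is spent after one commit.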

The five design decisions

1. What goes on the list. Use the three-test rule: an operation belongs on the gate if it moves money, widens access, or cannot be undone with a click. Examples: plan changes, admin grants, public-link toggles, hard deletes, and outbound email to other people. If an operation passes any one test, it is gated. If it fails all three, it is not. Be ruthless about the "fails all three" half: row inserts, doc edits, and comments stay ungated. Gating reversible writes trains humans to auto-approve, and the gate stops being a gate. The warning-fatigue literature is decades deep on this. Frequent prompts are mostly indistinguishable from no prompts.
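The three-test rule is small enough to write down as a predicate. The sketch below assumes each operation is registered with three hand-assigned traits; the `OperationTraits` shape and the `isGated` name are illustrative, not part of any existing API.

```typescript
// Hypothetical trait record filled in by whoever registers the operation.
interface OperationTraits {
  movesMoney: boolean;
  widensAccess: boolean;
  undoableWithOneClick: boolean;
}

// Gated if the operation passes any one of the three tests.
function isGated(traits: OperationTraits): boolean {
  return traits.movesMoney || traits.widensAccess || !traits.undoableWithOneClick;
}

const upgradePlan: OperationTraits = { movesMoney: true, widensAccess: false, undoableWithOneClick: true };
const addComment: OperationTraits = { movesMoney: false, widensAccess: false, undoableWithOneClick: true };

console.log(isGated(upgradePlan)); // true
console.log(isGated(addComment));  // false
```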

2. The summary contract. The summary is the human's only signal. It is not an afterthought log line; it is the UX. A useful template names the actor, the operation, the cost in concrete units, the resource affected, and the agent's stated reason. Concretely: "Argus is requesting to upgrade Acme from Pro to Scale ($49/mo, charged today, recurring on the 25th). Card ending 4242. Reason given: 'Acme team grew past 20 humans this morning; Pro caps at 20.'" The reason matters because it lets the human evaluate the premise. If the team is still 12, the operation looks fine but the premise is wrong, and the human rejects. Require a template at registration time. Vague summaries are a contract violation, not a UX issue to fix later.
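One way to make "require a template at registration time" enforceable is to refuse registration without a template and refuse rendering with an empty field. The sketch below assumes the five fields named above; `registerOperation`, `renderSummary`, and the `templates` map are hypothetical names.

```typescript
const REQUIRED_FIELDS = ["actor", "operation", "cost", "resource", "reason"] as const;
type Field = (typeof REQUIRED_FIELDS)[number];
type SummaryFields = Record<Field, string>;
type SummaryTemplate = (fields: SummaryFields) => string;

// Hypothetical registry of summary templates, keyed by operation name.
const templates = new Map<string, SummaryTemplate>();

// Registration fails when no template is supplied.
function registerOperation(name: string, template?: SummaryTemplate): void {
  if (!template) throw new Error(`${name}: summary template is required`);
  templates.set(name, template);
}

// Rendering fails when any required field is blank, so a vague summary never reaches the human.
function renderSummary(name: string, fields: SummaryFields): string {
  const template = templates.get(name);
  if (!template) throw new Error(`${name}: not registered`);
  for (const key of REQUIRED_FIELDS) {
    if (fields[key].trim() === "") throw new Error(`missing_${key}`);
  }
  return template(fields);
}

registerOperation(
  "upgrade_plan",
  (f) => `${f.actor} is requesting to ${f.operation} (${f.cost}). ${f.resource}. Reason given: '${f.reason}'`,
);
```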

3. Token shape and TTL. Single-use, time-bound, bound to {principal, operation, params} in canonical form. 60 seconds is a good default. A short TTL beats a long one because the token is, in capability terms, a designation bundled with permission. This is the object-capability framing in Mark Miller's Robust Composition: the token is the authority, and authority should be as narrow and short-lived as the task it was minted for. A 24-hour token is a credential. A 60-second token is a confirmation. Treat it as one.

type Consent = {
  id: string;            // ~256 bits random
  principalId: string;   // the agent that proposed
  operation: string;     // "upgrade_plan"
  params: Json;          // canonical-form
  expiresAt: Date;       // now + 60s
  consumedAt: Date | null;
};

async function consume(c: Consent, call: Call) {
  if (c.consumedAt) throw new ConsentError("already_consumed");
  if (c.expiresAt < new Date()) throw new ConsentError("expired");
  if (c.principalId !== call.principalId) throw new ConsentError("wrong_principal");
  if (c.operation !== call.operation) throw new ConsentError("wrong_operation");
  if (!paramsEqual(c.params, call.params)) throw new ConsentError("param_mismatch");
  // mark consumed and run side effect in one transaction
}
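The `paramsEqual` check above depends on what "canonical form" means. One plausible reading, sketched here as an assumption rather than the essay's definition, is to serialize params with object keys sorted so that key order cannot cause a mismatch or hide one. A production system might prefer a standardized scheme such as RFC 8785 (JSON Canonicalization Scheme); the `canonicalize` helper below is a simplified stand-in.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys in sorted order; arrays keep their order.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => canonicalize(v)).join(",")}]`;
  const keys = Object.keys(value).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`).join(",")}}`;
}

// Two param sets match only if their canonical serializations are identical.
function paramsEqual(a: Json, b: Json): boolean {
  return canonicalize(a) === canonicalize(b);
}

console.log(paramsEqual({ org: "acme", plan: "scale" }, { plan: "scale", org: "acme" })); // true
console.log(paramsEqual({ org: "acme", plan: "scale" }, { org: "acme", plan: "pro" }));   // false
```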

4. Where the gate sits. In code, in one module, not in a wiki page. Adding to the dangerous-ops contract means editing the handler. The contract is the switch, not a document about the switch. If a new money-moving tool can merge without touching the consent module, the contract has drifted. Make the module the only path: every gated handler imports mintConsent and consumeConsent from the same file, and CI fails a PR that performs a gated side effect without going through it.

5. Fast-paths, kept honest. Some operations are usually dangerous but contextually safe, such as deleting a workspace 20 seconds after it was created, with no content and no other members. The discipline: every fast-path is a checkable condition in code, such as time since creation, member count, or a balance that equals zero. Never "the agent is confident." Confidence is not a check. If you cannot encode the condition as a boolean over server-side state, it is not a fast-path, it is a hole.
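A fast-path of the kind described above might look like the following. The `WorkspaceState` shape, the 60-second window, and the function name are assumptions for illustration; the point is that every input is read from server-side state and nothing comes from the caller.

```typescript
// Hypothetical server-side snapshot of the workspace, loaded by the handler.
interface WorkspaceState {
  createdAt: Date;
  memberCount: number;   // includes the creator
  documentCount: number;
}

const FAST_PATH_WINDOW_MS = 60_000; // assumed window; tune per operation

// True only for a just-created, empty workspace with no other members.
function canFastPathDelete(ws: WorkspaceState, now: Date): boolean {
  const ageMs = now.getTime() - ws.createdAt.getTime();
  return (
    ageMs >= 0 &&
    ageMs <= FAST_PATH_WINDOW_MS &&
    ws.memberCount === 1 &&
    ws.documentCount === 0
  );
}
```

If the predicate returns false, the operation falls back to the normal propose and commit flow; there is no parameter through which the agent can argue otherwise.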

Anti-patterns

Three failure modes worth naming. Gating everything feels safe and produces the opposite: humans approve on muscle memory, the gate is decorative. DIY per feature: each dangerous tool ships its own confirmation flow, and within a year you have five token shapes, three TTLs, and one of them is broken. Trusting the agent's risk claim: an "I am sure this is low risk" flag from the caller is, structurally, no gate at all. The actor does not assess its own risk.

Closing

If you ship an agent that can take action, the gate is not optional infrastructure. It is the structural protection against the agent-in-a-loop failure, the one where the model finds a destructive path and walks it a hundred times before anyone notices. See the two-key handshake for the mechanics, consent gates for dangerous operations for the principle, the dangerous-ops contract for the canonical list, /docs/mcp/dangerous-ops for the extension checklist, and safe agent ops for the broader pillar. One module, one token shape, one short list. Ship it on day one, grow the list with features, hold the discipline of not adding without cause.

FAQ

If the token expires in 60 seconds but the human is away from their desk, doesn't the agent just get stuck?

The agent waits, but the operation is not lost, and that is by design. The 60-second TTL governs how long a single confirmation stays live, not how long the request stays open: when the token expires, the agent re-proposes and the handler mints a fresh one. A short TTL keeps the token a confirmation rather than a 24-hour credential left lying around, and the cost of re-minting is one extra call, not a lost operation.
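The re-propose behavior can be sketched as a small loop on the agent side. The `Gate` interface, the result names, and the cap on proposals are all assumptions made for this example; the essay does not specify how the agent waits or how many times it may retry.

```typescript
type CommitResult = "executed" | "expired" | "rejected";

// Hypothetical client-side view of the handler.
interface Gate {
  propose(): string;                                  // mints a fresh token
  awaitDecisionAndCommit(token: string): CommitResult; // blocks until approved, rejected, or expired
}

// Re-propose only when the token expired; stop on execution, rejection, or the cap.
function runWithReproposal(gate: Gate, maxProposals: number): CommitResult {
  let result: CommitResult = "expired";
  for (let attempt = 0; attempt < maxProposals && result === "expired"; attempt++) {
    const token = gate.propose();
    result = gate.awaitDecisionAndCommit(token);
  }
  return result;
}
```

Each pass through the loop uses a new token, so an expired token is never reused, and a rejection ends the loop instead of prompting the human again.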

When an agent needs to run the same gated operation many times in a loop, does the human have to approve every single one?

Yes, because each consent is single-use and bound to {principal, operation, params} in canonical form, so a token minted for one set of params cannot be replayed against another. That repetition is the point: the agent-in-a-loop failure the essay names is exactly the case where a destructive path gets walked a hundred times before anyone notices, and one approval per side effect is what stops it. If the repetition is genuinely safe, encode it as a checkable fast-path condition over server-side state, never as a standing confidence flag from the caller.

Why build a separate consent gate instead of just using permission scopes or RBAC?

Scopes and RBAC answer whether the principal is ever allowed to perform an operation; the consent gate answers whether it should happen this time, on these params, given this premise. An agent can hold a perfectly valid scope and still take the locally optimal destructive call, which is the whole problem here. The gate is the OAuth-style separation between requesting authority and exercising it, applied per action, so the standing permission and the fresh human-bound token are doing two different jobs.

Who is the human in the checkpoint when the principal is itself acting on someone's behalf, or no human is available?

The summary contract decides this: it names the actor and routes the approval to the party that holds the authority over the affected resource, not back to the agent that proposed. If no such human is available, the operation does not execute, because there is no token to consume and the handler refuses to cross the gate without one. Routing approval back to the actor, or letting it self-certify, is the "trusting the agent's risk claim" anti-pattern, and structurally it is no gate at all.
