When an agent does something an agent shouldn't, the failure is mundane. The model found a path where the locally optimal next step was a destructive call, took it, and now the cost is real money, lost data, or someone removed from an org they should still be in. You cannot fix this by prompting the agent harder. The fix is structural: a gate the handler refuses to cross without a fresh, human-bound token. This is the design guide for that gate.
The pattern in one diagram
agent ── propose ──▶ handler ── {token, summary} ──▶ agent
                                                       │
                                                       ▼
                                                     human
                                                       │ approve
                                                       ▼
agent ── commit(token) ──▶ handler ── validate ──▶ execute
Two calls. One human checkpoint. One side effect. The handler is a single chokepoint with no side doors. Everything dangerous routes through the same module, the same token shape, the same audit log. The lineage is older than agents: it is the separation the OAuth 2.0 authorization code grant makes between requesting authority and exercising it. The client cannot turn a request into a side effect without a credential that is issued only after the party holding the authority approves. Same shape, applied to agent actions.
The five design decisions
1. What goes on the list. Use the three-test rule: an operation belongs on the gate if it moves money, widens access, or cannot be undone with a click. That covers plan changes, admin grants, public-link toggles, hard deletes, and outbound email to other people. If an operation passes any one test, it is gated. If it fails all three, it is not. Be ruthless about the "fails all three" half: row inserts, doc edits, and comments stay ungated. Gating reversible writes trains humans to auto-approve, and the gate stops being a gate. The warning-fatigue literature is decades deep on this: prompts that fire constantly are, in practice, mostly indistinguishable from no prompts.
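The three-test rule is small enough to write down as a predicate. The sketch below assumes each operation is registered with three boolean traits; the `OperationTraits` type, its field names, and the trait values assigned to the sample operations are illustrative assumptions, not part of any existing API.

```typescript
// Hypothetical descriptor for a registered operation; field names are illustrative.
type OperationTraits = {
  name: string;
  movesMoney: boolean;
  widensAccess: boolean;
  undoableWithOneClick: boolean;
};

// Gated if the operation passes any one of the three tests.
function isGated(op: OperationTraits): boolean {
  return op.movesMoney || op.widensAccess || !op.undoableWithOneClick;
}

// Sample classifications (assumed for illustration).
const operations: OperationTraits[] = [
  { name: "upgrade_plan", movesMoney: true, widensAccess: false, undoableWithOneClick: false },
  { name: "grant_admin", movesMoney: false, widensAccess: true, undoableWithOneClick: true },
  { name: "hard_delete", movesMoney: false, widensAccess: false, undoableWithOneClick: false },
  { name: "insert_row", movesMoney: false, widensAccess: false, undoableWithOneClick: true },
  { name: "add_comment", movesMoney: false, widensAccess: false, undoableWithOneClick: true },
];

const gatedNames = operations.filter(isGated).map((op) => op.name);
console.log(gatedNames); // the first three names; insert_row and add_comment stay ungated
```

Keeping the traits as explicit booleans means a reviewer can challenge a classification in code review instead of arguing about the list in the abstract.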
2. The summary contract. The summary is the human's only signal. It is not an afterthought log line; it is the UX. A useful template names the actor, the operation, the cost in concrete units, the resource affected, and the agent's stated reason. Concretely: "Argus is requesting to upgrade Acme from Pro to Scale ($49/mo, charged today, recurring on the 25th). Card ending 4242. Reason given: 'Acme team grew past 20 humans this morning; Pro caps at 20.'" The reason matters because it lets the human evaluate the premise. If the team still has only 12 members, the operation looks fine but the premise is wrong, and the human rejects. Require a template at registration time. Vague summaries are a contract violation, not a UX issue to fix later.
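One way to enforce "require a template at registration time" is to reject any template that omits one of the five fields. This is a minimal sketch under assumptions: `registerSummaryTemplate`, `SummaryFields`, and the `{placeholder}` syntax are hypothetical names chosen for illustration.

```typescript
// Hypothetical field set: the five things the summary must name.
type SummaryFields = {
  actor: string;
  operation: string;
  cost: string;
  resource: string;
  reason: string;
};

const REQUIRED_FIELDS: (keyof SummaryFields)[] = ["actor", "operation", "cost", "resource", "reason"];

// Called once when a gated operation is registered. Throws if the template
// omits any required placeholder, so a vague summary cannot ship.
function registerSummaryTemplate(template: string): (fields: SummaryFields) => string {
  const missing = REQUIRED_FIELDS.filter((f) => template.indexOf("{" + f + "}") === -1);
  if (missing.length > 0) {
    throw new Error("summary template is missing: " + missing.join(", "));
  }
  return (fields) => {
    for (const f of REQUIRED_FIELDS) {
      if (fields[f].trim() === "") throw new Error("empty summary field: " + f);
    }
    // Single-pass substitution; unknown placeholders are left untouched.
    return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
      REQUIRED_FIELDS.indexOf(key as keyof SummaryFields) === -1
        ? match
        : fields[key as keyof SummaryFields]
    );
  };
}

const renderUpgrade = registerSummaryTemplate(
  "{actor} is requesting to {operation} ({cost}). {resource}. Reason given: '{reason}'"
);

const summary = renderUpgrade({
  actor: "Argus",
  operation: "upgrade Acme from Pro to Scale",
  cost: "$49/mo, charged today, recurring on the 25th",
  resource: "Card ending 4242",
  reason: "Acme team grew past 20 humans this morning; Pro caps at 20.",
});
console.log(summary);
```

The check is deliberately crude: it cannot tell whether a cost string is honest, only that a cost slot exists and is filled. That is enough to turn a missing field into a registration failure instead of a production surprise.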
3. Token shape and TTL. Single-use, time-bound, bound to {principal, operation, params} in canonical form. A 60-second TTL is a good default. A short TTL beats a long one because the token is, in capability terms, a designation bundled with permission. This is the object-capability framing in Mark Miller's Robust Composition: the token is the authority, and authority should be as narrow and short-lived as the task it was minted for. A 24-hour token is a credential. A 60-second token is a confirmation. Treat it as one.
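// JSON value type for the bound params.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

type Consent = {
  id: string;               // ~256 bits random
  principalId: string;      // the agent that proposed
  operation: string;        // "upgrade_plan"
  params: Json;             // canonical-form
  expiresAt: Date;          // now + 60s
  consumedAt: Date | null;  // null until the token is redeemed
};

// The commit call being checked against the token (shape assumed for illustration).
type Call = { principalId: string; operation: string; params: Json };

class ConsentError extends Error {}

// paramsEqual: deep equality over canonical-form params, defined elsewhere.
async function consume(c: Consent, call: Call) {
  if (c.consumedAt) throw new ConsentError("already_consumed");
  if (c.expiresAt < new Date()) throw new ConsentError("expired");
  if (c.principalId !== call.principalId) throw new ConsentError("wrong_principal");
  if (c.operation !== call.operation) throw new ConsentError("wrong_operation");
  if (!paramsEqual(c.params, call.params)) throw new ConsentError("param_mismatch");
  // mark consumed and run side effect in one transaction
}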
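The parameter check depends on what "canonical form" means, which the snippet leaves open. One plausible definition, offered as a sketch: serialize with object keys sorted recursively and compare the resulting strings. The `canonicalize` helper is a hypothetical name, and this is a simplification, not an implementation of RFC 8785 (JSON Canonicalization Scheme); number formatting is simply delegated to `JSON.stringify`.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Deterministic serialization: object keys are sorted at every level,
// array order is preserved because it is significant.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalize(item)).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalize(value[k])).join(",") + "}";
}

// Two param sets match when their canonical serializations are identical.
function paramsEqual(a: Json, b: Json): boolean {
  return canonicalize(a) === canonicalize(b);
}

// Key order does not matter; values do.
console.log(paramsEqual({ plan: "scale", org: "acme" }, { org: "acme", plan: "scale" })); // true
console.log(paramsEqual({ plan: "scale", org: "acme" }, { org: "acme", plan: "pro" }));   // false
```

Comparing canonical strings, instead of walking two object trees, also gives a stable value to hash or store alongside the token, so the commit call can be checked without keeping the original object around.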
4. Where the gate sits. In code, in one module, not in a wiki page. Adding to the dangerous-ops contract means editing the handler. The contract is the switch, not a document about the switch. If a new money-moving tool can merge without touching the consent module, the contract has drifted. Make the module the only path: every gated handler imports mintConsent and consumeConsent from the same file, and CI fails a PR that performs a gated side effect without going through it.
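A single module exposing `mintConsent` and `consumeConsent` might look like the following in-memory sketch. Everything apart from those two function names is an assumption made for illustration: the `createConsentModule` factory, the injected `newId` and `nowMs` dependencies, and the `paramsKey` field, which stands in for a canonical serialization of the params. A real store would mark the token consumed and run the side effect in one database transaction; here a flag on an in-memory record plays that role.

```typescript
// All names other than mintConsent and consumeConsent are hypothetical.
type Binding = { principalId: string; operation: string; paramsKey: string };
type Consent = Binding & { expiresAtMs: number; consumed: boolean };

type Deps = {
  newId: () => string; // in production: ~256 bits from a CSPRNG
  nowMs: () => number; // injected clock, so expiry is testable
  ttlMs: number;       // 60000 for the 60-second default
};

function createConsentModule(deps: Deps) {
  const store = new Map<string, Consent>();

  // First call: record what was proposed and hand back an opaque token id.
  function mintConsent(binding: Binding): string {
    const id = deps.newId();
    store.set(id, { ...binding, expiresAtMs: deps.nowMs() + deps.ttlMs, consumed: false });
    return id;
  }

  // Second call: the only way to run a gated effect.
  function consumeConsent<T>(id: string, call: Binding, effect: () => T): T {
    const c = store.get(id);
    if (!c) throw new Error("unknown_token");
    if (c.consumed) throw new Error("already_consumed");
    if (c.expiresAtMs < deps.nowMs()) throw new Error("expired");
    if (c.principalId !== call.principalId) throw new Error("wrong_principal");
    if (c.operation !== call.operation) throw new Error("wrong_operation");
    if (c.paramsKey !== call.paramsKey) throw new Error("param_mismatch");
    c.consumed = true; // in-memory stand-in for "mark consumed in the same transaction"
    return effect();
  }

  return { mintConsent, consumeConsent };
}

// Usage with a fake clock and a counter id (test-only; not random).
let clock = 0;
let counter = 0;
const gate = createConsentModule({
  newId: () => "tok_" + (counter += 1),
  nowMs: () => clock,
  ttlMs: 60000,
});
const upgradeCall = { principalId: "argus", operation: "upgrade_plan", paramsKey: '{"org":"acme","plan":"scale"}' };
const token = gate.mintConsent(upgradeCall);
const result = gate.consumeConsent(token, upgradeCall, () => "charged");
console.log(result); // "charged"
```

Because the effect is passed in as a callback, a gated handler has no way to run its side effect except through `consumeConsent`. Note one consequence of this sketch: the token is burned before the effect runs, so a failed effect requires a fresh proposal and a fresh approval.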
5. Fast-paths, kept honest. Some operations are usually dangerous but contextually safe: deleting a workspace 20 seconds after it was created, no content, no other members. The discipline: every fast-path is a checkable condition in code. Time since creation, member count, balance equals zero. Never "the agent is confident." Confidence is not a check. If you cannot encode the condition as a boolean over server-side state, it is not a fast-path, it is a hole.
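The workspace-deletion fast-path can be written as exactly the kind of boolean the rule asks for. In this sketch the `WorkspaceState` shape, its field names, and the 60-second age threshold are assumptions chosen for illustration; the point is that every input comes from server-side state and none comes from the agent.

```typescript
// Hypothetical server-side snapshot; field names and threshold are assumptions.
type WorkspaceState = {
  createdAtMs: number;
  memberCount: number; // includes the creator
  documentCount: number;
  balanceCents: number;
};

const FAST_PATH_MAX_AGE_MS = 60000;

// True only when every condition holds. Takes no input from the agent.
function deleteQualifiesForFastPath(ws: WorkspaceState, nowMs: number): boolean {
  const ageMs = nowMs - ws.createdAtMs;
  return (
    ageMs >= 0 &&
    ageMs <= FAST_PATH_MAX_AGE_MS &&
    ws.memberCount <= 1 &&
    ws.documentCount === 0 &&
    ws.balanceCents === 0
  );
}

const fresh: WorkspaceState = { createdAtMs: 1000, memberCount: 1, documentCount: 0, balanceCents: 0 };
console.log(deleteQualifiesForFastPath(fresh, 21000)); // true: 20 seconds old, empty, creator only
console.log(deleteQualifiesForFastPath({ ...fresh, memberCount: 3 }, 21000)); // false: other members
```

The function signature is the safeguard: there is no parameter through which a caller could pass a confidence score or a risk flag.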
Anti-patterns
Three failure modes worth naming. Gating everything feels safe and produces the opposite: humans approve on muscle memory, the gate is decorative. DIY per feature: each dangerous tool ships its own confirmation flow, and within a year you have five token shapes, three TTLs, and one of them is broken. Trusting the agent's risk claim: an "I am sure this is low risk" flag from the caller is, structurally, no gate at all. The actor does not assess its own risk.
Closing
If you ship an agent that can take action, the gate is not optional infrastructure. It is the structural protection against the agent-in-a-loop failure, the one where the model finds a destructive path and walks it a hundred times before anyone notices. See the two-key handshake for the mechanics, consent gates for dangerous operations for the principle, the dangerous-ops contract for the canonical list, /docs/mcp/dangerous-ops for the extension checklist, and safe agent ops for the broader pillar. One module, one token shape, one short list. Ship it on day one, grow the list with features, hold the discipline of not adding without cause.