---
title: "Rotating agent credentials without downtime"
excerpt: "An agent's key will eventually need to be rotated: a scanner flags it, a laptop walks out the door, or policy just says 90 days. If rotation logs the agent out mid-task, you'll avoid doing it. Here's the grace-window pattern that makes rotation a non-event."
author: flint
category: Engineering
date: "2026-05-22"
---

Every credential gets rotated eventually. A secret scanner flags one in a log. A contractor leaves the company and the laptop leaves with them. A compliance policy says ninety days and means it. For human accounts this is routine: you reset a password, you re-authenticate once, you move on.

For agents it is quietly harder, and the reason teams put it off is always the same. If rotating an agent's key logs that agent out in the middle of a running task, rotation becomes an outage. So nobody rotates. Keys live for a year because the safe-looking option is to never touch them, which is the least safe option there is.

The fix is a rotation pattern with a grace window. Done right, rotating an agent credential is a non-event: the agent never notices, the task keeps running, and the old key is dead within minutes.

## Why "delete and reissue" is the wrong shape

The naive rotation is atomic: invalidate the old key, mint a new one, hope the agent picks up the new one before its next request. It never works cleanly, because an agent is not sitting at a login screen waiting to re-authenticate. It is mid-loop, holding a key in memory, making requests every few hundred milliseconds. The instant you invalidate the old key, every request still presenting it gets a 401. The agent's task fails partway, often in a state that is annoying to recover.

Worse, the agent has no human watching it to notice the failure and log back in. A human hits a 401, sighs, re-authenticates. An agent hits a 401 and either crashes the task or, if it's been written defensively, retries the same dead key forever.
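
As a minimal sketch of the alternative to retrying a dead key forever, an agent-side client can re-read its credential once when it sees a 401 and then retry once. Everything here is hypothetical: `CredentialedClient`, `load_key`, and `send` are illustrative names, and `send` stands in for whatever HTTP call the agent actually makes. This is a safety net, not a substitute for overlapping keys.

```python
from typing import Callable


class CredentialedClient:
    """Hypothetical agent-side client that re-reads its key once on a 401."""

    def __init__(self, load_key: Callable[[], str], send: Callable[[str, str], int]):
        self._load_key = load_key  # reads the current key from config or a secret store
        self._send = send          # performs the request and returns an HTTP status code
        self._key = load_key()

    def request(self, payload: str) -> int:
        status = self._send(self._key, payload)
        if status != 401:
            return status
        # The key held in memory may be stale: re-read it once.
        fresh = self._load_key()
        if fresh == self._key:
            return status  # nothing new to try, so surface the failure
        self._key = fresh
        # Retry exactly once with the fresh key; never loop on a dead one.
        return self._send(self._key, payload)
```

The retry is bounded on purpose: if the reload yields the same key, the 401 is returned to the caller instead of being hidden behind a loop.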

So the requirement is clear: **the old key and the new key must both be valid at the same time, for long enough that the agent rolls over to the new one without a single failed request.**

## The grace-window pattern

Rotation becomes three steps with an overlap in the middle:

1. **Mint the new key.** The platform issues a second active credential for the same agent identity. Both keys now authenticate as the same principal, with the same scopes. Nothing else changes: the agent's identity, its access, its audit trail are all continuous. Only the secret material is new.

2. **Roll the agent over.** The agent (or its host) picks up the new key on its next natural credential read: a config reload, the start of the next task, a periodic refresh. Because the old key still works, there is no rush and no failed request if the rollover lands a few seconds late.

3. **Retire the old key after the grace window.** Once the window closes (fifteen minutes is a sane default), the platform deactivates the old key. Any request still presenting it now fails, which is correct: fifteen minutes was plenty, and a key that's still in use after that is a signal worth surfacing, not a case worth tolerating.

The whole sequence is invisible to the running task. At no point are there zero valid keys, so at no point is there a failed request caused by rotation itself.
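
The three steps above can be sketched with a small in-memory model. This is an illustration under stated assumptions, not any particular platform's API: `AgentKeys`, `rotate`, and `authenticate` are hypothetical names, time is passed in as plain seconds so the behavior is easy to follow, and a real system would store hashed secrets rather than raw ones.

```python
import secrets
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Key:
    secret: str
    retire_at: Optional[float] = None  # None means no retirement is scheduled


@dataclass
class AgentKeys:
    """Hypothetical store: one agent identity, one or more live keys."""
    agent_id: str
    keys: List[Key] = field(default_factory=list)

    def rotate(self, now: float, grace_seconds: float = 15 * 60) -> str:
        # Step 3 is scheduled up front: each live key gets a retirement time.
        for key in self.keys:
            if key.retire_at is None:
                key.retire_at = now + grace_seconds
        # Step 1: mint a new key for the same identity.
        new = Key(secret=secrets.token_urlsafe(32))
        self.keys.append(new)
        return new.secret  # Step 2 is the agent picking this up on its next read

    def authenticate(self, secret: str, now: float) -> Optional[str]:
        # Any key that has not reached its retirement time maps to the same principal.
        for key in self.keys:
            live = key.retire_at is None or now < key.retire_at
            if live and secrets.compare_digest(key.secret.encode(), secret.encode()):
                return self.agent_id
        return None
```

During the window both secrets authenticate as the same `agent_id`; once `now` reaches `retire_at`, only the new one does. Passing `grace_seconds=0` retires the old key immediately.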

## What the platform owes you

A credential system that supports agents has to make the grace window a first-class operation, not something you fake with two separate API calls and a prayer. Concretely:

- **Two active keys per identity, briefly.** The data model has to allow more than one live credential per agent, at least during the window. If the schema enforces exactly one key per agent, clean rotation is impossible and you're back to delete-and-reissue.
- **Rotation as one call.** "Rotate" should be a single operation that mints the new key, returns it, and schedules the old one's retirement. If the owner has to manually mint, manually distribute, and manually revoke, they'll skip the last step and you'll accumulate live orphan keys.
- **The audit trail spans the rotation.** The new key is the same principal, so every action before and after rotation belongs to one continuous identity in the log. Rotation is not a new agent. (This is the whole point of [agents being principals, not delegated tokens](/blog/agents-are-principals): the identity is stable; the secret is just material attached to it.)
- **Owner self-service for the common case.** The agent's owner should be able to rotate their own agent's key without filing a ticket. Admin involvement is for cross-cutting events (a compromised root credential, a mass rotation), not for routine hygiene.
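
To make the audit-trail point concrete, here is a small sketch in which each event records both the stable principal and the key that happened to sign the request. The `AuditEvent` fields and the `trail_for` helper are assumptions for illustration; the idea is only that queries go by principal, so a rotation does not split the history.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AuditEvent:
    principal_id: str  # stable agent identity
    key_id: str        # which credential was presented; changes on rotation
    action: str


def trail_for(events: List[AuditEvent], principal_id: str) -> List[str]:
    """Return one principal's actions in order, regardless of which key was used."""
    return [e.action for e in events if e.principal_id == principal_id]


events = [
    AuditEvent("agent-1", "key-a", "read"),
    AuditEvent("agent-2", "key-x", "read"),
    AuditEvent("agent-1", "key-a", "write"),
    AuditEvent("agent-1", "key-b", "deploy"),  # first action after rotation
]
```

Here `trail_for(events, "agent-1")` covers actions made with both `key-a` and `key-b`, while `key_id` stays available for anyone who needs to know which secret was in use at the time.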

## Emergency rotation is the same pattern, minus the grace

Everything above assumes a routine, scheduled rotation. The emergency case, a key you believe is compromised, is the same mechanism with the window set to zero: mint the new key, retire the old one immediately, accept that in-flight requests on the old key fail. That failure is the correct trade when a key is actively leaking. The point is that you do not need a different system for the emergency; you need the same rotation primitive with a knob for the grace window.

What makes this safe is that the blast radius was already bounded. Because the agent had [its own scoped identity](/blog/oauth-scopes-for-agents) rather than the owner's full access, the compromised key only ever exposed what that one agent could do. Revoking it deactivates one credential row; it is not a password reset that logs out a human and every other agent they run.

## FAQ

**How long should the grace window be?**

Long enough to cover the agent's slowest natural credential refresh, short enough that a leaked key isn't useful for long. Fifteen minutes covers config reloads and task boundaries for almost every setup. If your agents only read credentials at process start and run for hours, lengthen the window for those agents or, better, trigger an explicit reload at rotation time, rather than widening the default for everyone.

**Should agent keys expire automatically?**

A maximum lifetime (say ninety days) plus alerting as the deadline approaches is good hygiene, but only if rotation is a non-event. Auto-expiry on top of painful rotation just schedules outages. Fix rotation first, then turn on expiry.
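
A maximum lifetime with an alert ahead of the deadline can be sketched as a simple age check. The ninety days comes from the answer above; the fourteen-day warning lead and the names `key_status`, `MAX_LIFETIME`, and `WARN_BEFORE` are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

MAX_LIFETIME = timedelta(days=90)  # the "say ninety days" maximum
WARN_BEFORE = timedelta(days=14)   # assumed alert lead time, not from the text


def key_status(issued_at: datetime, now: datetime) -> str:
    """Classify a key as 'ok', 'rotate_soon', or 'expired' based on its age."""
    age = now - issued_at
    if age >= MAX_LIFETIME:
        return "expired"
    if age >= MAX_LIFETIME - WARN_BEFORE:
        return "rotate_soon"
    return "ok"


issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
```

A scheduler could run this check daily and trigger a routine rotation on `rotate_soon`, so that `expired` is something it should never actually observe.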

**Does rotating a key change the agent's identity or access?**

No. The key is secret material attached to a stable principal. After rotation the agent has the same identity, the same scopes, and one continuous audit trail. Only the bytes of the secret changed.

**What about keys that are still in use after the window closes?**

Treat a request on a retired key as a signal, not an error to suppress. It usually means an agent host that never reloaded its config. Surface it, fix the rollover, and don't widen the grace window to paper over a stuck host.
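
One way to surface that signal, sketched with hypothetical names (`RetiredKeyMonitor`, `record`, `stuck_hosts`): count requests that present a retired key, grouped by the host they came from, so the stuck host shows up by name. The request is still rejected; the sketch only makes the rejection visible.

```python
from collections import Counter
from typing import Dict, List


class RetiredKeyMonitor:
    """Hypothetical tracker for requests that present an already-retired key."""

    def __init__(self, retired_at: Dict[str, float]):
        self._retired_at = retired_at  # key id -> retirement time in seconds
        self.hits: Counter = Counter()  # (key id, host) -> number of rejected requests

    def record(self, key_id: str, host: str, now: float) -> bool:
        """Return True and count the hit if the request used a retired key."""
        retired = self._retired_at.get(key_id)
        if retired is not None and now >= retired:
            self.hits[(key_id, host)] += 1
            return True
        return False

    def stuck_hosts(self) -> List[str]:
        # Hosts that kept presenting a retired key, sorted for stable output.
        return sorted({host for (_, host) in self.hits})
```

Feeding `stuck_hosts()` into an alert points the owner at the host that never reloaded, which is the thing to fix.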

## Part of the agent-identity stack

This is one spoke of the [agent identity](/blog/agent-identity) cluster. Rotation only stays painless because the identity underneath it is stable and scoped. Read [agents are principals, not delegated tokens](/blog/agents-are-principals) for why the identity is the durable thing and the key is not, and [OAuth scopes for agents](/blog/oauth-scopes-for-agents) for why a bounded blast radius makes emergency rotation survivable.
