On-call rotation, monthly cycle

A monthly-cycling on-call rotation where the schedule rolls forward without manual cron, every page has a runbook (or a queued ticket to write one), Friday handoffs are a doc not a vibe, and month-end produces an honest list of what to improve.

Time: Ongoing, ~30 min/week + 2 hr at month end
Difficulty: Beginner
For: Engineering teams of 4-30 with a weekly on-call rotation and no dedicated SRE yet.

How this works

Open it, hand it to your agent, walk the steps.

Paste this to your agent (Claude / Cursor / Codex)
You are the agent running on the "On-call rotation" template workspace. The user (on-call lead) has connected you via MCP at your-org/on-call-rotation.

Your job: operate the monthly cycle. Roll the schedule forward, draft Friday handoffs, track runbook coverage, surface patterns.

User-loop protocol:
- You propose. The user decides. Never modify Schedule.status without user confirmation, never close a Pages row, never edit a Runbook index row.
- When the user adds a Pages row with timestamps (start, ack, resolve), fill in MTTA (minutes from start to ack) and MTTR (minutes from start to resolve). If runbook_used = "none", append a Runbook index row with status="needs writing" and source_page (link).
- Every Friday at 3 PM (or when the user says "draft handoff"), read the week's Pages, draft Handoff log section "Week of $monday-date". Sections: Pages summary (count by severity), Notable incidents (3-5 bullets), Runbooks updated, Runbooks still missing, Watch next week (services trending bad). Surface to user, don't post to Slack until user signs off.
- Every Friday at 4 PM (or "roll schedule"), read Schedule, identify next-week-primary, flip their status to "Confirm by Monday". Send them a 1-line MCP message: "You're primary on-call next week ($monday-$friday). Confirm by Monday EOD?".
- At month-end (or "month retro"), read the month's Pages, group by service, count, draft Month retro with: total pages, pages per service (top 5), MTTA p50 / p95, MTTR p50 / p95, missing runbooks count, person-hours (sum of MTTR converted), top 3 patterns. Surface to user.
- End of every working session, write a 1-paragraph note to Status doc: what you did, what's pending, what to pick up next.

Don't touch:
- Schedule.primary or Schedule.secondary (people-assignments are the on-call lead's call).
- Runbook index rows that have status="written" (those are signed off).

First MCP tool calls:
1. list_surfaces(workspace_slug="on-call-rotation")
2. list_rows(workspace_slug="on-call-rotation", surface_slug="schedule")
3. list_rows(workspace_slug="on-call-rotation", surface_slug="pages")
4. get_doc(workspace_slug="on-call-rotation", surface_slug="status")
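
The MTTA / MTTR rule in the prompt above can be sketched in a few lines of Python. This is a minimal sketch, not the workspace's own implementation: it assumes timestamps are stored as ISO-8601 strings, and the output field names (`mtta_min`, `mttr_min`, `needs_runbook_row`) and the sample service name are hypothetical.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> float:
    """Elapsed minutes between two ISO-8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


def derive_page_metrics(page: dict) -> dict:
    """Compute the derived fields for one Pages row without mutating it."""
    return {
        # MTTA: minutes from page start to acknowledgement.
        "mtta_min": minutes_between(page["start"], page["ack"]),
        # MTTR: minutes from page start to resolution.
        "mttr_min": minutes_between(page["start"], page["resolve"]),
        # A missing or "none" runbook_used means a Runbook index row with
        # status="needs writing" should be proposed.
        "needs_runbook_row": page.get("runbook_used", "none") == "none",
    }


# Hypothetical example row.
page = {
    "service": "checkout-api",
    "start": "2026-01-05T02:10:00",
    "ack": "2026-01-05T02:14:00",
    "resolve": "2026-01-05T02:55:00",
    "runbook_used": "none",
}
metrics = derive_page_metrics(page)
print(metrics)
```

Both durations are measured from `start`, so MTTR includes the time it took to acknowledge the page.
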
The template · 6 steps

Top to bottom. Each step has tasks, pointers, gotchas.

Mirror the current schedule into Schedule

20-40 min

Populate Schedule with the next 4 weeks of on-call from your paging tool. This is the canonical view your agent operates against. If PagerDuty is your source of truth, that's fine: the workspace mirrors it for handoff and analysis; it doesn't replace it.

Tasks
  • Open PagerDuty (or Opsgenie / Incident.io) and copy the next 4 weeks of rotation
  • Create one Schedule row per week: week_starting, primary, secondary, status (default: Confirmed)
  • If you have a 'shadow' / training rotation, add a column for that
  • Mark any pre-known swaps as 'Swap requested' so the agent doesn't nudge for confirmation
Gotchas
  • Don't try to replace PagerDuty with this workspace. PagerDuty pages people; the workspace is the operating cycle around the schedule, not the pager.
  • If your team rotates by day instead of week, add a `shift_type` column and adapt the Friday handoff cadence accordingly.

Seed Runbook index with what you have

1-2 hr

List every service that can page. For each, fill in runbook URL (or 'missing') and last_verified date. The point isn't to write missing runbooks today; it's to make the gaps visible. Once they're visible, the Friday handoff surfaces them every week, which is what gets them written.

Tasks
  • List every service in your monitoring / paging tool that has an active alert
  • Create a Runbook index row per service: service, runbook_url (or empty), last_verified, owner
  • Mark status: 'written' if a runbook exists, 'needs writing' if not, 'stale' if last_verified is more than 6 months ago
  • Don't backfill the team's whole portfolio, only services that page
Gotchas
  • Owner should be a person, not a team. 'Platform team' won't write the runbook; @sara will.
  • If everyone owns nothing, the runbook index is theater. Force-assign owners now.
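
The status rule from the tasks above could look like this in Python. It's a sketch under stated assumptions: "6 months" is approximated as 183 days, and a runbook with a URL but no last_verified date is treated as 'stale' (the template doesn't say what to do in that case). The function name and example URL are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: "6 months" approximated as 183 days.
STALE_AFTER = timedelta(days=183)


def runbook_status(
    runbook_url: Optional[str],
    last_verified: Optional[date],
    today: date,
) -> str:
    """Classify one Runbook index row as written / needs writing / stale."""
    if not runbook_url or runbook_url == "missing":
        return "needs writing"
    # Assumption: a runbook that was never verified counts as stale.
    if last_verified is None or today - last_verified > STALE_AFTER:
        return "stale"
    return "written"


today = date(2026, 1, 15)
print(runbook_status("", None, today))
print(runbook_status("https://wiki.example.com/rb", date(2025, 12, 1), today))
print(runbook_status("https://wiki.example.com/rb", date(2025, 3, 1), today))
```
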

Walk one Friday handoff with your agent

30-45 min (week 1, calibration)

Trigger a handoff draft when there's a real week to summarize. Ask your agent: 'Draft this Friday's handoff from the week's Pages'. Review the draft. Tune what the agent writes (too long, too short, wrong tone, missing the meta-point). The first handoff is the calibration; from week 2 onward, your part shrinks to reading the draft and signing off.

Tasks
  • Pick a week with 3-10 Pages rows (calibration material)
  • Ask your agent: 'Draft this Friday's handoff in Handoff log'
  • Read the draft. Mark up: what's missing, what's redundant, what's the wrong tone
  • Re-ask: 'Apply the comments and replace the section'
  • Sign off, ask your agent to post the summary to Slack via MCP send_message
Gotchas
  • If the week was quiet (<3 pages), the handoff should still exist as a 1-paragraph 'quiet week, watch X'. Don't skip handoffs because they feel small.
  • Don't let the agent draft Slack-tone copy. Handoff is technical; Slack post is the executive summary.
Agent prompt for this step
Read the Pages rows from the past 7 days.

Draft a handoff section in Handoff log doc titled "Week of {monday_date}". Structure:

1. **Pages summary**: total count, breakdown by severity (P0 / P1 / P2 / P3 / P4)
2. **Notable incidents** (3-5 bullets, each links to the Pages row): "[service] [severity] [1-line cause + outcome]"
3. **Runbooks used**: list runbook URLs invoked (count per runbook)
4. **Runbooks still missing**: services that paged with runbook_used=none, link the Runbook index row each
5. **Watch next week**: services trending bad (3+ pages this week), flaky alerts to revisit

Keep it under 200 words. The incoming on-call reads this in 90 seconds.

After drafting, surface to user. Don't post to Slack until user types "ship it".
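
The counting behind the handoff prompt above can be sketched as follows. This is a rough illustration, assuming each Pages row carries `service`, `severity`, and `runbook_used` fields; the function name, output keys, and sample rows are hypothetical.

```python
from collections import Counter

SEVERITIES = ["P0", "P1", "P2", "P3", "P4"]


def summarize_week(pages: list) -> dict:
    """Counts that feed the 'Week of ...' handoff section."""
    by_severity = Counter(p["severity"] for p in pages)
    by_service = Counter(p["service"] for p in pages)
    return {
        "total": len(pages),
        # Always list all five severities, including zero counts.
        "by_severity": {s: by_severity.get(s, 0) for s in SEVERITIES},
        # Services that paged at least once with no runbook.
        "missing_runbooks": sorted(
            {p["service"] for p in pages if p.get("runbook_used", "none") == "none"}
        ),
        # "Trending bad" per the prompt: 3+ pages this week.
        "watch_next_week": sorted(s for s, n in by_service.items() if n >= 3),
    }


# Hypothetical week of Pages rows.
pages = [
    {"service": "checkout-api", "severity": "P1", "runbook_used": "none"},
    {"service": "checkout-api", "severity": "P2", "runbook_used": "https://wiki.example.com/rb"},
    {"service": "checkout-api", "severity": "P2", "runbook_used": "https://wiki.example.com/rb"},
    {"service": "search", "severity": "P3", "runbook_used": "none"},
]
summary = summarize_week(pages)
print(summary)
```

A quiet week still produces a summary (all zeros and empty lists), which matches the rule that the handoff should exist even when little happened.
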

Turn on the Friday schedule roll-forward

15 min setup

Every Friday at 4 PM, your agent flips next-week-primary's status to 'Confirm by Monday' and pings them. Confirmations come back when the on-call engineer edits their own row. If Monday arrives and the week is still unconfirmed, the agent escalates to the on-call lead. This eliminates the recurring 'wait, who's on next week?' Slack thread.

Tasks
  • Schedule a Friday 4 PM cron for the agent (Claude Code cron, or your scheduling tool)
  • Confirm the agent's MCP send_message permissions in your team's Slack
  • First week: watch the nudge land, confirm it lands in DM not channel
  • Second week onward: it runs without you
Gotchas
  • Don't send the nudge to a channel. DMs only; people ignore channel nudges.
  • Build in an escape hatch: 'reply STOP to this and the agent skips you for 30 days'. Opting out stays a manual decision, and the wording keeps the nudge honest about coming from an agent.
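
The Friday roll-forward described above can be sketched as a pure function that only proposes a change. It assumes week_starting is stored as an ISO date string for the Monday of each week, and that rows marked 'Swap requested' are skipped (per the Schedule setup step). The function names, the returned dict shape, and the sample names are hypothetical; sending the DM and writing the status are left to whatever MCP tools your workspace exposes.

```python
from datetime import date, timedelta
from typing import Optional


def next_monday(today: date) -> date:
    """First Monday strictly after `today` (Monday is weekday 0)."""
    days_ahead = (7 - today.weekday()) % 7 or 7
    return today + timedelta(days=days_ahead)


def propose_roll(schedule: list, today: date) -> Optional[dict]:
    """Return the proposed status flip and nudge for next week, or None."""
    monday = next_monday(today)
    friday = monday + timedelta(days=4)
    for row in schedule:
        if row["week_starting"] != monday.isoformat():
            continue
        if row["status"] == "Swap requested":
            return None  # pre-known swap: don't nudge for confirmation
        return {
            "week_starting": row["week_starting"],
            "primary": row["primary"],
            "new_status": "Confirm by Monday",
            "message": (
                f"You're primary on-call next week ({monday.isoformat()} to "
                f"{friday.isoformat()}). Confirm by Monday EOD?"
            ),
        }
    return None  # no Schedule row for next week


# Hypothetical Schedule rows.
schedule = [
    {"week_starting": "2026-01-12", "primary": "sara", "secondary": "lee", "status": "Confirmed"},
    {"week_starting": "2026-01-19", "primary": "lee", "secondary": "sara", "status": "Swap requested"},
]
proposal = propose_roll(schedule, date(2026, 1, 9))  # a Friday
print(proposal)
```

Returning a proposal instead of writing to Schedule keeps the sketch in line with the "you propose, the user decides" rule.
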

Run the month retro

1 hr drafting + 30 min meeting

Last Friday of the month, your agent drafts Month retro from the 4 weeks of Pages. The retro is honest data: which services dominated, which runbooks are still missing, who's been on-call too often. Read it with the team in a 30-min meeting. Pick 2 things to ship next month: usually a missing-runbook ticket and a flaky-alert tuning.

Tasks
  • Last Friday of the month: ask your agent 'Draft Month retro'
  • Review the draft, push back on patterns that don't feel right
  • Schedule a 30-min team meeting, walk the team through the retro
  • Agree on 2 ship-list items for next month (add to Month retro 'Next month' section)
  • Fork the workspace, name it 'On-call rotation, $next-month'
  • Confirm the fork seeded Status with the retro contents
Gotchas
  • Person-hours data gets touchy. The point is fairness across the rotation, not who 'deserves' a break. Frame it that way.
  • If a person was on-call 3 of 4 weeks because someone was on PTO, the data is noisy. Add a short context paragraph; data without context is worse than no data.
Agent prompt for this step
Read every Pages row from the past 30 days.

Group by service. Compute per service: page_count, severity_breakdown, runbook_used (count per runbook URL, count for 'none').

Compute global: total pages, MTTA p50 + p95, MTTR p50 + p95, person-hours (sum MTTR converted), top 3 services by page count, missing-runbook count.

Group by on-call primary. Compute per person: pages handled, total minutes firefighting.

Draft Month retro doc with sections:

1. **The month in numbers** (totals + percentiles)
2. **Top services** (top 3 by page count, with 1-line per service)
3. **Person-hours** (table: on-call → pages → minutes)
4. **Runbook gaps** (count of pages with runbook_used=none; list services with the most gaps)
5. **Patterns** (3 bullets: flaky alerts, repeated-cause incidents, weekend skew, etc.)
6. **Proposed next month** (2 ship-list items, surfaced as options, not decisions)

Stop before "Decisions made". The team makes those at the retro meeting.
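
The numbers in the retro prompt above can be sketched like this. Treat it as one possible reading, not the template's definition: percentiles use the nearest-rank method (the prompt doesn't name one), each Pages row is assumed to already carry `mtta_min`, `mttr_min`, and the on-call `primary`, and all function names, keys, and sample rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def percentile(values: list, pct: float):
    """Nearest-rank percentile; None for an empty month."""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def month_numbers(pages: list) -> dict:
    """Global and per-person figures for the Month retro."""
    mtta = [p["mtta_min"] for p in pages]
    mttr = [p["mttr_min"] for p in pages]
    per_person = defaultdict(lambda: {"pages": 0, "minutes": 0.0})
    for p in pages:
        per_person[p["primary"]]["pages"] += 1
        per_person[p["primary"]]["minutes"] += p["mttr_min"]
    return {
        "total_pages": len(pages),
        "mtta_p50": percentile(mtta, 50),
        "mtta_p95": percentile(mtta, 95),
        "mttr_p50": percentile(mttr, 50),
        "mttr_p95": percentile(mttr, 95),
        # Person-hours: summed MTTR minutes converted to hours.
        "person_hours": round(sum(mttr) / 60, 2),
        "top_services": Counter(p["service"] for p in pages).most_common(3),
        "missing_runbook_pages": sum(
            1 for p in pages if p.get("runbook_used", "none") == "none"
        ),
        "per_person": dict(per_person),
    }


# Hypothetical month of Pages rows.
pages = [
    {"service": "checkout-api", "primary": "sara", "mtta_min": 2, "mttr_min": 10, "runbook_used": "none"},
    {"service": "checkout-api", "primary": "sara", "mtta_min": 4, "mttr_min": 20, "runbook_used": "https://wiki.example.com/rb"},
    {"service": "search", "primary": "lee", "mtta_min": 6, "mttr_min": 30, "runbook_used": "https://wiki.example.com/rb"},
    {"service": "checkout-api", "primary": "lee", "mtta_min": 8, "mttr_min": 40, "runbook_used": "none"},
    {"service": "billing", "primary": "sara", "mtta_min": 30, "mttr_min": 100, "runbook_used": "https://wiki.example.com/rb"},
]
numbers = month_numbers(pages)
print(numbers)
```

With only a handful of pages in a month, p95 is effectively the worst page, so it's worth showing the raw count next to any percentile.
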

Fork to next month

10 min

At the end of the month, fork the workspace. The forked workspace inherits the Schedule schema, Runbook index (with status carried forward), and Pointers. Status is seeded with the retro. Pages and Handoff log start empty for the new month.

Tasks
  • Settings > Fork to next cycle
  • Name new workspace 'On-call rotation, $next-month'
  • Verify Runbook index carried forward (status preserved)
  • Verify Pages is empty
  • Verify Status has the retro pinned at top
  • Archive the previous month's workspace (don't delete, it's the historical record)
Gotchas
  • Don't carry forward Pages from the previous month. Pages are month-bound by design — that's what makes the retro horizon clean.
  • The fork endpoint hasn't shipped yet (early 2026). Until then, duplicate the workspace manually and paste the retro into Status.
FAQ

Common questions on this template.

Do I need to replace PagerDuty?
No. PagerDuty (or Opsgenie / Incident.io) pages people; this workspace operates the cycle around the schedule. Schedule mirrors PagerDuty so your agent can analyze it, but PagerDuty stays the source of truth for who's actually on-call right now. Two systems, one job each.
What if our team rotates daily instead of weekly?
Add a `shift_type` column to Schedule and switch the Friday handoff cadence to daily. The agent prompt's 'Every Friday at 3 PM' becomes 'Every shift end'. The retro horizon (monthly) stays the same because monthly is the right unit for pattern-finding regardless of shift length.
Our team is too small for a rotation, is this useful?
Below 3 engineers, probably not. With 3-4 engineers, the rotation is light, but the runbook tracking and handoff log are the value, because institutional memory across rotations is fragile. Above 8 engineers, the schedule itself becomes the value, because manual schedule management starts breaking.
Can the agent auto-resolve pages?
No, by design. The on-call always closes the loop themselves; the agent only fills in derived data (MTTA, MTTR) and flags missing runbooks. Auto-resolution is how on-call teams accidentally hide their fires: when nobody remembers what broke, the next break comes as a surprise. The agent's job is to surface, not to suppress.
What does the agent do if no one confirms next week's schedule?
Monday morning, the agent posts a 1-line escalation to the on-call lead: '$person is primary next week but hasn't confirmed since Friday's nudge'. Lead decides: chase, swap, or accept the silence. The agent never auto-reassigns; reassignment without consent is how people stop trusting the rotation.
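
The Monday escalation check described above can be sketched as a filter over Schedule. This assumes an unconfirmed row is one still carrying the 'Confirm by Monday' status set by Friday's nudge; the function name and sample rows are hypothetical, and the week to check is passed in explicitly so the sketch doesn't guess at how your rotation counts "next week".

```python
def unconfirmed_escalations(schedule: list, week_starting: str) -> list:
    """One-line escalations for the on-call lead; never reassigns anyone."""
    return [
        f"{row['primary']} is primary next week but hasn't confirmed "
        "since Friday's nudge"
        for row in schedule
        if row["week_starting"] == week_starting
        and row["status"] == "Confirm by Monday"
    ]


# Hypothetical Schedule rows.
schedule = [
    {"week_starting": "2026-01-12", "primary": "sara", "status": "Confirm by Monday"},
    {"week_starting": "2026-01-19", "primary": "lee", "status": "Confirmed"},
]
messages = unconfirmed_escalations(schedule, "2026-01-12")
print(messages)
```
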

Open this template as a workspace.

We mint a fresh copy in your org with the steps as table rows, the pointers as a separate table, and the brief as a doc. Bring your agents, start checking off boxes.