Every step in the template

Postmortem library, ongoing

A quarterly-cycling postmortem library where every incident has a consistent doc, every action item has an owner + due date + status, and every quarter ends with a themes retro the team actually uses.

Outcome

Time: Ongoing, ~1-2 hr/week + 4 hr at quarter end
Difficulty: intermediate
For: On-call engineering teams of 3-50 people running a blameless postmortem culture.

How this works

Open it, hand it to your agent, walk the steps.

Paste this to your agent (Claude / Cursor / Codex)

You are the agent running on the "Postmortem library" template workspace. The user has connected you via MCP at your-org/postmortem-library.

Your job: operate the quarterly cycle. Help the user write postmortems, track action items, and pull themes at quarter end.

User-loop protocol:
- You propose. The user decides. Never mark an Incidents row Done or close an Action item on your own.
- When the user adds a new Incidents row with timeline notes, draft the postmortem in the Latest postmortem doc following the blameless template (Summary, Timeline, Root cause, Impact, Action items, What went well, What we learned). Don't write "John did X wrong" — write "the on-call process didn't surface X". Surface the draft to the user for review.
- Every Monday, read the Action items table. Roll forward statuses (anything past due that's still Open → flag as Overdue with a comment). Post a 1-paragraph rollup to Status: total open, total overdue, top 3 needing attention.
- When the user asks for a quarter retro, read every Incidents row from the past 90 days, group by root-cause category, count repeats. Draft a Quarter retro section: "Themes this quarter", "What changed", "What to focus on next quarter". Surface to user.
- At end of every working session, write a 1-paragraph note to Status: what you drafted, what's blocked, what to pick up next.

Don't touch:
- The blameless postmortem template structure in the Pointers surface (it's the canonical template).
- The Incidents table columns (they're the canonical schema).

First MCP tool calls:
1. list_surfaces(workspace_slug="postmortem-library")
2. list_rows(workspace_slug="postmortem-library", surface_slug="incidents")
3. list_rows(workspace_slug="postmortem-library", surface_slug="action-items")
4. get_doc(workspace_slug="postmortem-library", surface_slug="status")

What you'll need

Pre-register or install before you start.

PagerDuty or Incident.io · PagerDuty from $21/user/mo, Incident.io from $19/user/mo

Source of truth for when incidents fired. The workspace mirrors them as Incidents rows for postmortem tracking.

Blameless postmortem template · Free online

Google SRE book chapter that defines the blameless template this workspace's drafting agent follows.

GitHub / Linear / Jira · Varies

Where action items become tickets after the postmortem is signed off. The Action items table mirrors the ticket IDs.

Slack (or your team chat) · Free / Pro $7.25/user/mo

Channel where your agent posts the Monday action-item rollup and quarterly retro nudge.

The template · 6 steps

Top to bottom. Each step has tasks, pointers, gotchas.

Seed Incidents with the last 30 days

30-60 min

Before the cycle starts, populate Incidents with the last 30 days of real incidents (or however far back the team has clean data). This gives your agent context for what 'normal' looks like and gives the quarter retro a real baseline.

Tasks

Pull the last 30 days from PagerDuty or your incident tool
Create one Incidents row per real incident: title, date, severity (P0-P4), owner, status, root-cause category
Link each row to its postmortem doc if one exists; mark Open if it doesn't
Skip drills and false alarms; only real customer-impacting incidents

Gotchas

Don't backfill ancient incidents. 30 days of clean data beats 6 months of half-remembered ones.
Severity calls are political. Use whatever scale your team already uses, don't relitigate.
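
The seeding rules above (a 30-day window, real customer-impacting incidents only, Open when no postmortem exists) can be sketched as a small filter. This is a minimal sketch under stated assumptions, not the PagerDuty, Incident.io, or workspace API: `RawIncident`, its fields, `to_incident_row`, and `seed_rows` are hypothetical names standing in for whatever your incident tool exports.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class RawIncident:
    # Hypothetical export shape; map your incident tool's fields onto it.
    title: str
    occurred_on: date
    severity: str              # keep your team's existing scale, e.g. "P0".."P4"
    owner: str
    root_cause_category: str
    status: str = "Open"
    is_drill: bool = False
    customer_impacting: bool = True
    postmortem_url: Optional[str] = None


def to_incident_row(inc: RawIncident) -> dict:
    """Build one Incidents row; force status to Open when no postmortem exists."""
    return {
        "title": inc.title,
        "date": inc.occurred_on.isoformat(),
        "severity": inc.severity,
        "owner": inc.owner,
        "status": inc.status if inc.postmortem_url else "Open",
        "root_cause_category": inc.root_cause_category,
        "postmortem": inc.postmortem_url,
    }


def seed_rows(incidents: list, today: date, window_days: int = 30) -> list:
    """Keep real, customer-impacting incidents from the last `window_days` days."""
    cutoff = today - timedelta(days=window_days)
    return [
        to_incident_row(inc)
        for inc in incidents
        if cutoff <= inc.occurred_on <= today
        and not inc.is_drill
        and inc.customer_impacting
    ]
```

If your team has clean data further back, raise `window_days` rather than backfilling by hand.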

Lock the blameless template

30 min

Your agent drafts postmortems following a template. The default template is the Google SRE book version (Summary / Timeline / Root cause / Impact / Action items / What went well / What we learned), but if your team uses a different one, lock yours into the Pointers surface so the agent always uses it.

Tasks

Read the default template in Pointers
If your team uses a different structure, replace the template doc with yours
Add any team-specific sections (customer comms log, exec-ready summary, etc.)
Mark the template Pointers row as Locked so the agent treats it as canonical

Pointers

Official: Google SRE Book: postmortem culture
Official: Atlassian incident handbook (alt template)

Gotchas

Blameless doesn't mean ownerless. The template names systems / processes that failed, not individuals.
If your template has more than 8 top-level sections, the agent's draft will feel like form-filling. Tighten before locking.

Draft a postmortem with your agent (the first time)

1-2 hr (the calibration session; each later postmortem takes about 30 min)

Walk through one full postmortem with your agent doing the drafting. Pick a real incident from the last 2 weeks. Paste the timeline notes, ask your agent to draft, then review, edit, and sign off. The point is to calibrate the agent's voice and catch drafts that are too formal, too informal, too blame-y, or missing customer impact. Tune now so the next 20 postmortems take 30 min each instead of 2 hours.

Tasks

Pick an Incidents row from the last 2 weeks
Paste the timeline (Slack thread, on-call notes, whatever you have) into the row's notes field
Ask your agent: 'Draft the postmortem for this row in Latest postmortem'
Review the draft section by section, leave inline comments for changes
Re-ask: 'Apply the comments and update Latest postmortem'
Sign off, mark Incidents row Done, copy the doc to your real postmortem repo / wiki

Gotchas

The agent's first draft will be 80% there. The 20% is the team-voice calibration: don't accept the first draft, leave comments.
If the timeline notes are sparse, the agent fills gaps with guesses. Add a 'Don't infer, ask me' line to the prompt if that happens.

Agent prompt for this step

Read the timeline notes from this Incidents row: {incident_row_url}.

Draft the postmortem in Latest postmortem doc, following the template in Pointers. Sections:

1. Summary (3-5 sentences, exec-readable)
2. Timeline (paste user's notes, but normalize to "[HH:MM PT] <event>")
3. Root cause (the system / process that failed, NOT the person)
4. Impact (users affected, duration, revenue impact if known)
5. Action items (3-7 concrete followups, each gets a row in Action items)
6. What went well (we caught it fast, comms were clear, etc.)
7. What we learned (the 1-paragraph generalizable lesson)

For each Action item, also create a row in Action items table with: title, owner (placeholder if unknown), due (placeholder), status: Open, source_incident (link).

When done, post a comment on the Incidents row: "Draft ready in Latest postmortem, please review."
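
The timeline normalization in step 2 of the prompt above (rewrite each note as "[HH:MM PT] <event>") can be sketched as follows. This is a minimal sketch, assuming the pasted notes are already in Pacific time and start each line with a clock time. `normalize_timeline` is a hypothetical helper, not part of any workspace API. Lines without a readable time are returned separately instead of being guessed, in the spirit of the "Don't infer, ask me" gotcha.

```python
import re

# Matches an optional "[", H:MM or HH:MM, optional am/pm, optional "PT",
# optional "]", an optional "-" or ":" separator, then the event text.
_TIME = re.compile(
    r"^\s*\[?(?P<h>\d{1,2}):(?P<m>\d{2})\s*(?P<ampm>[AaPp][Mm]\b)?\s*"
    r"(?:PT\b)?\]?\s*[-:]?\s*(?P<event>.+)$"
)


def normalize_timeline(notes: str):
    """Return (normalized_lines, unparsed_lines) from raw timeline notes."""
    normalized, unparsed = [], []
    for line in notes.splitlines():
        if not line.strip():
            continue  # skip blank lines
        m = _TIME.match(line)
        if not m:
            unparsed.append(line.strip())
            continue
        hour, minute = int(m["h"]), int(m["m"])
        ampm = (m["ampm"] or "").lower()
        if ampm == "pm" and hour != 12:
            hour += 12
        elif ampm == "am" and hour == 12:
            hour = 0
        if hour > 23 or minute > 59:
            unparsed.append(line.strip())  # not a valid clock time
            continue
        normalized.append(f"[{hour:02d}:{minute:02d} PT] {m['event'].strip()}")
    return normalized, unparsed
```

Anything in the second list is a question for the person who wrote the notes, not a gap for the agent to fill.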

Set up the Monday rollup

30 min setup, 0 min/week thereafter

Each Monday your agent reads Action items, rolls forward statuses, flags overdue, and posts a 1-paragraph summary to Status. This is the recurring habit that makes the library actually useful: action items stop getting forgotten because someone (the agent) reads the list every week and surfaces the misses.

Tasks

Confirm Action items rows have: title, owner, due, status, source_incident
Open Status doc, write the first rollup yourself as a template (so the agent matches your tone)
Schedule a Monday cron for the agent (Claude Code cron, or your scheduling tool of choice) that runs the prompt below
After the first Monday, review the agent's rollup; tune the prompt if it's missing context

Gotchas

Don't let the agent auto-close items. Closure should always be a human decision.
If the agent's Monday rollup gets ignored, post it to Slack too via MCP send_message. Workspaces are async, chat is the nudge.

Agent prompt for this step

It's Monday. Read the Action items table.

For each row with status=Open:
- If due < today, mark status=Overdue and add a row note "Flagged overdue $today".
- Don't change rows where due is null or in the future.

Then post a Monday rollup to Status doc (append, don't overwrite). Format:

## Monday $date

- Open: $N
- Overdue: $N (up $delta from last week)
- Top 3 needing attention:
  1. [row title] (owner: $owner, due: $due) [link to row]
  2. ...
  3. ...
- Closed last week: $N

Keep it under 8 lines.
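
The overdue rule and rollup format in the prompt above can be sketched in code. This is a minimal sketch over plain dictionaries, not the workspace's MCP API: `flag_overdue` and `monday_rollup` are hypothetical helpers, rows are assumed to carry `title`, `owner`, `due` (a `date` or `None`), and `status`, and ranking "needing attention" by oldest due date is an assumption the prompt does not specify.

```python
from datetime import date


def flag_overdue(rows: list, today: date) -> list:
    """Flip Open rows with a past due date to Overdue. Never closes anything."""
    flagged = []
    for row in rows:
        due = row.get("due")
        # Rows with no due date or a future due date are left untouched.
        if row["status"] == "Open" and due is not None and due < today:
            row["status"] = "Overdue"
            row["note"] = f"Flagged overdue {today.isoformat()}"
            flagged.append(row)
    return flagged


def monday_rollup(rows: list, today: date, overdue_last_week: int,
                  closed_last_week: int) -> str:
    """Format the Monday summary; call after flag_overdue."""
    open_count = sum(1 for r in rows if r["status"] == "Open")
    overdue = sorted(
        (r for r in rows if r["status"] == "Overdue"),
        key=lambda r: r["due"] or date.max,  # assumption: oldest due date first
    )
    delta = len(overdue) - overdue_last_week
    lines = [
        f"## Monday {today.isoformat()}",
        "",
        f"- Open: {open_count}",
        f"- Overdue: {len(overdue)} ({delta:+d} from last week)",
        "- Top 3 needing attention:",
    ]
    for n, r in enumerate(overdue[:3], start=1):
        due = r["due"].isoformat() if r["due"] else "none"
        lines.append(f"  {n}. {r['title']} (owner: {r['owner']}, due: {due})")
    lines.append(f"- Closed last week: {closed_last_week}")
    return "\n".join(lines)
```

Neither function sets a row to Done or Closed, which keeps closure a human decision.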

Run the quarter retro

30 min draft + 30 min meeting + 30 min forking

At quarter end, your agent reads the last 90 days of Incidents, groups them by root-cause category, counts repeats, and drafts the Quarter retro doc. Add a 30-min meeting with the team to read the draft together. Pick 2-3 themes to focus on next quarter (better runbooks for category X, a dedicated week to ship action items, etc.). The retro then seeds the Status doc in the forked workspace.

Tasks

Ask your agent: 'Draft the Quarter retro from the last 90 days'
Review the draft: are the themes real or noise?
Schedule a 30-min team meeting, read the draft together
Pick 2-3 themes to focus on next quarter
Fork the workspace, name it 'Postmortem library, Q<n+1>'
Add the chosen themes as Action items in the FORKED workspace
Confirm the fork seeded Status with the retro contents

Gotchas

If 90 days produced <5 incidents, skip themes — there's no signal. Run an annual retro instead.
Don't let the agent propose specific action items in the retro. That's the meeting's job.

Agent prompt for this step

Read every Incidents row from the past 90 days (date >= today - 90).

Group by root-cause category. Count incidents per category. Identify the top 3 categories by count.

For each top category, list:
- Incident count
- 2-3 specific incident titles as examples
- Action items spawned (count + how many still Open)
- Pattern (1-sentence what these incidents have in common)

Then draft Quarter retro doc with sections:

1. The quarter in numbers (total incidents, P0/P1/P2/P3/P4 counts, MTTR, MTTD)
2. Top 3 themes (the categories above)
3. What changed (1 paragraph on what the team shipped that mattered)
4. Next quarter's focus (2-3 proposals, surfaced as options not decisions)

Stop before "Decisions made". The team makes those in the retro meeting, you draft the raw material.
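
The grouping step in the prompt above (90-day window, count per root-cause category, top 3) can be sketched with a counter. This is a minimal sketch, assuming each Incidents row is a dictionary with `title`, `date` (a `date`), and `root_cause_category`. `quarter_themes` is a hypothetical helper, and the threshold of 5 comes from the gotcha about quarters with too few incidents.

```python
from collections import Counter
from datetime import date, timedelta

MIN_INCIDENTS_FOR_THEMES = 5  # fewer than this and there is no signal to theme


def quarter_themes(incidents: list, today: date, window_days: int = 90,
                   top_n: int = 3) -> list:
    """Return the top categories with counts and example titles, or [] if too few."""
    cutoff = today - timedelta(days=window_days)
    recent = [i for i in incidents if i["date"] >= cutoff]
    if len(recent) < MIN_INCIDENTS_FOR_THEMES:
        return []  # skip themes; run an annual retro instead
    counts = Counter(i["root_cause_category"] for i in recent)
    themes = []
    for category, count in counts.most_common(top_n):
        titles = [i["title"] for i in recent if i["root_cause_category"] == category]
        themes.append({
            "category": category,
            "incident_count": count,
            "examples": titles[:3],  # 2-3 example titles per theme
        })
    return themes
```

The output is raw material for the draft only. The one-sentence pattern per theme and the choice of focus stay with the agent's draft and the team meeting respectively.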

Fork the workspace

10 min

Fork the workspace at quarter-end. The forked workspace inherits the schema (Incidents columns, Action items columns, locked template) and seeds Status with the Quarter retro. The Incidents table in the forked workspace starts empty: new quarter, new incidents.

Tasks

From the workspace, open Settings > Fork to next cycle
Name the new workspace 'Postmortem library, Q<n+1>'
Confirm the fork copied: Incidents schema, Action items schema, Pointers, locked template
Confirm the fork seeded: Status doc with last quarter's retro pinned at the top
Confirm Incidents in the new workspace is empty (or has rows you explicitly carried forward)
Archive the previous quarter's workspace (don't delete, postmortems are forever)

Gotchas

Don't carry forward action items that should already be closed. Forking is a clean slate; if it's open after a quarter, it's not getting done.
The fork endpoint hasn't shipped yet (early 2026 platform). Until it lands, duplicate manually: copy the workspace, paste the retro into the new Status.
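
Until a fork endpoint exists, it can help to spell out what a manual duplicate should carry over. The following is a minimal sketch over plain dictionaries, not a platform API: `fork_workspace` and the workspace shape (`incidents`, `action_items`, `pointers`, `quarter_retro`, `status`) are hypothetical, and starting Action items empty is an assumption based on the clean-slate gotcha above.

```python
import copy


def fork_workspace(previous: dict, new_name: str, carry_forward_titles=()) -> dict:
    """Return a new-cycle workspace; `previous` is left untouched for archiving."""
    carried = [
        copy.deepcopy(row)
        for row in previous["incidents"]["rows"]
        if row["title"] in carry_forward_titles  # only rows explicitly carried forward
    ]
    return {
        "name": new_name,
        # Schemas are copied; rows are not.
        "incidents": {"columns": list(previous["incidents"]["columns"]), "rows": carried},
        "action_items": {"columns": list(previous["action_items"]["columns"]), "rows": []},
        # Pointers include the locked template.
        "pointers": copy.deepcopy(previous["pointers"]),
        # Last quarter's retro is pinned at the top of the new Status doc.
        "status": [previous["quarter_retro"]],
    }
```

Used as a checklist, the returned keys line up with the confirmation tasks above: schemas, Pointers, seeded Status, and an empty Incidents table.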

FAQ

Common questions on this template.

How is this different from `set-up-incident-response-and-postmortems`?: That template is the bootstrap: define severity levels, set up paging, write the on-call playbook, agree on the postmortem template, ship your first postmortem. This template is the operating cycle that runs *after* bootstrap: the library of every postmortem, the action items that survive across incidents, the quarterly retro that compounds the lessons. Bootstrap once, operate forever.
Why a fork at quarter-end instead of one perpetual workspace?: Postmortem libraries get heavy. By quarter 4 you have 80 incidents, 300 action items, 12 retros. The workspace gets slow and the agent's context window can't hold the whole thing. Forking quarterly bounds the working set, makes the agent fast, and gives you a clean retro horizon. Archived past quarters are read-only references.
What if our team doesn't write blameless postmortems?: Then start. The template defaults to blameless because every literature review of high-performing on-call teams (Google SRE, Atlassian, PagerDuty's State of Digital Ops) finds the same thing: blame-driven retros reduce reporting, reduced reporting reduces learning, and less learning means repeat incidents. Your agent enforces the blameless voice in the draft so the team's defaults shift over time without anyone being the bad cop.
Can I use this for security incidents too?: Yes, with one tweak. Add a column to Incidents called `customer_disclosure_required` (boolean) and a Pointers row for your team's security disclosure SLA. The agent flags rows where the column is true and reminds you about the disclosure window in the Monday rollup. The blameless template works for security incidents too.
How does the agent know what's overdue?: Action items rows have a `due` date column. The Monday rollup prompt has the agent check `due < today` on every row with `status=Open` and flip the matching rows to Overdue. If you don't set due dates, nothing gets flagged. The cycle works only if action items get real due dates at draft time.

Open this template as a workspace.

We mint a fresh copy in your org with the steps as table rows, the pointers as a separate table, and the brief as a doc. Bring your agents, start checking off boxes.