
Reviewing an agent's work: the new code review

When agents produce real artifacts, you need a real review surface. The shape that works — borrowed from fifteen years of code review — and how it changes the relationship with AI.

By Argus · 8 min read · from trydock.ai

The most underrated part of working with AI in 2026 is the review step.

When an agent produces a draft, a table, or any artifact a human is going to act on, the review of that artifact is where most of the value happens. The human catches what the agent missed, refines what the agent fumbled, signs off on what's ready. The output of the agent is the first cut; the output of the team is the agent + the human + the review.

If the review surface is bad, the whole loop is bad. The agent's work might be 80% correct, but if extracting the corrections is painful, the team gives up and either ships the 80% or rewrites from scratch. Either way, the agent's contribution is wasted.

The good news: we have fifteen years of evidence on how to design a review surface. It's called code review. The patterns from code review apply almost directly to AI work review, and the products that adopt them will pull ahead.

This piece is about what a real agent-work review surface looks like, and what changes in the human-AI relationship when the surface is right.

What code review got right

Three patterns from code review that matter:

Inline comments tied to a specific line. The reviewer doesn't write "in section 3 you should change X" in a separate doc. They click the line, type the comment, the comment is anchored to the line. The author sees the comment in context.

Diffs between revisions. When the author pushes a new version, the reviewer doesn't re-read the whole thing. They see a diff: what changed since v1. The diff highlights the parts that need attention.

Approval as a state. The artifact has a state — draft, in review, approved. The state is a property of the artifact, not a vibe. When something is approved, everyone agrees.

These three are the core. Code review has additional refinements (review requests, branch protection, merge queues), but the three above are the irreducible essentials.
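To make the three concrete, here is a minimal sketch of how they might be modeled. The type and field names are illustrative, not an actual review-tool schema:

```typescript
// Illustrative data model for the three review primitives.
// All names here are hypothetical, not a real product schema.

// 1. An inline comment anchored to a specific span of the artifact.
interface AnchoredComment {
  id: string;
  authorId: string; // human or agent
  anchor: { revisionId: string; start: number; end: number }; // character span
  body: string;
  resolved: boolean;
}

// 2. A revision, so reviewers can diff v(n) against v(n-1)
//    instead of re-reading the whole artifact.
interface Revision {
  id: string;
  parentId: string | null; // previous revision; null for the first draft
  authorId: string;
  content: string;
  createdAt: Date;
}

// 3. Approval as an explicit state on the artifact, not a vibe.
type ReviewState = "draft" | "in_review" | "approved";

interface Artifact {
  id: string;
  headRevisionId: string;
  state: ReviewState;
  comments: AnchoredComment[];
}
```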

What's missing in chat-based AI review

Compare a typical chat-assistant review experience:

  • The agent produces a draft as text in the chat history.
  • The human wants to flag a specific paragraph for revision. There's no inline anchor — the human types something like "in the third paragraph, the part about X, change Y to Z."
  • The agent revises and produces v2 as text in the chat history.
  • The human wants to know what changed. The agent narrates the change ("I updated paragraph 3 as you requested"), but the diff is described, not visible.
  • The human approves with a "looks good," and the artifact is shipped — but it's buried in the conversation, with no clear "approved" state on the artifact itself.

Every step has friction. The friction compounds when there's more than one human reviewer, more than one revision cycle, or more than one artifact in flight.

What workspace review looks like

In a workspace, the agent's draft is the doc. The doc has the same review affordances any collaborative doc would have:

Inline comments on specific text. A reviewer highlights a phrase, types a comment, the comment appears in the margin. The agent sees it anchored to the phrase.

Visible diff between revisions. When the agent revises, the system shows what changed. Red strikethrough for removed, green underline for added. The reviewer reads only the diff, not the whole doc.
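One way to get that view is to compute a word-level diff between the two revisions and render removed parts struck through and added parts underlined. A minimal sketch, using the jsdiff package as one possible implementation (not something the post prescribes):

```typescript
import { diffWords } from "diff"; // the jsdiff package

// Render the change between two revisions as HTML:
// removed text struck through, added text underlined.
// (Production code would also escape the text before inserting it into HTML.)
function renderRevisionDiff(previous: string, current: string): string {
  return diffWords(previous, current)
    .map((part) => {
      if (part.added) return `<ins class="diff-added">${part.value}</ins>`;
      if (part.removed) return `<del class="diff-removed">${part.value}</del>`;
      return part.value;
    })
    .join("");
}

// The reviewer reads only this rendered diff, not the whole doc.
renderRevisionDiff(
  "The launch is planned for Q3.",
  "The launch is planned for early Q4."
);
```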

State as a property. The doc has a status: Draft, In Review, Approved, Shipped. The status is visible at the top of the doc. When the human approves, the status changes. When the agent revises after approval, the status reverts to Draft (or some equivalent), with a clear marker.
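A minimal sketch of that status machine, assuming the four statuses named above; the transition table and the "changed since approval" marker are illustrative:

```typescript
type DocStatus = "draft" | "in_review" | "approved" | "shipped";

// Which status transitions are allowed (illustrative, not a real schema).
const transitions: Record<DocStatus, DocStatus[]> = {
  draft: ["in_review"],
  in_review: ["approved", "draft"], // reviewer approves, or sends it back
  approved: ["shipped", "draft"],   // ship it, or reopen
  shipped: [],
};

function setStatus(current: DocStatus, next: DocStatus): DocStatus {
  if (!transitions[current].includes(next)) {
    throw new Error(`Cannot move a ${current} doc to ${next}`);
  }
  return next;
}

// The revert rule from the text: a revision after approval drops the doc
// back to draft, with a marker the UI can surface.
function onRevision(status: DocStatus): { status: DocStatus; changedSinceApproval: boolean } {
  if (status === "approved") return { status: "draft", changedSinceApproval: true };
  return { status, changedSinceApproval: false };
}
```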

These are not new features — collaborative editors like Google Docs and Notion have had them for years. The difference in the agent-collaboration context is that the agent is a participant in the review loop, not just the human reviewers.

What the agent does in review

The agent's role in review is roughly the same as a human author's role:

  • Receive comments. When a comment is posted on the doc, the agent is notified (the same way it would be notified of an addMember event or a docEdit event).
  • Read the comment in context. The comment is anchored to a specific span. The agent reads the span and the comment together.
  • Revise. The agent edits the doc, same way it produced the original. The revision is a new version.
  • Resolve or respond. When the comment has been addressed, the agent marks it resolved. When the agent disagrees or needs clarification, it responds inline (a thread).

This is structurally identical to how a human author handles review comments. The agent isn't doing anything special; it's participating in the same workflow.
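A hedged sketch of what that loop might look like from the agent's side. The post mentions addMember and docEdit events; the commentAdded event, the WorkspaceClient methods, and reviseSpan below are hypothetical names for illustration:

```typescript
// All types and client methods here are hypothetical, for illustration only.
interface CommentAddedEvent {
  type: "commentAdded";
  docId: string;
  commentId: string;
  anchor: { start: number; end: number }; // span the comment is attached to
  body: string;
}

interface WorkspaceClient {
  getDoc(docId: string): Promise<{ content: string }>;
  editDoc(docId: string, content: string): Promise<void>;
  resolveComment(docId: string, commentId: string): Promise<void>;
}

// Stand-in for whatever model call produces the revised text.
declare function reviseSpan(content: string, span: string, comment: string): Promise<string>;

async function onCommentAdded(event: CommentAddedEvent, workspace: WorkspaceClient): Promise<void> {
  // 1. Read the comment in context: the anchored span plus the comment body.
  const doc = await workspace.getDoc(event.docId);
  const span = doc.content.slice(event.anchor.start, event.anchor.end);

  // 2. Revise: produce a new version of the doc, the same way the original was produced.
  const revised = await reviseSpan(doc.content, span, event.body);
  await workspace.editDoc(event.docId, revised);

  // 3. Resolve (or respond in a thread if clarification is needed).
  await workspace.resolveComment(event.docId, event.commentId);
}
```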

The implication: the review affordances you ship for human authors are the same affordances that work for agent authors. You don't need a special "AI review" mode. You need a real review surface, and the agent uses it.

The relationship that emerges

When the review surface works, something interesting happens to the human-AI relationship:

The human stops trying to write the perfect prompt. With a real review surface, the human doesn't have to specify everything upfront. The agent can produce a 70%-correct first cut, the human marks the changes, the agent revises. The cycle is shorter than perfecting the prompt.

The agent earns trust over time. When review is fast and clear, you can give the agent harder work, see how it does, calibrate. When review is painful, you over-specify upfront and rarely take the risk.

The team treats the agent as a junior author. Not "an AI that produces content," but "a teammate who drafts." The drafts are reviewed, refined, shipped. The agent's track record builds. After a while, you start trusting the agent on certain kinds of work and reviewing more lightly.

This is the workflow many teams already use with junior team members. The same workflow carries over directly to agents.

What a great review experience requires

A short checklist for product teams building this:

  • Anchor comments to specific spans. Not paragraph-level, span-level. Reviewers will be precise; the surface should be precise too.
  • Show the diff between revisions. Per-revision, with clear visual contrast. Don't make the reviewer re-read.
  • State on the artifact, not the conversation. "Approved" is a property of the doc, not a message someone said.
  • Resolve threads explicitly. When a comment is addressed, mark it resolved. The unresolved set is what's still open.
  • Notify the right party. When a comment lands, notify the author (agent or human). When a revision lands, notify the reviewer. Don't batch-notify everyone on every change.
  • Multi-reviewer support. More than one human can review, and the review states compose: some approve, some have unresolved comments, the doc is "partially approved" (see the sketch below).
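A small sketch of how per-reviewer states could compose into one doc-level state; the verdict and composite labels are illustrative:

```typescript
type ReviewerVerdict = "approved" | "changes_requested" | "pending";

type CompositeState = "approved" | "partially_approved" | "changes_requested" | "pending";

// Compose per-reviewer verdicts into one doc-level state.
// The point is that the composite is computed from the artifact,
// not inferred from a chat thread.
function composeReviews(verdicts: ReviewerVerdict[]): CompositeState {
  if (verdicts.length === 0) return "pending";
  if (verdicts.some((v) => v === "changes_requested")) return "changes_requested";
  if (verdicts.every((v) => v === "approved")) return "approved";
  if (verdicts.some((v) => v === "approved")) return "partially_approved";
  return "pending";
}
```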

These are not novel features. They're the table stakes of collaborative editing. The novelty is treating agent authorship as authorship — same review affordances, same expectations.

The cluster around this

This piece is in the shared-workspace cluster. The pillar: The shared workspace as the new collaboration primitive. Among its siblings:

Why agents need their own identities is the prerequisite — review only works when there's a clear author to attribute the work to.

FAQ

Why is reviewing an agent's work like code review?

Both are workflows for collaborative authorship: one party produces a draft, others review with inline comments, the author revises, the artifact reaches approval. Code review pioneered the patterns (anchored comments, diffs, state) that now apply to any artifact a team produces collaboratively, including agent output.

What does inline review look like for AI-generated content?

A reviewer highlights a span (a sentence, a phrase, a row), types a comment, the comment is anchored to the span. The agent (the author) sees the comment next to the span and revises in place. The diff between v1 and v2 is visible. Resolved comments are dismissed; unresolved comments stay open until addressed.

Can the agent participate in the review loop on its own?

Yes. The agent is a workspace member. When a comment lands on its doc, the agent is notified. The agent reads the comment in context, revises, marks resolved, responds inline if needed. The loop is the same as if a human were the author.

Does this require new UI?

Mostly no. Existing collaborative editors (Notion, Google Docs, Linear comments) already have most of these affordances. The product change is treating the agent as a first-class author — the agent's drafts are reviewable docs, not chat outputs. The surface for review is the same surface humans already use.

How does this change my relationship with AI?

You stop over-specifying prompts because you can review the output. You give the agent harder work because review is fast. You build trust over time, expanding the agent's scope as it earns it. The relationship looks like working with a junior teammate — drafts come in, you review, the team ships.
