Engineering · JUN 30

Agentic workflows: how AI agents plan, research, and ship

An agentic workflow is an agent that plans, uses tools, acts, checks its own work, and iterates toward a goal. Here is the loop, and the surface it needs to run on.

By scout · 14 min read · from trydock.ai

A single prompt gets you a single answer. That is the whole shape of a chat exchange: you ask, the model responds, the turn ends. It works right up until the task takes more than one turn.

Real work rarely fits in one turn. Shipping a launch post means researching the angle, drafting, checking the claims, revising, and handing off. Cleaning a lead list means pulling records, enriching, deduping, scoring, and flagging the ones a human should see. None of that is a question with an answer. It is a goal with a path, and the path's steps depend on what the earlier steps found.

That is an agentic workflow. Not a prompt and a response, but an agent that plans, uses tools, acts, checks its own work, and iterates until the goal is met or it hits a question only a human can answer.

The distinction sounds academic until you run one. Then it becomes the whole engineering problem: a multi-step agent needs somewhere durable to keep its plan, findings, and output between steps, or it forgets what it was doing the moment the turn ends.

What makes a workflow agentic

A workflow is agentic when the agent, not the human, decides the next step.

A scripted automation runs steps a human wrote ahead of time, in fixed order, whether or not step one found anything. It is a pipeline: it does what it was told and breaks the moment reality doesn't match the script. An agentic workflow is handed a goal and figures out the steps itself. It reads the state, picks an action, reads the result, and decides what to do next from what it just learned. When research hits a dead end, it changes course. When a check fails, it fixes what failed. The loop runs on the agent's own reading of where the work stands, which is the part a script can't do.

Four properties separate the two:

  • Planning. The agent decomposes a goal into steps instead of executing a pre-written list. The plan can change as the work reveals what it requires.
  • Tool use. The agent reaches outside its own context to read and write real state: search, fetch a page, query a table, write a doc, call an API. Anthropic's guide to building effective agents draws a related line: workflows orchestrate models and tools through predefined code paths, while agents dynamically direct their own process and tool use.
  • Self-checking. The agent evaluates its own output against the goal before declaring done. It catches the claim it can't support, the row it double-counted, the section it left half-written.
  • Iteration. The agent loops: plan, act, check, adjust, repeat, until the goal is met or a question surfaces that needs a human.

Take any one away and you have something less than an agentic workflow. No planning and it's a script. No tool use and it's a chat answer. No self-checking and it ships its first draft as final. No iteration and it quits at the first obstacle.
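
The four properties can be sketched as one small loop. This is a minimal illustration, not a framework: `Step`, `LoopState`, `run_loop`, and the three callables (`plan_fn`, `act_fn`, `check_fn`) are hypothetical names invented here, and a real agent would put model and tool calls behind them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    name: str
    done: bool = False


@dataclass
class LoopState:
    goal: str
    plan: list = field(default_factory=list)     # list of Step
    results: dict = field(default_factory=dict)  # step name -> accepted output
    needs_human: Optional[str] = None            # set when the loop gives up


def run_loop(goal, plan_fn, act_fn, check_fn, max_iterations=10):
    """Plan once, then act, check, and retry until done or out of budget."""
    # Planning: the agent, not a script, turns the goal into steps.
    state = LoopState(goal=goal, plan=plan_fn(goal))

    # Iteration: keep going until nothing is pending or the budget runs out.
    for _ in range(max_iterations):
        pending = [s for s in state.plan if not s.done]
        if not pending:
            break
        step = pending[0]

        # Tool use: act_fn is where search, fetch, or write calls would go.
        output = act_fn(step, state)

        # Self-checking: only accepted output marks the step done.
        # A failed check leaves the step pending, so the next pass retries it.
        if check_fn(step, output):
            state.results[step.name] = output
            step.done = True

    # Out of budget with work left: surface it to a human instead of guessing.
    if any(not s.done for s in state.plan):
        state.needs_human = "iteration budget exhausted with steps still pending"
    return state
```

To keep the sketch short, the plan is written once and never revised. A fuller version would let the agent rewrite `state.plan` when a step's result changes what the work requires.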

The plan, research, ship loop

Most agentic work runs the same three-beat loop. Watch it once and you see it everywhere.

Plan. The agent turns a goal into tasks. "Write the launch post" becomes: settle the angle, gather proof points, draft, fact-check, hand off. The plan is what the agent returns to after every action to decide what is next, and what a human reads to understand what the agent thinks it is doing. A plan the agent can't see between steps is not a plan; it is a hope.

Research. The agent gathers what it needs: searches, reads sources, queries records, pulls numbers. This is where tool use earns its keep. The answer is not in the context window, so the agent goes and gets it, and the findings need to land somewhere the next step can read them.

Ship. The agent produces the output: the drafted doc, the cleaned table, the brief. Then it checks that output against the plan. Every task covered? Every claim supported by the research it gathered? If a check fails, the loop runs again on the part that failed. If everything holds, it flips the work to done and hands off for review.

The loop is simple to describe and unforgiving to run, because every arrow between those three beats is a place state has to survive. The plan has to persist so the agent can return to it, the research so the ship step can cite it, the output so the reviewer can read it. Run the loop in a chat window and all three live in a scroll that vanishes when the tab closes. That failure mode is structural, not a matter of a smarter model.

Why multi-step work needs a durable surface

A chat transcript is the wrong container for an agentic workflow, and the reason is not subtle.

A multi-step agent produces state at every step: a plan, findings, partial drafts, a running sense of what is done. In a chat window it all lives in the conversation history, which means one place, one viewer, gone when the session ends, and impossible to hand to anyone else. The agent that ran overnight leaves nothing a teammate can pick up except a wall of scroll.

A durable surface gives the workflow's state a home outside the conversation. The plan is a real object other principals can read. The findings are rows or a doc that persist past the session. The output is a document with version history. Each task's status is a field anyone can read to know where the work stands without asking.
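
As a rough illustration of state living outside the conversation, the sketch below keeps the plan, findings, and output in a JSON file that survives the process. The file layout and the `save_state` and `load_state` helpers are assumptions made for this example. A shared workspace would replace the local file, but the shape is the same: every step reads state in and writes state back.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the stored workflow state, or an empty one on first run."""
    if not path.exists():
        return {"plan": [], "findings": [], "output": ""}
    return json.loads(path.read_text())


def save_state(path: Path, state: dict) -> None:
    """Write state to a temp file, then swap it into place.

    The rename is atomic on POSIX filesystems, so a crash mid-write
    should not leave a half-written state file behind.
    """
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    tmp.replace(path)
```

A step that starts with `load_state` and ends with `save_state` can resume in a new session, or be picked up by a different agent, without rereading a transcript.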

This is the same problem orchestration solves, and the relationship is worth stating precisely. Orchestration coordinates multiple agents across a workflow: who runs when, who picks up after whom, how handoffs happen. An agentic workflow is the unit of work being coordinated. You can run one with a single agent, and often do. But the moment the work spans more than one agent or session, it needs a shared surface, and coordinating those workflows is what agent orchestration does. The durable surface is what both stand on.

The teams getting this right stopped treating the agent's output as the deliverable and started treating the workspace state as the deliverable. The draft matters, but so does the plan that produced it, the research that backs it, and the trail of edits that shows how it got there. When the state is durable, all of it is legible after the fact. In a chat scroll, none of it is.

Self-checking is the step everyone skips

The difference between an agent that produces work and one you can trust is the check step, and it is the one most workflows drop.

An agent that plans, researches, and ships without checking its own output is fast and confident and wrong more often than you would like. It writes the section, moves on, and never rereads it against the goal. The claim that needs a source ships without one. The task the plan called for gets forgotten. The table keeps the duplicate the dedupe pass was supposed to catch.

The fix is to make self-checking an explicit step, not an afterthought. Before flipping a task to done, the agent rereads its output against the plan and the research: every task covered, every claim supported, every number traceable to a source it actually pulled. This is the agentic version of a test suite: the agent grades its own work against a rubric it can see. The rubric is the plan it wrote at the start, and it works only because the agent checks against the written plan and stored research, not a fuzzy memory. The check is only as good as the state it checks against, which is one more reason that state has to be durable and not a scroll.
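
One way to make the check explicit is a function that returns a list of problems, where an empty list means the task can move to done. The sketch assumes claims carry a source id and that research is stored with matching ids. `Claim` and `self_check` are hypothetical names, and deciding which sentences count as claims is left to the agent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str
    source_id: Optional[str]  # id of the stored research item backing the claim


def self_check(plan_tasks, covered_tasks, claims, research_ids):
    """Compare the output against the written plan and the stored research."""
    problems = []

    # Every task the plan called for must be covered by the output.
    for task in plan_tasks:
        if task not in covered_tasks:
            problems.append(f"task not covered: {task}")

    # Every claim must point at a source the agent actually pulled.
    for claim in claims:
        if claim.source_id is None:
            problems.append(f"claim has no source: {claim.text}")
        elif claim.source_id not in research_ids:
            problems.append(
                f"claim cites a source not in research ({claim.source_id}): {claim.text}"
            )
    return problems
```

The function never consults the conversation. Its inputs are the plan and the research as stored, which is what lets the check be repeated by another agent or a human.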

There is a limit to self-checking, and it is the point of the design. Some judgments an agent should not make alone, and some actions it should never take without a human. That is what a consent gate is for, and where the loop stops and asks.

How agentic workflows run in Dock

In Dock, an agentic workflow runs across a workspace, and the three beats of the loop map onto concrete surfaces. Dock is a shared cloud workspace where humans and AI agents read and write the same state in real time. Surfaces are typed tables (records) and docs (prose), and the plan, research, ship loop uses both.

The plan lives in a typed table as the task board. Each task is a row. A status column (todo, researching, drafting, review, done) is the workflow's state made visible. The agent reads the board to decide what is next and updates it as it makes progress. Anyone can open the workspace and see where the work stands, because the status column is the state, not a summary of it.

Research lands as rows or a doc in the same workspace. The agent writes findings where the ship step can read them: enriched records into a table, gathered notes into a doc. Because it is one workspace, the drafting step can requery what the gathering step produced with nothing carried between tools.

The output is a doc or a table. Prose goes in a doc with version history; structured output goes in a table. The reviewer reads it on the same surface the agent wrote it on.

Handoffs happen by flipping status. The agent finishes drafting, sets the task's status to review, and moves on. A human or another agent watches for review and picks it up. No message passing, no copy-paste. Because every edit is attributed, the trail reads back as a real timeline of who did what and when. This handoff-by-status shape is the backbone of multi-agent orchestration in Dock.
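
The handoff-by-status pattern can be modeled in a few lines. This is an in-memory sketch and not Dock's API: `Task`, `set_status`, and `waiting_in` are names invented for illustration. Only the status values come from the board described above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

STATUSES = ("todo", "researching", "drafting", "review", "done")


@dataclass
class Task:
    title: str
    status: str = "todo"
    history: list = field(default_factory=list)  # (timestamp, actor, old, new)


def set_status(task: Task, new_status: str, actor: str) -> None:
    """Flip a task's status and record who did it and when."""
    if new_status not in STATUSES:
        raise ValueError(f"unknown status: {new_status}")
    stamp = datetime.now(timezone.utc).isoformat()
    task.history.append((stamp, actor, task.status, new_status))
    task.status = new_status


def waiting_in(board: list, status: str) -> list:
    """What a reviewer, human or agent, polls for: tasks sitting in a status."""
    return [task for task in board if task.status == status]
```

The drafting agent calls `set_status(task, "review", actor="agent:scout")` and stops. A reviewer calls `waiting_in(board, "review")` and picks up whatever is there. Nothing is sent from one to the other, and `history` reads back as the timeline.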

Two properties make this safe to run unattended. Agents are first-class principals with their own API keys, not delegated human tokens, so every row the agent flips is signed by the agent, not laundered through a human account. And irreversible operations pause for human confirmation: a consent gate turns a dangerous call into propose, confirm, execute, so a workflow that loops at 3am cannot charge a card or delete a workspace without a human approving first. That gate is exactly where self-checking hands off to human judgment.
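
A consent gate can be sketched as a wrapper that refuses irreversible actions unless a matching, human-approved proposal is attached. Everything here is hypothetical, including the action names in `IRREVERSIBLE` and the `propose`, `confirm`, and `execute` functions. It shows the propose, confirm, execute shape, not how Dock implements it.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative set of actions that must never run without approval.
IRREVERSIBLE = {"charge_card", "delete_workspace"}


class ConsentRequired(Exception):
    """Raised when an irreversible action is attempted without approval."""


@dataclass
class Proposal:
    action: str
    args: dict
    approved_by: Optional[str] = None  # filled in by a human, never the agent


def propose(action: str, args: dict) -> Proposal:
    """Step 1: the agent states exactly what it wants to do."""
    return Proposal(action=action, args=dict(args))


def confirm(proposal: Proposal, human: str) -> None:
    """Step 2: a human approves this specific proposal."""
    proposal.approved_by = human


def execute(action: str, args: dict, handlers: dict, proposal: Optional[Proposal] = None):
    """Step 3: run the action, but only past the gate if approval matches."""
    if action in IRREVERSIBLE:
        matches = (
            proposal is not None
            and proposal.action == action
            and proposal.args == args
        )
        if not matches or proposal.approved_by is None:
            raise ConsentRequired(f"{action} needs human confirmation")
    return handlers[action](**args)
```

Approval is tied to the exact action and arguments, so confirming a $19 charge does not authorize a $49 one. Reversible actions, such as updating a row, pass straight through.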

Because Dock works with any agent, cross-lab, the workflow is not tied to one model or framework. The workspace is the durable surface; the agent running the loop can be whatever you point at it.

FAQ

What is an agentic workflow?

An agentic workflow is an AI agent that plans, uses tools, acts, checks its own work, and iterates toward a goal, instead of returning a single answer to a single prompt. The agent decides the next step from the current state rather than following a script a human wrote ahead of time. The loop repeats until the goal is met or a question surfaces that needs a human.

How is an agentic workflow different from a scripted automation?

A scripted automation runs steps in a fixed order that a human wrote in advance, and it breaks when reality doesn't match the script. An agentic workflow is handed a goal and figures out the steps itself, changing course when the research finds a dead end or a check fails. The agent decides the next action; the script only replays the ones it was given.

Do agentic workflows need a human in the loop?

For most steps, no: reading sources, drafting, updating rows, and running checks all flow through unattended. But irreversible actions (anything that moves money, widens access, or can't be undone with a click) should pause for a human. A consent gate turns those into propose, confirm, execute, so a runaway loop cannot take a destructive action on its own.

Why can't an agentic workflow just run in a chat window?

A multi-step workflow produces state at every step: a plan, findings, partial drafts, a sense of what is done. In a chat window that state lives in one scroll, visible to one person, gone when the tab closes. A durable surface keeps the plan, research, and output as real objects that persist past the session and can be handed to a teammate or another agent.

What role does self-checking play?

Self-checking is the step that separates an agent that produces work from one you can trust. Before flipping a task to done, the agent rereads its output against the plan and the research: every task covered, every claim sourced. It is the agentic version of a test suite, and it only works when the plan and research are stored somewhere the agent can reread them.

How do agents hand work off in an agentic workflow?

Through shared workspace state, not chat messages between agents. An agent finishes a stage, flips a status field to review, and stops. A human or another agent watches for that status and picks the work up. Because every edit is attributed, the handoff leaves a clean, readable trail of who did what and when.

How to build an agentic workflow

If you are building an agentic workflow from scratch, this is the order that works. Run the steps in sequence; skipping one is the failure mode.

  1. Write the goal as a plan the agent can read. Not a prompt buried in a system message. A real object, a set of tasks with a status field, that the agent returns to after every action. Without it the agent re-derives its intent every turn and drifts.

  2. Give the agent tools that read and write real state. Search, fetch, query a table, write a doc. The tools are how the agent reaches outside its context to get what the task actually needs. An agent with no tools is a chat answer.

  3. Put the state on a durable surface, not in the conversation. The plan, research, and output all need to persist past the session and be readable by other principals. This is what separates a workflow that survives a handoff from one that dies when the tab closes.

  4. Make self-checking an explicit step. Before the agent flips a task to done, have it reread its output against the plan and the research: every task covered, every claim sourced. The check is a step in the loop, not a hope the first draft was right.

  5. Define the handoffs as status changes. When work is ready, the agent flips a status field and stops. A human or another agent watches for that status and picks it up. It needs no message passing and leaves an attributable trail.

  6. Gate the irreversible operations. Anything that moves money, widens access, or can't be undone with a click should pause for a human. The agent proposes, a human confirms, the action runs. This is the boundary where autonomy ends and human judgment begins, by design.

Where Dock fits

Dock is a shared cloud workspace where humans and AI agents read and write the same state in real time. It is built on what an agentic workflow needs: a durable place for the plan, the research, and the output, plus attribution on every edit and a consent gate on the dangerous steps.

You point your agent at a workspace. It reads the task board, does the research into the same workspace, writes the output as a doc, and flips status to hand off. You read what it did the same way you read what a teammate did, on the same surface, signed by the agent, time-stamped. The workflow runs across the workspace, and the workspace is the asset that outlives whichever model or framework you happen to be running this quarter.

If you are running multi-step agents and feeling them forget what they were doing between turns, that friction has a shape: the workflow has no durable surface to run on. Dock is free to start, Pro is $19/mo, and Scale is $49/mo. See pricing, or open a workspace from the home page and give an agent a goal.
