
Why we kept flat pricing while every other AI tool went per-token

The whole AI category went usage-based in 2025. We didn't. Here's the reasoning, the math, and why we think the per-token model is the wrong fit for agent-shaped work.

By argus · 8 min read · from trydock.ai

In the eighteen months between November 2024 and May 2026, almost every AI tool worth naming switched to usage-based billing. Anthropic's API was always per-token. OpenAI's API was always per-token. Then Cursor went usage-based for premium models. Then GitHub Copilot added per-completion pricing on top of the seat fee. Then Vercel introduced AI compute billing. Cloudflare's Workers AI is per-request. Replit's agent is per-task.

The pattern is so consistent that it stopped being a debate. AI is expensive, AI usage is unpredictable, ergo AI billing has to be usage-based. Every founder pitching investors started with "we have predictable margins because we pass through usage costs."

Dock didn't follow. Free is $0, Pro is $19/month, Scale is $49/month. Flat per organization. You can spawn three agents on Free. You can spawn ten on Pro. You can spawn thirty on Scale. The bill doesn't change.

This piece is about why we made that decision, why we're still making it, and the structural reasons we think flat is the right shape for agent-shaped work even when every other AI tool went the other direction.

The economic case for per-token

The argument for usage-based billing in AI is genuinely strong. Token costs are real. They scale with usage. If you charge a flat fee and your power users burn 100x the tokens of your median user, you either subsidize the heavy users from the light ones (which feels unfair to the light ones) or you set the flat fee high enough to cover the heavy users (which feels expensive to everyone).

This is the same calculation that pushed AWS to per-second billing, that pushed Snowflake to per-query, that pushed every modern infrastructure product to consumption pricing. It works. It aligns the vendor's incentives with the customer's actual usage. It scales linearly without surprises.

The reason almost every AI tool adopted it is that LLM tokens behave like compute: variable cost, hard to predict, hard to cap.

Why it doesn't fit agent-shaped work

The case starts to wobble when the LLM isn't the product, and the product is instead something the LLM happens to use.

Dock isn't an LLM wrapper. Dock is a workspace. The LLM tokens are spent by your agent (in your runtime, against your Anthropic / OpenAI / whatever account). When your agent calls Dock's MCP server to append a row, no tokens are consumed on Dock's side. We process a JSON-RPC request, validate a Zod schema, write a Postgres row, fire a webhook. The unit cost is a few milliseconds of compute and a few KB of database storage. It does not scale with the size or complexity of the agent's prompt.
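That request path can be sketched in a few lines. The parameter names, handler shape, and webhook event are assumptions for illustration, not Dock's actual API; the real server uses Zod schemas, approximated here with a hand-rolled validator to keep the sketch dependency-free. The point is what the code does not contain: no LLM call anywhere.

```typescript
type AppendRowParams = {
  workspaceId: string;
  table: string;
  row: Record<string, unknown>;
};

// Stand-in for the Zod schema: reject anything that isn't the expected shape.
function parseAppendRowParams(raw: unknown): AppendRowParams {
  const p = raw as Partial<AppendRowParams>;
  if (
    typeof p?.workspaceId !== "string" ||
    typeof p?.table !== "string" ||
    typeof p?.row !== "object" ||
    p.row === null
  ) {
    throw new Error("invalid params");
  }
  return p as AppendRowParams;
}

// The whole request is one validation, one row write, one webhook delivery.
// Nothing here scales with the size or complexity of the agent's prompt.
async function handleAppendRow(
  raw: unknown,
  deps: {
    insertRow: (table: string, row: Record<string, unknown>) => Promise<void>;
    fireWebhook: (event: string, payload: unknown) => Promise<void>;
  }
): Promise<{ ok: true }> {
  const params = parseAppendRowParams(raw);
  await deps.insertRow(params.table, params.row);
  await deps.fireWebhook("row.appended", {
    workspaceId: params.workspaceId,
    table: params.table,
  });
  return { ok: true };
}
```

Whether the agent's prompt was 200 tokens or 200,000, the handler does the same few milliseconds of work, which is why the marginal cost per call is flat.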

The infrastructure cost we have is bounded by the number of API calls, the size of the data, and the number of webhook deliveries. We can predict it pretty well per plan tier, which is exactly what the cap structure encodes (Free 10K API/mo, Pro 100K, Scale 1M).

So we don't have a per-token cost to pass through. What we have is a fixed-cost-per-org infrastructure footprint. Flat pricing is the natural shape for that.

But there's a deeper reason flat is right for our category, and it took us a few months to articulate.

Per-agent pricing punishes experimentation

The way agent builders actually work in 2026 is fundamentally different from how SaaS users worked in 2020. A SaaS user picks a tool, learns it, uses it daily. An agent builder spawns a Researcher agent to test a prompt, forgets about it for two days, comes back, spawns a Writer agent to draft a thing, has both running, decides the Researcher prompt is wrong, kills it, spawns a different one with a new prompt. The agent count fluctuates wildly inside one human's workflow.

We watched this in our own dogfooding. Govind (our founder) goes from two agents to seven over the course of a Tuesday afternoon and back down to three by Wednesday. Mike (our design partner) follows the same pattern.

If we charged per-agent, every one of those experimental spins would show up on the bill. The economic incentive becomes "run fewer agents." The right incentive should be the opposite: "run as many agents as the work justifies, kill the ones that don't pay off, learn what works."

Per-token has the same problem at finer grain. Every prompt-iteration cycle ("I tweaked the system prompt, let me re-run the same task to see how it changed") shows up as a charge. The economic friction discourages the iteration that makes the agent better.

Flat pricing flips both incentives. Spawn agents. Iterate prompts. Kill the dud and try another. The bill stays $19. Your finance team has a fixed line item. Your engineering team is encouraged to experiment.

What flat actually costs us

Here's the trade-off we accept. Some teams will use a hundred times more API capacity than the median, and we'll absorb the cost. The way we keep the math working is the cap structure: you can't sustain 100x usage without bumping into the monthly API call cap or the workspace count cap. When you hit the cap, your agent gets a 402 with a recommendation to either upgrade or request a custom limit raise. Most teams that legitimately need 100x usage are happy to move to Scale, because they're getting a predictable price increase aligned with their predictable usage growth.

The teams we deliberately don't optimize for are the bursty enterprise users who occasionally need 1000x usage but only for a few hours per quarter. Those teams genuinely need usage-based billing, and they should use one of the platforms that offers it. Our flat-fee model isn't right for them, and we say so up front.

For everyone else (which is most agent builders we talk to), flat pricing matches their actual usage shape better than per-token would.

The other thing flat pricing buys

There's a non-economic reason we like flat that's worth naming explicitly. It's a clarity-of-intent signal.

When pricing is per-token, the conversation between you and the vendor is about consumption. How much are you using? Can you optimize? What are the cost spikes? The vendor's customer-success motion is "help you use less."

When pricing is flat, the conversation is about value. Are you getting your $19 worth? Should you upgrade? What features unlock at the next tier? The vendor's customer-success motion is "help you use more."

For a workspace product where the primary value comes from the workspace being well-populated and well-collaborated-on, the second conversation is the right one. We want you to put more agents on your workspace. We want you to spawn the experimental ones. We want you to have a Researcher and a Writer and an Editor all working on the same launch plan, even if only the Writer's output ships. The flat fee is a vote of confidence that all that activity is welcome.

The day we have to start charging per-agent is the day we'd start telling power users to run fewer agents. That's the day we'd be optimizing the wrong thing.

When flat pricing breaks

We've thought hard about when this stops working. Three scenarios.

Scenario one: someone abuses the flat tier with extreme usage. This is the obvious risk and the cap structure handles it. You cap out, you get the 402, you upgrade or you talk to us. We've never had to ban an account for overuse; the caps are enough.

Scenario two: token costs balloon to the point that flat math doesn't work. This is the structural risk. If LLM token prices doubled tomorrow, our customers' costs would double, but ours wouldn't (we don't pay for their tokens). So this scenario doesn't actually break our model; it breaks theirs. We're insulated by being a workspace, not an LLM wrapper.

Scenario three: enterprise customers want predictable per-seat billing. This is the demand-side pressure that's pushed many flat-priced startups to add per-seat tiers. We accept this won't fit every enterprise procurement process and that's a real cost we pay. The teams that pick us pick us in part because we're not per-seat.

What we'd change if we were starting over

Honestly, nothing about the pricing model. Two things about how we communicate it.

First, we should have led with the contrarian framing earlier. For most of 2025 we had pricing pages that just listed Free/Pro/Scale without explaining why we're different. Customers were comparing us to per-token tools and assuming our pricing was a "haven't gotten around to per-token yet" placeholder. It's not. It's a deliberate choice.

Second, we should have shipped the "what counts as an agent" explainer earlier. People asked us "wait, can I really spawn 30 agents on Scale and not pay more?" and we'd answer in a Slack message instead of just having the answer on the pricing page.

Both of those are addressable in copy. The pricing model itself feels right two years in.

What this means for you

If you're picking a workspace for agent-shaped work, the question isn't just "what's the per-month price." It's "does the pricing align with the way I want my team to use agents?"

Per-token is right when your value scales with the LLM directly (chat assistants, code completion, content generation services).

Flat is right when your value scales with the workspace (project tracking, collaborative documents, structured ingest, multi-agent coordination). The agents do the work; the workspace is where the work lives.

We bet that for agent-shaped work, flat is the right primitive. Two years of dogfooding and a few thousand orgs in, we're still betting that. Anthropic, OpenAI, and Google moved the other way. We didn't. The numbers say we don't have to.
