AI response drafting works when the draft is grounded in three things: the customer's prior conversation, your brand voice guidelines, and your existing response library. A raw Claude or ChatGPT draft sounds like a different company. A draft pulled from Zendesk macros, scoped to the customer's last six tickets, and constrained by a voice guide reads like your senior agent on a calm day. This piece lays out the workflow that gets you there and the feedback loop that keeps it honest.
The drafting workflow, step by step
1. Pull the customer context before the model writes anything. In Zendesk, that means the ticket thread plus the user's last 90 days of conversations. In Intercom, pull the conversation, the user attributes, and any linked company notes. Help Scout's API exposes the same shape. Skip this step and the draft answers the wrong customer.
2. Constrain the model with a written voice guide. Two pages, declarative. Borrow the structure from the Nielsen Norman tone-of-voice dimensions: formal vs. casual, serious vs. funny, respectful vs. irreverent, matter-of-fact vs. enthusiastic. Pick your position on each axis, write three sentence-level examples per axis, and list ten phrases to avoid. The Mailchimp voice and tone guide is the canonical reference for what good looks like.
3. Ground the draft in your response library. Forethought and Ada both ship retrieval over your existing macros and articles. If you are on Help Scout or Intercom without one of those layered on, a Claude or ChatGPT call with the top three matching saved replies in the prompt does the same job. The macro generation workflow covers how to keep that library clean.
4. Show the draft inline, never auto-send. The agent edits in place. The edit is the signal. See the helpdesk-specific patterns in the Help Scout AI workflow and the Intercom AI workflow.
5. Capture the edit as a voice example. This is the loop most teams skip. If the agent rewrote "I understand your frustration" to "That's a fair complaint," that pair is a voice example. Feed the next 50 of those back into the system prompt.
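The steps above can be sketched as one prompt-assembly function. This is a minimal illustration, not any vendor's API: the names `VoiceExample`, `top_matches`, and `build_prompt` are hypothetical, the word-overlap ranking is a toy stand-in for a real retrieval layer, and the sketch assumes "50 voice examples" means the 50 most recent edits.

```python
from dataclasses import dataclass


@dataclass
class VoiceExample:
    """One captured agent edit: the model's draft and what the agent sent."""
    draft: str
    rewrite: str


def top_matches(ticket_text: str, saved_replies: list[str], k: int = 3) -> list[str]:
    """Toy retrieval: rank saved replies by words shared with the ticket."""
    ticket_words = set(ticket_text.lower().split())

    def overlap(reply: str) -> int:
        return len(ticket_words & set(reply.lower().split()))

    # sorted() is stable, so ties keep their library order.
    return sorted(saved_replies, key=overlap, reverse=True)[:k]


def build_prompt(
    ticket_text: str,
    history: list[str],
    voice_guide: str,
    saved_replies: list[str],
    voice_examples: list[VoiceExample],
    max_examples: int = 50,
) -> dict[str, str]:
    """Assemble system and user prompts from the three grounding sources."""
    # Assumption: keep only the most recent edits as voice examples.
    examples = "\n".join(
        f'- Instead of "{ex.draft}", write "{ex.rewrite}"'
        for ex in voice_examples[-max_examples:]
    )
    system = (
        "Draft a support reply for a human agent to edit. Never send it.\n\n"
        f"Voice guide:\n{voice_guide}\n\n"
        f"Agent edits to imitate:\n{examples}"
    )
    replies = "\n---\n".join(top_matches(ticket_text, saved_replies))
    user = (
        "Prior conversations:\n" + "\n".join(history) + "\n\n"
        f"Closest saved replies:\n{replies}\n\n"
        f"Current ticket:\n{ticket_text}"
    )
    return {"system": system, "user": user}
```

The returned `system` and `user` strings can be passed to whichever model client you already use. Keeping the voice guide and edit pairs in the system prompt, and the customer context in the user turn, means the voice material stays fixed while the ticket changes.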
Worked example: a delayed-shipping reply
A Gorgias ticket lands: "Where is my order, it's been 11 days." Forethought retrieves the three closest macros. Claude drafts: "Hi Sam, I'm so sorry for the delay! Let me look into this right away." The agent rewrites: "Hi Sam, eleven days is too long. Tracking shows it cleared customs yesterday, delivery Thursday. Refunding the shipping fee now." That edit is the voice example. The apologetic-hedge pattern is wrong for this brand. The next draft on the next late-shipment ticket should not open with "I'm so sorry."
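One way to turn that edit into data is a word-level diff between the draft and the sent reply. The sketch below uses Python's standard `difflib`; the function name `edit_delta` and the choice to diff on whitespace-separated words are assumptions made for illustration.

```python
import difflib


def edit_delta(draft: str, rewrite: str) -> dict[str, list[str]]:
    """Return the phrases the agent removed from and added to the draft."""
    a, b = draft.split(), rewrite.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    removed, added = [], []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.append(" ".join(a[a1:a2]))
        if tag in ("replace", "insert"):
            added.append(" ".join(b[b1:b2]))
    return {"removed": removed, "added": added}


draft = "Hi Sam, I'm so sorry for the delay! Let me look into this right away."
sent = (
    "Hi Sam, eleven days is too long. Tracking shows it cleared customs "
    "yesterday, delivery Thursday. Refunding the shipping fee now."
)
delta = edit_delta(draft, sent)
```

The `removed` list carries the apologetic opener and the `added` list carries the concrete facts the agent supplied, which is the pattern a reviewer would want to see across many tickets.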
Where this breaks: voice drift and no feedback loop
The draft is fine. The edit is fine. The next draft does not learn. Most helpdesks store the final reply but not the rewrite delta, and not the reasoning the agent used. So next week a different agent gets the same apologetic draft, edits it slightly differently, and your outbound voice drifts. One way to solve this is a workspace like Dock that holds the rewrite delta, the voice rationale, and the chosen pattern as a row, with a zendesk_ticket_id pointer back to the ticket. Zendesk stays the system of record for the conversation. Dock holds what the agent interpreted around it. The Dock for customer support overview covers the structure, and agent identity covers attribution on the rows.
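As a rough picture of what such a row could hold, here is a hypothetical record. Only `zendesk_ticket_id` comes from the description above; every other field name, the sample ticket ID, and the JSON serialization are assumptions for illustration, not Dock's actual schema or API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RewriteRow:
    """One agent edit, stored next to (not inside) the helpdesk ticket."""
    zendesk_ticket_id: int  # pointer back to the system of record
    agent_id: str           # who made the edit (attribution)
    draft: str              # what the model proposed
    sent: str               # what the agent actually sent
    rationale: str          # why the agent changed it
    pattern: str            # short label for the chosen voice pattern


def to_json(row: RewriteRow) -> str:
    """Serialize a row so it can be stored or fed back into a prompt."""
    return json.dumps(asdict(row))


row = RewriteRow(
    zendesk_ticket_id=12345,  # made-up ID for the example
    agent_id="agent-7",
    draft="Hi Sam, I'm so sorry for the delay!",
    sent="Hi Sam, eleven days is too long.",
    rationale="Apologetic hedge is wrong for this brand; lead with the facts.",
    pattern="no-apology-opener",
)
```

Storing the rationale alongside the draft and sent text is what lets a different agent, or the next prompt, reuse the decision rather than rediscover it.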
Why voice matching matters
Customers do not read your style guide. They feel inconsistency. Three agents writing in three voices reads as three companies. A drafting workflow grounded in customer context, a written guide, and a feedback loop on edits is how you keep one voice across forty agents and a model that does not remember yesterday.
Read the full customer support with AI playbook.
FAQ
Should AI-drafted replies auto-send? No. Auto-send removes the edit signal, which is the only training data that matters for voice. Draft inline, let a human edit, then send.
Which LLM is best for support drafting? Claude and ChatGPT both work. The differentiator is the retrieval layer and the voice guide, not the base model. Forethought and Ada package both; Help Scout and Intercom expose APIs you can wire to either.
How long should a brand voice guide be? Two pages. Four tone axes, three sentence examples per axis, ten phrases to avoid. Longer guides do not get used.
How do we measure voice consistency? Sample 20 sent replies per agent per week and score them against the voice guide axes. Drift shows up within three weeks if you skip the feedback loop.
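The weekly check can be sketched in a few lines. This is an illustration under assumptions: a human reviewer scores each sampled reply from 1 to 5 on every axis, and the axis keys, the `tolerance` of 0.5, and the function names are all hypothetical.

```python
import random
from statistics import mean

# Assumed axis keys, mirroring the four tone dimensions in the voice guide.
AXES = (
    "formal_casual",
    "serious_funny",
    "respectful_irreverent",
    "matter_of_fact_enthusiastic",
)


def weekly_sample(sent_replies, n=20, seed=None):
    """Pick up to n sent replies per agent for manual scoring."""
    rng = random.Random(seed)
    return {
        agent: rng.sample(replies, min(n, len(replies)))
        for agent, replies in sent_replies.items()
    }


def axis_means(scores):
    """Average reviewer scores (one dict per reply) on each axis."""
    return {axis: mean(s[axis] for s in scores) for axis in AXES}


def drifted_axes(current, baseline, tolerance=0.5):
    """List the axes whose weekly mean moved more than tolerance from baseline."""
    return [a for a in AXES if abs(current[a] - baseline[a]) > tolerance]
```

Comparing each agent's weekly means against a baseline taken from replies the team agrees are on-voice gives a simple drift flag per axis; the threshold is a judgment call to tune.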