---
title: "AI contract risk scoring in 2026: workflows the GC and the auditors trust"
excerpt: "Risk scoring without rationale gets ignored; risk scoring without thresholds floods the queue. The workflow that works: the agent scores against named risk factors (liability cap, indemnification, governing law, data-protection terms), the attorney reviews scoring drift, the audit trail survives external review."
author: mei
category: Playbooks
date: "2026-05-30"
---

A contract risk score is only useful if a GC can defend it and an auditor can reconstruct it. Every score has to point to a named risk factor, a clause excerpt, a model version, and a human reviewer. [Evisort](/blog/ai-evisort-contracts), Ironclad AI, LinkSquares, and Spellbook produce the numbers. The workflow around them makes them trustworthy. This is the scoring playbook inside the broader [legal review with AI](/blog/how-to-do-legal-review-with-ai) approach.

## The five-step risk-scoring workflow

**1. Define the risk factor list first.** GCs settle on factors: liability cap, mutual indemnification, governing law, data-protection, termination for convenience, auto-renewal, IP assignment. Each gets a written threshold. Spellbook and Ironclad AI encode playbook positions; LinkSquares stores them as policy rules. Without this list, every score is a vibe.

**2. Score against clauses, not documents.** A document-level "high risk" label hides the cause. The agent extracts the clause, names the factor, and assigns a 1-5 severity. Evisort and LinkSquares expose clause-level scoring. ChatGPT and Claude can do the same with a structured prompt that names the factors and demands clause citations.

**3. Show the rationale next to the score.** A score of 4 for "uncapped indemnification" is useful only when the clause text sits beside it. A 4 with no quoted text is noise. Output should always include the clause excerpt, the factor matched, the threshold breached, and the model version. This is the part that survives audit.

**4. Route by threshold.** Scores at or above the GC's escalation threshold go to a senior reviewer. Scores below get fast-lane sign-off. [Harvey](/blog/ai-harvey-legal) and Robin AI support tiered routing. Without thresholds the queue floods and reviewers stop trusting the system.

**5. Log the override.** When an attorney accepts a high-risk clause, the reason gets recorded with the reviewer's name. The NIST AI Risk Management Framework expects human oversight processes to be defined and documented, and logging overrides is the single most important habit for surviving audit ([NIST AI RMF](https://www.nist.gov/itl/ai-risk-management-framework)).
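The five steps can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's API: the threshold wording, the escalation value of 4, and every name here (`FACTOR_THRESHOLDS`, `ClauseFinding`, `route`, `log_override`) are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Step 1: named factors, each with a written threshold.
# Factor names come from the playbook; the threshold wording is illustrative.
FACTOR_THRESHOLDS = {
    "liability cap": "cap below 12 months of fees",
    "mutual indemnification": "indemnity is one-way or uncapped",
    "data-protection": "DPA has no named subprocessors",
}

ESCALATION_THRESHOLD = 4  # assumed GC setting on the 1-5 severity scale


@dataclass(frozen=True)
class ClauseFinding:
    """Steps 2 and 3: one score per clause, with its rationale attached."""
    factor: str         # must be a key in FACTOR_THRESHOLDS
    severity: int       # 1-5
    excerpt: str        # quoted clause text
    model_version: str  # pinned per run

    def __post_init__(self):
        if self.factor not in FACTOR_THRESHOLDS:
            raise ValueError(f"unknown risk factor: {self.factor}")
        if not 1 <= self.severity <= 5:
            raise ValueError("severity must be between 1 and 5")
        if not self.excerpt.strip():
            raise ValueError("a score without a clause excerpt is not auditable")


def route(finding: ClauseFinding) -> str:
    """Step 4: at or above the threshold goes to a senior reviewer."""
    if finding.severity >= ESCALATION_THRESHOLD:
        return "senior_review"
    return "fast_lane"


def log_override(log: list, finding: ClauseFinding, reviewer: str, reason: str) -> dict:
    """Step 5: record why an attorney accepted a flagged clause."""
    if not reason.strip():
        raise ValueError("override reason is required")
    entry = {
        "factor": finding.factor,
        "severity": finding.severity,
        "excerpt": finding.excerpt,
        "threshold": FACTOR_THRESHOLDS[finding.factor],
        "model_version": finding.model_version,
        "reviewer": reviewer,
        "reason": reason,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
    log.append(entry)
    return entry
```

Rejecting a finding that has no excerpt or no known factor at construction time is a deliberate choice in this sketch: a score that cannot carry its rationale never reaches the queue.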

## Worked example: vendor MSA with a 4.6 risk score

A SaaS vendor sends an MSA. Evisort flags three clauses: liability capped at fees paid in the prior three months (liability cap, severity 4), one-way indemnification favoring the vendor (mutual indemnification, severity 5), and a DPA with no named subprocessors (data-protection, severity 4). Composite score: 4.6. A plain average of 4, 5, and 4 would be about 4.3, so the composite evidently weights the severity-5 indemnification clause more heavily. The senior commercial attorney accepts the liability cap after negotiation, redlines the indemnification, and requires a subprocessor list. Each decision and its rationale get logged.
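The example does not say how the composite is computed. As a sketch, one formula that reproduces 4.6 from severities 4, 5, and 4 is a weighted mean that counts the indemnification clause three times as heavily as the other two. The weights `[1, 3, 1]` and the function name `weighted_composite` are assumptions for illustration, not any tool's documented method.

```python
def weighted_composite(severities, weights):
    """Weighted mean of clause severities, rounded to one decimal place."""
    if not severities or len(severities) != len(weights):
        raise ValueError("need one weight per severity")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(s * w for s, w in zip(severities, weights))
    return round(weighted_sum / total_weight, 1)


# Liability cap, indemnification, data-protection
severities = [4, 5, 4]

plain = weighted_composite(severities, [1, 1, 1])     # 4.3
weighted = weighted_composite(severities, [1, 3, 1])  # 4.6
```

Whatever formula a team uses, storing the weights next to the score lets an auditor recompute the number by hand.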

## Where scoring workflows break: the rationale layer

Six months later the auditor asks why a 4.6 deal was approved. Clauses live in Ironclad or DocuSign CLM. Redlines live in Word. The Slack thread where the attorney explained the override has aged out. The score is reproducible but the reasoning is gone. ACC benchmarks flag this as a top legal-ops risk ([ACC resource library](https://www.acc.com/resource-library)).

One way to solve this is a workspace like Dock that holds the rationale, the threshold breached, the override reason, the approver chain, and a pointer back to the CLM record (`ironclad_workflow_id` or `docusign_envelope_id`). The CLM stays system of record; Dock holds what the agent interpreted. Signature still passes through a two-key handshake, which the [dangerous-ops contract](/blog/dangerous-ops-contract) covers, and reasoning lands in the [agent audit log](/blog/agent-audit-and-compliance) the GC can replay.
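To make the shape of that record concrete, here is a minimal sketch of a rationale record that keeps a pointer back to the CLM. Only `ironclad_workflow_id` and `docusign_envelope_id` come from the text above. The class name, the other field names, and the JSON layout are hypothetical and do not describe Dock's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class RationaleRecord:
    factor: str
    severity: int
    clause_excerpt: str
    threshold_breached: str
    model_version: str
    override_reason: Optional[str] = None
    approver_chain: List[str] = field(default_factory=list)
    # Pointer back to the system of record; at least one must be set.
    ironclad_workflow_id: Optional[str] = None
    docusign_envelope_id: Optional[str] = None

    def __post_init__(self):
        if not (self.ironclad_workflow_id or self.docusign_envelope_id):
            raise ValueError("record needs a pointer back to the CLM record")


def to_json(record: RationaleRecord) -> str:
    """Serialize with sorted keys so stored records are stable and easy to diff."""
    return json.dumps(asdict(record), sort_keys=True)
```

The record stores interpretation only. The contract text, signatures, and workflow state stay in the CLM, and the identifier is what lets a reviewer walk from the reasoning back to the source.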

## Why it matters

Risk scores that cannot be reconstructed get ignored by GCs and rejected by auditors. The discipline is boring: name the factors, store the rationale, log the overrides. Tooling matters less than the workflow.

For an in-house team, [Dock for Legal](/blog/dock-for-legal) walks through the full playbook.

## FAQ

**Q: Should risk scoring replace attorney review?**
No. Scoring routes work to the right reviewer and surfaces clauses that matter. The attorney still makes the call.

**Q: How do I prevent scoring drift between contracts?**
Pin the model version and the factor list per run. When you update either, re-score a fixed sample of previously scored contracts and compare the results clause by clause.
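A drift check along these lines can be small. The sketch below assumes scores are stored as a mapping from a clause identifier to a 1-5 severity. The function names and the fingerprint scheme are illustrative choices, not a feature of any specific tool.

```python
import hashlib
import json
from typing import Dict, List, Tuple


def run_fingerprint(model_version: str, factors: List[str]) -> str:
    """Hash the pinned inputs so every score can name the run that produced it."""
    payload = json.dumps(
        {"model": model_version, "factors": sorted(factors)}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def drift_report(
    baseline: Dict[str, int], rescored: Dict[str, int], tolerance: int = 0
) -> Dict[str, Tuple[int, int]]:
    """Return clauses whose severity moved by more than `tolerance` points."""
    shared = baseline.keys() & rescored.keys()
    return {
        clause_id: (baseline[clause_id], rescored[clause_id])
        for clause_id in sorted(shared)
        if abs(baseline[clause_id] - rescored[clause_id]) > tolerance
    }
```

If the fingerprint changes between runs, the drift report on the sample shows whether the scores moved with it. An empty report is the evidence that the update was safe.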

**Q: Can ChatGPT or Claude do this without a CLM?**
For low volume, yes, with a structured prompt demanding clause citations and factor names. For high volume or audit exposure, Evisort or LinkSquares is more defensible.
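As a rough sketch of what such a structured prompt might look like, the helper below builds the prompt and then checks the model's reply. It makes no API call. The prompt wording, the JSON shape, and both function names are assumptions for illustration.

```python
import json
from typing import List


def build_prompt(contract_text: str, factors: List[str]) -> str:
    """Name the factors explicitly and demand a quoted excerpt for every score."""
    factor_lines = "\n".join(f"- {name}" for name in factors)
    return (
        "Review the contract below against these risk factors only:\n"
        f"{factor_lines}\n\n"
        "Return a JSON array. Each item must have: factor (one of the names "
        "above), severity (integer 1-5), and excerpt (text quoted verbatim "
        "from the contract).\n\n"
        f"CONTRACT:\n{contract_text}"
    )


def validate_findings(raw_json: str, contract_text: str, factors: List[str]) -> List[dict]:
    """Keep only findings that name a known factor and quote real contract text."""
    findings = json.loads(raw_json)
    if not isinstance(findings, list):
        raise ValueError("expected a JSON array of findings")
    valid = []
    for item in findings:
        if not isinstance(item, dict):
            continue
        severity = item.get("severity")
        excerpt = item.get("excerpt")
        if (
            item.get("factor") in factors
            and isinstance(severity, int)
            and 1 <= severity <= 5
            and isinstance(excerpt, str)
            and excerpt
            and excerpt in contract_text  # citation must appear verbatim
        ):
            valid.append(item)
    return valid
```

The verbatim check is the important part: a finding whose excerpt cannot be found in the contract is dropped rather than trusted.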

**Q: What do auditors ask for?**
The factor list in effect, the model version, the clause excerpt, the score, the reviewer name, and the override reason. With those fields per contract, you survive most external reviews.
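A completeness check over those six fields is easy to automate before a contract record is filed. The field names below are hypothetical keys chosen for this sketch. Map them to whatever your CLM or workspace actually stores.

```python
from typing import List

# The six items auditors ask for, as assumed dictionary keys.
REQUIRED_AUDIT_FIELDS = (
    "factor_list",
    "model_version",
    "clause_excerpt",
    "score",
    "reviewer_name",
    "override_reason",
)


def missing_audit_fields(record: dict) -> List[str]:
    """List required fields that are absent or empty in a contract's audit record."""
    missing = []
    for name in REQUIRED_AUDIT_FIELDS:
        value = record.get(name)
        if value is None or value == "" or value == []:
            missing.append(name)
    return missing
```

Running this at sign-off time, rather than at audit time, catches a missing override reason while the attorney still remembers it.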
