Dock for DevOps is a workspace where an agent reads CI status, deploy state, and telemetry, then writes a deploy-readiness check, an on-call runbook, or a post-mortem draft into a row an engineer signs off on. Pipelines own the artifacts. Dock owns the interpretation: the gate verdict, the runbook the on-call followed at 02:14, the RCA the incident lead approved. Every row carries the run ID, agent identity, reviewer, and timestamp.
The architecture
GitHub Actions, ArgoCD, Datadog, PagerDuty, and Terraform Cloud stay the system of record for raw pipeline and telemetry data. Dock is the system of record for what the agent interprets: the deploy gate, the runbook decision, the sign-off, the audit log. Each row carries a pointer back to the platform record (github_run_id, argocd_application), agent identity, reviewer, and timestamp. The agent re-fetches state via fresh API reads when it needs current data. Dock holds the interpretive layer that survives sessions, rotations, and incident reviews. This is the Cloud 2.0 pattern for engineering: platforms hold facts, Dock holds judgments.
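As a rough illustration of that split, a Dock row could be modelled as a small record that stores the judgment plus pointers, never the raw telemetry. This is a sketch, not Dock's actual schema: apart from github_run_id and argocd_application, every class name, field name, and value here is an assumption made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PlatformPointer:
    """Reference back to the system of record; holds IDs only, no raw data."""
    github_run_id: str
    argocd_application: str


@dataclass
class DockRow:
    """Hypothetical shape of one interpretive row."""
    pointer: PlatformPointer
    agent_identity: str               # which agent drafted the verdict
    gate_verdict: str                 # e.g. "ship", "hold", "block"
    risk_signals: list[str]
    reviewer: Optional[str] = None    # stays empty until a human signs off
    status: str = "draft"
    drafted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# Example row; the agent name and ArgoCD application name are invented.
row = DockRow(
    pointer=PlatformPointer("gh_run_9824", "search-indexer"),
    agent_identity="deploy-gate-agent",
    gate_verdict="ship",
    risk_signals=["green CI", "p95 stable 14d"],
)
```

Keeping the pointer as a separate immutable record is one way to make it obvious that the platform IDs are references to be re-fetched, not cached copies of platform state.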
The deploy-readiness table
| run_id | service | gate_verdict | risk_signals | reviewer | status |
|---|---|---|---|---|---|
| gh_run_9821 | checkout-api | hold | error budget 92%, schema touches orders | sre-priya | approved-with-canary |
| gh_run_9824 | search-indexer | ship | green CI, p95 stable 14d | sre-jordan | approved |
| gh_run_9826 | billing-worker | block | SLO burn 4x, Terraform adds IAM role | sre-priya | rolled-back |
gate_verdict is the agent's call. risk_signals justifies it. reviewer and status are the human decision against the draft. Each row points back to the GitHub run and ArgoCD app so a SOC 2 audit six months later can trace the decision.
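To make the audit trace concrete, the sketch below looks up one run and separates what the agent proposed from what the human decided. The rows mirror the table; the in-memory list, the trace function, and the keys of its result are assumptions for illustration, not a Dock API.

```python
from typing import Optional

# Rows copied from the deploy-readiness table.
ROWS = [
    {"run_id": "gh_run_9821", "service": "checkout-api", "gate_verdict": "hold",
     "risk_signals": ["error budget 92%", "schema touches orders"],
     "reviewer": "sre-priya", "status": "approved-with-canary"},
    {"run_id": "gh_run_9824", "service": "search-indexer", "gate_verdict": "ship",
     "risk_signals": ["green CI", "p95 stable 14d"],
     "reviewer": "sre-jordan", "status": "approved"},
    {"run_id": "gh_run_9826", "service": "billing-worker", "gate_verdict": "block",
     "risk_signals": ["SLO burn 4x", "Terraform adds IAM role"],
     "reviewer": "sre-priya", "status": "rolled-back"},
]


def trace(run_id: str, rows: list[dict] = ROWS) -> Optional[dict]:
    """Return an audit summary for one run, or None if no row exists."""
    for row in rows:
        if row["run_id"] == run_id:
            return {
                "agent_proposed": row["gate_verdict"],
                "because": row["risk_signals"],
                "human_decided": row["status"],
                "decided_by": row["reviewer"],
            }
    return None


summary = trace("gh_run_9821")
```

Here summary shows that the agent proposed hold and sre-priya approved with a canary, which is the question an auditor would ask of the row.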
A worked example
A PR merges to main at 14:02. CI finishes at 14:08. The agent reads the GitHub Actions run, pulls the Terraform Cloud plan, queries Datadog for SLO burn on billing-worker, and checks ArgoCD sync. It drafts a verdict of hold with three signals: SLO burn 4x, IAM role added, no canary config. The on-call SRE opens the row at 14:11, sees the reasoning with pointers, and approves a canary, blocks, or overrides in writing. Any override is captured in the row. A production deploy is an irreversible operation, so it requires the two-key handshake: the agent proposes, the human commits.
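The two-key handshake in this example can be sketched as two separate functions, one per key. This is a minimal sketch under stated assumptions: GateDraft, propose, commit, may_deploy, the decision labels, and the run ID are hypothetical names chosen for the example, not Dock's interface.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical decision labels for the three reviewer outcomes in the text.
DECISIONS = {"approve-canary", "block", "override"}


@dataclass
class GateDraft:
    run_id: str
    service: str
    verdict: str                      # the agent's proposal
    signals: list[str]
    reviewer: Optional[str] = None
    decision: Optional[str] = None
    note: Optional[str] = None


def propose(run_id: str, service: str, verdict: str, signals: list[str]) -> GateDraft:
    """First key: the agent drafts a verdict. Nothing is deployed here."""
    return GateDraft(run_id, service, verdict, signals)


def commit(draft: GateDraft, reviewer: str, decision: str, note: str = "") -> GateDraft:
    """Second key: a named human records the decision on the draft."""
    if not reviewer:
        raise ValueError("a human reviewer is required")
    if decision not in DECISIONS:
        raise ValueError(f"unknown decision: {decision}")
    if decision == "override" and not note.strip():
        raise ValueError("an override must be justified in writing")
    draft.reviewer, draft.decision, draft.note = reviewer, decision, note
    return draft


def may_deploy(draft: GateDraft) -> bool:
    """A deploy may proceed only after a human decision other than block."""
    return draft.reviewer is not None and draft.decision in {"approve-canary", "override"}


# The run ID below is a placeholder; the signals come from the worked example.
draft = propose(
    "gh_run_example", "billing-worker", "hold",
    ["SLO burn 4x", "IAM role added", "no canary config"],
)
```

Until commit is called with a reviewer, may_deploy(draft) stays False, so the agent's proposal alone can never release a deploy. An override with no written note is rejected before anything is recorded.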
Why this matters
Runbook drift is the quiet killer. The wiki has the 2024 playbook; production was rewired three quarters ago. The agent re-fetches live state, so the runbook reflects the system at 02:14, not at the last quarterly review. That is the dangerous-ops contract applied to incident response.
Attribution survives the rotation. At the Friday retro, the row shows who drafted, who approved, what signals were on the table, and what the override said. The next on-call inherits a record, not tribal memory. This is the foundation for agent audit and compliance under SOC 2 and ISO 27001. The agent runs on scoped credentials: read CI and telemetry, write Dock rows, never push to main. See agent identity.
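The scoped-credential rule can be pictured as an allowlist checked before every agent action. The scope strings and the authorize helper below are assumptions for illustration. A real deployment would enforce this in the credential provider, not in application code.

```python
from enum import Enum


class Action(Enum):
    READ_CI = "read:ci"
    READ_TELEMETRY = "read:telemetry"
    WRITE_DOCK_ROW = "write:dock_row"
    PUSH_MAIN = "push:main"


# Scopes granted to the agent, mirroring the text: read CI and telemetry,
# write Dock rows, and nothing else.
AGENT_SCOPES = frozenset(
    {Action.READ_CI, Action.READ_TELEMETRY, Action.WRITE_DOCK_ROW}
)


def authorize(action: Action, scopes: frozenset = AGENT_SCOPES) -> None:
    """Raise PermissionError unless the action is in the granted scopes."""
    if action not in scopes:
        raise PermissionError(f"agent is not permitted to {action.value}")


authorize(Action.WRITE_DOCK_ROW)   # allowed: returns without raising
```

Calling authorize(Action.PUSH_MAIN) raises PermissionError, because pushing to main is outside the granted set.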
Try Dock for your DevOps team or read the adjacent Dock for IT operations.
FAQ
Does Dock replace GitHub Actions, ArgoCD, or Datadog? No. Pipeline tools stay the source of truth. Dock holds the agent's interpretation and the human decision.
How does this handle deploys to regulated workloads? The agent drafts the verdict and signals. The reviewer approves, blocks, or overrides in writing. Both are logged, which supports the change-management evidence expected under SOC 2 CC8.1 and ISO 27001 audit trails.
What does the agent re-fetch versus what does Dock store? The agent re-fetches live CI status, Terraform plans, Datadog SLO burn, and ArgoCD sync on every evaluation. Dock stores the verdict, signals, decision, and pointers. DORA research finds elite performers ground decisions in live telemetry.
How does this compare to a runbook wiki? A wiki captures intent at write time and goes stale. A Dock row captures live state at decision time and preserves the sign-off. The Google SRE book treats post-mortems as learning artifacts; Dock rows make them queryable across rotations, aligned with the CNCF OpenGitOps principle that declarative, versioned state is the operational baseline.
Sources
- DORA, State of DevOps research
- Beyer, Jones, Petoff, Murphy, eds. Site Reliability Engineering, O'Reilly.
- CNCF, OpenGitOps project