DevOps AI ToolKit
AI for Automation

Reconciliation Drift Detection Loop Design Prompt

Design a reconciliation loop that continuously compares desired state to observed state, reports drift, and converges it — with rate limits and a freeze switch so the loop can't amplify a bad desired state.

Target user
Platform engineers building self-correcting infrastructure automation
Difficulty
Advanced
Tools
Claude, ChatGPT, Cursor

The prompt

You are a senior platform engineer who has watched a reconciliation loop confidently converge an entire fleet to a broken desired state.

I will provide:
- The resource being reconciled and where desired vs. observed state live
- How drift is detected (poll interval, watch, event) and what "converged" means
- The actions the loop takes to converge, and their reversibility
- Scale: how many resources, how fast drift appears, blast radius of a wrong action

Your job:

1. **State model** — define the desired-state source of truth, how observed state is read, and the exact diff that constitutes drift for [RESOURCE], avoiding false drift from fields the loop doesn't own.
2. **Detect–diff–act cycle** — structure the loop so each iteration reads observed state fresh, computes drift, and acts only on a real, non-empty diff; make the act step idempotent.
3. **Rate limiting and backoff** — bound how often the loop acts and add exponential backoff on repeated failures, so a resource that won't converge isn't hammered.
4. **Anti-amplification guards** — cap how many resources the loop will change per cycle and halt if drift exceeds a threshold, since mass drift usually means the desired state is wrong, not reality.
5. **Freeze switch** — provide a manual pause that stops convergence without stopping detection, so engineers can investigate while still seeing drift reports.
6. **Observability** — emit drift count, convergence latency, and per-resource action outcomes; alert when a resource fails to converge after N cycles.

Output as: a control-loop diagram, the detect-diff-act pseudocode, a guardrail config table, and a runbook for when the loop is frozen.

Run the loop in observe-only (report drift, take no action) against production for a full cycle before enabling convergence; a loop that acts on a poisoned desired state propagates the mistake faster than any human could.

Why this prompt works

A reconciliation loop is the engine behind self-healing infrastructure, and its great strength is also its great danger: it converges relentlessly toward the desired state, whether or not the desired state is correct. The prompt front-loads this risk in its opening persona and backs it with anti-amplification guards, because the failure mode that hurts most is not a loop that fails to act; it’s a loop that acts perfectly on a poisoned input. When a bad config push makes most of the fleet look “drifted,” an uncapped loop will faithfully change every resource to match the broken desired state, faster than any human could intervene. Capping actions per cycle and halting on mass drift turns that catastrophe into an alert.
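The two guards described above can be sketched in a few lines. This is a minimal illustration, not the prompt's prescribed output: the `Guardrails` fields, their default values, and the `plan_cycle` function are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrails:
    # Both values are illustrative assumptions; tune them to your blast radius.
    max_changes_per_cycle: int = 5    # cap on resources changed in one cycle
    max_drift_fraction: float = 0.2   # halt when more than 20% of the fleet drifts


def plan_cycle(
    drifted: list[str], fleet_size: int, guards: Guardrails
) -> tuple[str, list[str]]:
    """Decide what one cycle may do.

    Returns ("halt", []) when drift is widespread enough to suggest the
    desired state itself is wrong, otherwise ("act", batch) where batch
    holds at most max_changes_per_cycle resources.
    """
    if fleet_size <= 0:
        return "act", []
    drift_fraction = len(drifted) / fleet_size
    if drift_fraction > guards.max_drift_fraction:
        # Mass drift: refuse to converge and leave the decision to a human.
        return "halt", []
    # Sort so the selected batch is deterministic from cycle to cycle.
    return "act", sorted(drifted)[: guards.max_changes_per_cycle]
```

A caller would raise an alert on `"halt"` and converge only the returned batch on `"act"`, leaving the remaining drifted resources for later cycles.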

The prompt also enforces the discipline that separates a real control loop from a script on a timer. Each iteration must read observed state fresh, compute a genuine diff, and act only on a non-empty diff, using an idempotent action. Skip any of these and you get flapping: the loop acts on stale state, the resource looks wrong again next cycle, and the loop fights itself indefinitely. Rate limiting and backoff keep a resource that simply can’t converge from being hammered every cycle; hammering wastes work and buries the signal that something needs human attention.
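One iteration of that cycle might look like the sketch below. It assumes resources are flat dictionaries and that the loop owns an explicit set of fields; `compute_drift`, `reconcile_once`, `backoff_seconds`, and the default base and cap values are hypothetical names and numbers, not part of any particular controller framework.

```python
from typing import Any, Callable


def compute_drift(
    desired: dict[str, Any], observed: dict[str, Any], owned_fields: set[str]
) -> dict[str, Any]:
    """Return the owned fields whose observed value differs from desired.

    Fields outside owned_fields are ignored, which avoids false drift on
    values that some other system manages.
    """
    return {
        name: desired[name]
        for name in owned_fields
        if name in desired and observed.get(name) != desired[name]
    }


def backoff_seconds(failures: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Exponential backoff: base * 2^(failures - 1), capped at cap seconds."""
    if failures <= 0:
        return 0.0
    return min(cap, base * 2 ** (failures - 1))


def reconcile_once(
    desired: dict[str, Any],
    read_observed: Callable[[], dict[str, Any]],
    apply_patch: Callable[[dict[str, Any]], None],
    owned_fields: set[str],
) -> bool:
    """Run one detect-diff-act iteration; return True if a patch was applied."""
    observed = read_observed()  # fresh read on every iteration, never cached
    drift = compute_drift(desired, observed, owned_fields)
    if not drift:
        return False  # empty diff: take no action
    # The patch sets absolute values, so applying it twice has the same effect.
    apply_patch(drift)
    return True
```

In a real loop, the caller would count consecutive failures per resource and wait `backoff_seconds(failures)` before the next attempt, resetting the count once the resource converges.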

Crucially, the prompt separates detection from convergence via a freeze switch. When something goes wrong, the instinct is to kill the loop entirely — but that also kills your visibility into the drift you’re trying to diagnose. Keeping detection running while convergence is paused lets engineers investigate with full information. The model can draft the loop structure and guards quickly, but you verify by running it observe-only against production for a full cycle first, confirming the drift it reports is real before you ever let it act.
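A minimal sketch of that separation follows, assuming the freeze state is exposed as a callable (in practice it might read a feature flag or a config key). `CycleReport`, `run_cycle`, and the three callables are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CycleReport:
    drifted: list[str] = field(default_factory=list)   # everything detected
    acted_on: list[str] = field(default_factory=list)  # what was converged
    frozen: bool = False


def run_cycle(
    detect_drift: Callable[[], list[str]],
    converge: Callable[[str], None],
    is_frozen: Callable[[], bool],
) -> CycleReport:
    """Detect drift unconditionally; converge only when the loop is not frozen."""
    drifted = detect_drift()  # detection runs whether or not the loop is frozen
    report = CycleReport(drifted=list(drifted), frozen=is_frozen())
    if report.frozen:
        # Frozen: report the drift but take no action.
        return report
    for resource in drifted:
        converge(resource)
        report.acted_on.append(resource)
    return report
```

Passing an `is_frozen` that always returns `True` gives the observe-only mode recommended for the first production run: drift is still reported every cycle, and nothing is changed.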
