Reconciliation Drift Detection Loop Design Prompt
Design a reconciliation loop that continuously compares desired state to observed state, reports drift, and converges observed state back toward desired state, with rate limits and a freeze switch so the loop can't amplify a bad desired state.
- Target user: Platform engineers building self-correcting infrastructure automation
- Difficulty: Advanced
- Tools: Claude, ChatGPT, Cursor
The prompt
You are a senior platform engineer who has watched a reconciliation loop confidently converge an entire fleet to a broken desired state.

I will provide:
- The resource being reconciled and where desired vs. observed state live
- How drift is detected (poll interval, watch, event) and what "converged" means
- The actions the loop takes to converge, and their reversibility
- Scale: how many resources, how fast drift appears, blast radius of a wrong action

Your job:
1. **State model**: define the desired-state source of truth, how observed state is read, and the exact diff that constitutes drift for [RESOURCE], avoiding false drift from fields the loop doesn't own.
2. **Detect–diff–act cycle**: structure the loop so each iteration reads observed state fresh, computes drift, and acts only on a real, non-empty diff; make the act step idempotent.
3. **Rate limiting and backoff**: bound how often the loop acts and add exponential backoff on repeated failures, so a resource that won't converge isn't hammered.
4. **Anti-amplification guards**: cap how many resources the loop will change per cycle and halt if drift exceeds a threshold, since mass drift usually means the desired state is wrong, not reality.
5. **Freeze switch**: provide a manual pause that stops convergence without stopping detection, so engineers can investigate while still seeing drift reports.
6. **Observability**: emit drift count, convergence latency, and per-resource action outcomes; alert when a resource fails to converge after N cycles.

Output as: a control-loop diagram, the detect-diff-act pseudocode, a guardrail config table, and a runbook for when the loop is frozen.

Run the loop in observe-only mode (report drift, take no action) against production for a full cycle before enabling convergence; a loop that acts on a poisoned desired state propagates the mistake faster than any human could.
Why this prompt works
A reconciliation loop is the engine behind self-healing infrastructure, and its great strength is also its great danger: it converges relentlessly toward the desired state, whether or not the desired state is correct. The prompt front-loads this risk in its persona and counters it with anti-amplification guards, because the failure mode that hurts most is not a loop that fails to act; it’s a loop that acts perfectly on a poisoned input. When a bad config push makes most of the fleet look “drifted,” an uncapped loop will faithfully change every resource to match the broken desired state, faster than any human could intervene. Capping actions per cycle and halting on mass drift turns that catastrophe into an alert.
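The guard described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `GuardConfig`, `MassDriftHalt`, `select_actions`, and both default thresholds are hypothetical names and values chosen for the example, and the right limits depend on your fleet size and blast radius.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardConfig:
    # Both defaults are illustrative assumptions, not recommendations.
    max_actions_per_cycle: int = 10   # cap on resources changed in one cycle
    mass_drift_ratio: float = 0.2     # halt if more than 20% of the fleet drifted


class MassDriftHalt(Exception):
    """Raised when so much of the fleet looks drifted that acting is unsafe."""


def select_actions(drifted: list[str], fleet_size: int, cfg: GuardConfig) -> list[str]:
    """Return the resource IDs the loop may change this cycle.

    Raises MassDriftHalt instead of acting when the drifted share of the
    fleet exceeds the configured ratio, since that pattern usually points
    at a bad desired state rather than at reality.
    """
    if fleet_size <= 0:
        return []
    ratio = len(drifted) / fleet_size
    if ratio > cfg.mass_drift_ratio:
        raise MassDriftHalt(
            f"{len(drifted)}/{fleet_size} resources drifted ({ratio:.0%}); halting"
        )
    # Sort for a deterministic order, then apply the per-cycle cap.
    return sorted(drifted)[: cfg.max_actions_per_cycle]
```

A caller would catch `MassDriftHalt`, skip convergence for that cycle, and raise an alert; the remaining drifted resources are picked up by later cycles once the cap allows.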
The prompt also enforces the discipline that separates a real control loop from a script on a timer. Each iteration must read observed state fresh, compute a genuine diff, and act only when the diff is non-empty and the action is idempotent. Skip any of these and you get flapping: the loop acts on stale state, the resource looks wrong again next cycle, and the loop fights itself indefinitely. Rate limiting and backoff prevent a resource that simply can’t converge from being hammered every cycle, which both wastes work and obscures the signal that something needs human attention.
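One way to picture that discipline is the sketch below. It assumes resources are represented as plain dictionaries and that the loop owns only the fields listed in `OWNED_FIELDS`; those field names, along with `compute_drift`, `reconcile_once`, `backoff_seconds`, and the backoff base and cap, are hypothetical choices for illustration.

```python
from typing import Callable

# Hypothetical: the only fields this loop owns. Anything else is ignored,
# which avoids false drift from fields other systems write.
OWNED_FIELDS = ("replicas", "image")


def compute_drift(desired: dict, observed: dict) -> dict:
    """Return {field: (observed_value, desired_value)} for owned fields that differ."""
    return {
        field: (observed.get(field), desired[field])
        for field in OWNED_FIELDS
        if field in desired and observed.get(field) != desired[field]
    }


def reconcile_once(
    desired: dict,
    read_observed: Callable[[], dict],
    apply_patch: Callable[[dict], None],
) -> dict:
    """Run one detect-diff-act iteration and return the drift that was found."""
    observed = read_observed()                 # always a fresh read, never a cached copy
    drift = compute_drift(desired, observed)
    if drift:                                  # act only on a real, non-empty diff
        # Setting fields to absolute desired values keeps the act step idempotent.
        apply_patch({field: want for field, (_, want) in drift.items()})
    return drift


def backoff_seconds(failures: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Exponential backoff: base * 2**(failures - 1), capped; zero with no failures."""
    if failures <= 0:
        return 0.0
    return min(cap, base * 2 ** (failures - 1))
```

Because the patch sets absolute values rather than applying deltas, running `reconcile_once` a second time against an already converged resource produces an empty diff and takes no action.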
Crucially, the prompt separates detection from convergence via a freeze switch. When something goes wrong, the instinct is to kill the loop entirely — but that also kills your visibility into the drift you’re trying to diagnose. Keeping detection running while convergence is paused lets engineers investigate with full information. The model can draft the loop structure and guards quickly, but you verify by running it observe-only against production for a full cycle first, confirming the drift it reports is real before you ever let it act.
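The separation between detection and convergence can be illustrated with a small sketch. `FreezeSwitch` and `run_cycle` are hypothetical names, and the in-process `threading.Event` is an assumption made to keep the example self-contained; a real loop would more likely read the flag from shared configuration so that any engineer can flip it.

```python
import threading
from typing import Callable


class FreezeSwitch:
    """Manual pause for convergence; detection is never gated on it."""

    def __init__(self) -> None:
        self._frozen = threading.Event()

    def freeze(self) -> None:
        self._frozen.set()

    def unfreeze(self) -> None:
        self._frozen.clear()

    @property
    def frozen(self) -> bool:
        return self._frozen.is_set()


def run_cycle(
    drifted: list[str],
    switch: FreezeSwitch,
    report: Callable[[list[str]], None],
    converge: Callable[[str], None],
) -> int:
    """Report drift unconditionally, converge only when not frozen.

    Returns the number of resources acted on in this cycle.
    """
    report(drifted)            # drift reports keep flowing while frozen
    if switch.frozen:
        return 0               # convergence paused, nothing changed
    for resource_id in drifted:
        converge(resource_id)
    return len(drifted)
```

Starting the loop with the switch already frozen is one simple way to approximate the observe-only first run: drift is reported for a full cycle, and nothing is changed until someone calls `unfreeze()`.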
Related prompts
- Infrastructure Drift Auto-Correction Prompt
  Design safe automated detection and correction of infrastructure drift: classifying drift, deciding what to auto-revert vs. escalate, and avoiding the trap of reverting a legitimate emergency change.
- Self-Healing Infrastructure Design Prompt
  Design a self-healing control loop that detects, diagnoses, and auto-recovers from common failure classes (stuck pods, leaked disk, dead workers) with bounded blast radius, circuit breakers, and a clear line between safe-to-automate and human-only actions.