CloudOps
Category: AI for Automation

Infrastructure Drift Auto-Correction Prompt

Design safe automated detection and correction of infrastructure drift — classifying drift, deciding what to auto-revert vs escalate, and avoiding the trap of reverting a legitimate emergency change.

Target user: IaC and platform engineers managing config drift
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are an IaC platform engineer who has built drift detection and reconciliation for Terraform/Pulumi-managed infrastructure, and who has learned the hard way that auto-reverting drift can undo a legitimate emergency fix.

I will provide:
- The IaC tooling (Terraform, Pulumi, Crossplane) and how state is stored
- How changes normally flow (PR + apply pipeline) and who can change infra out-of-band
- The resources in scope and which are high-blast-radius (network, IAM, data stores)
- How often drift occurs and why (manual hotfixes, console clicks, other automation)
- Tolerance for auto-correction vs alert-only

Your job:

1. **Drift detection** — how to detect drift reliably (scheduled plan, continuous reconcile) and how to distinguish real drift from noisy/cosmetic diffs (provider-computed fields, ordering, tags).

2. **Classify drift** — bucket each drift into: (a) cosmetic/ignore, (b) safe auto-correct (low blast radius, clearly unintended), (c) escalate-only (high blast radius or possibly intentional). Provide the classification rules.

3. **The emergency-change trap** — drift may be a deliberate incident hotfix. Before auto-correcting, check for an active incident, a recent out-of-band change marker, or a break-glass flag, and skip auto-correction if present. Never silently revert during an active incident.

4. **Safe auto-correction** — for the auto-correct bucket: always run a plan, show the exact diff, and apply only additive/idempotent reverts. High-blast-radius resources (IAM, network, deletes) are never auto-corrected; they open a PR or page a human.

5. **Reconcile loop guardrails** — rate-limit corrections, detect fight-loops (drift reappears immediately, meaning something else is changing it), and stop-and-escalate instead of thrashing.

6. **Codifying drift** — when drift is legitimate, the right fix is often to update the IaC, not revert. Propose a PR that adopts the change.

7. **Validation** — dry-run the whole loop in a staging account, including the emergency-change skip and the fight-loop detector.

Output as: (a) the drift-classification ruleset, (b) the auto-correct vs escalate decision tree, (c) the emergency-change/break-glass guard, (d) the fight-loop detection and rate-limit design, (e) a staging dry-run plan.

Bias toward alert-and-PR over auto-revert, and never auto-correct high-blast-radius resources or during an active incident.
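A constructed illustration of steps 2 through 5 (classification, the emergency-change guard, fight-loop detection and rate limiting) as an added example; it is not part of the prompt or any project's code, and the resource prefixes, attribute names and thresholds are hypothetical placeholders standing in for a team's own ruleset.

```python
from dataclasses import dataclass, field
from collections import defaultdict, deque

# Hypothetical policy tables; a real ruleset is derived from the team's resource inventory.
COSMETIC_ATTRS = {"tags", "tags_all", "arn", "id", "last_modified"}  # provider-computed or noisy
HIGH_BLAST_PREFIXES = ("aws_iam_", "aws_security_group", "aws_route", "aws_db_", "aws_vpc")
LOOP_WINDOW_S = 3600    # corrections of one resource inside this window count toward a fight loop
LOOP_THRESHOLD = 3      # the third correction in the window means something else owns the resource
GLOBAL_RATE_LIMIT = 5   # maximum auto-corrections per window across all resources


@dataclass
class Drift:
    address: str         # Terraform-style address, e.g. "aws_s3_bucket.logs"
    action: str          # "update" | "delete" | "create"
    changed_attrs: set   # attribute names that differ between state and live infrastructure


@dataclass
class Reconciler:
    active_incident: bool = False   # fed from the incident tracker
    break_glass: bool = False       # out-of-band change marker / break-glass flag
    history: dict = field(default_factory=lambda: defaultdict(deque))  # address -> correction times
    applied: deque = field(default_factory=deque)                      # all correction times

    def _prune(self, q: deque, now: float) -> None:
        # Drop correction timestamps that have aged out of the window.
        while q and now - q[0] > LOOP_WINDOW_S:
            q.popleft()

    def decide(self, d: Drift, now: float) -> tuple[str, str]:
        """Return (bucket, reason); bucket is 'ignore', 'auto-correct' or 'escalate'."""
        if d.changed_attrs and d.changed_attrs <= COSMETIC_ATTRS:
            return "ignore", "only provider-computed or cosmetic attributes changed"
        if self.active_incident or self.break_glass:
            return "escalate", "active incident / break-glass: drift may be an emergency fix"
        if d.action == "delete" or d.address.startswith(HIGH_BLAST_PREFIXES):
            return "escalate", "high blast radius or destructive revert; open PR or page"
        per_res = self.history[d.address]
        self._prune(per_res, now)
        self._prune(self.applied, now)
        if len(per_res) >= LOOP_THRESHOLD - 1:
            return "escalate", "fight loop: drift keeps reappearing after correction"
        if len(self.applied) >= GLOBAL_RATE_LIMIT:
            return "escalate", "rate limit reached for this window"
        per_res.append(now)          # record the correction we are about to apply
        self.applied.append(now)
        return "auto-correct", "low blast radius, idempotent revert after plan review"
```

The caller still runs a plan and shows the diff before applying an "auto-correct" result; the decision only gates whether that apply may proceed without a human.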