CloudOps
AI for Terraform | Difficulty: Intermediate | Claude, ChatGPT

Terraform Drift Remediation Runbook Prompt

Turn detected drift into a safe, decision-driven remediation runbook — classify each drift, decide code-wins vs reality-wins, handle out-of-band changes, and codify guardrails so the same drift doesn't recur.

Target user: On-call platform engineers responding to Terraform drift alerts
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are an SRE who has been paged by drift detection at 2am and knows that the wrong reflex — blindly running `terraform apply` to "fix" drift — can revert an emergency hotfix or delete data someone added by hand for a reason.

I will provide:
- The drift detection output (`terraform plan` diff or tool report)
- What changed and (if known) who/what changed it out of band
- Whether the affected resource is stateful or stateless
- The change-management context (was there an incident, a manual fix?)

Your job:

1. **Classify the drift** — for each drifted attribute, bucket it as: (a) benign provider normalization, (b) legitimate manual change that should be adopted into code, (c) unauthorized/accidental change to revert, or (d) a destructive plan that must not run as-is.

2. **Decide the direction** (code-wins vs reality-wins is a judgment call, not a default). Give the decision tree: if a human made an emergency fix, capture it in code before applying; if it's unauthorized drift, revert via apply; if it's noise, suppress with a scoped `ignore_changes`; if the plan is destructive, stop and rework it before anything is applied.

3. **Investigate out-of-band changes** — before touching anything, recommend pulling CloudTrail/audit logs to learn who changed what and why. Reverting an unknown change can re-break a system that was manually repaired.

4. **Safe remediation per type** — for adopt-into-code: update HCL and verify the plan goes clean. For revert: confirm the plan is non-destructive, then apply. For noise: add scoped `ignore_changes` (never blanket). Spell out the exact steps for the drift I pasted.

5. **Guard stateful resources** — for databases/volumes, require a confirmed non-destructive plan and a backup/snapshot checkpoint before any apply.

6. **Prevent recurrence** — identify why the drift happened (no IaC enforcement, shared console access, a CI gap) and recommend a guardrail: SCP/deny policies, drift-detection on a schedule, or a CI plan gate.

7. **Communication** — a short incident note template: what drifted, decision taken, action, and follow-up.

Output: (a) per-attribute classification table, (b) the decision and exact remediation steps for my drift, (c) the audit-log queries to run first, (d) recurrence-prevention guardrail, (e) incident note.

Bias toward: investigate-before-apply, never auto-reverting emergency fixes, and snapshot-before-touch on stateful resources.
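
The "destructive plan that must not run as-is" bucket in the prompt can be partly checked by machine before a human reads the diff. The Python sketch below is one possible way to do that, not a definitive implementation: it reads the JSON form of a saved plan (the output of `terraform show -json` on a plan file) and lists every resource whose planned actions include a delete, which covers both outright destroys and replacements. The function name `find_destructive_changes` and the sample plan are hypothetical. The fields `resource_changes`, `address`, and `change.actions` are assumed to match Terraform's JSON plan representation, so check them against the Terraform version you run.

```python
import json


def find_destructive_changes(plan: dict) -> list[str]:
    """Return addresses of resources whose planned actions include a delete.

    Assumes `plan` has the shape of `terraform show -json <planfile>` output:
    a top-level "resource_changes" list whose entries carry an "address" and
    a "change" object with an "actions" list such as ["update"], ["delete"],
    or ["delete", "create"] (a replacement).
    """
    destructive = []
    for resource_change in plan.get("resource_changes", []):
        actions = resource_change.get("change", {}).get("actions", [])
        if "delete" in actions:
            destructive.append(resource_change["address"])
    return destructive


# Hypothetical, heavily trimmed plan used only for illustration.
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {"address": "aws_security_group.web",
     "change": {"actions": ["update"]}},
    {"address": "aws_db_instance.main",
     "change": {"actions": ["delete", "create"]}},
    {"address": "aws_s3_bucket.logs",
     "change": {"actions": ["no-op"]}}
  ]
}
""")


if __name__ == "__main__":
    for address in find_destructive_changes(SAMPLE_PLAN):
        print(f"DO NOT APPLY AS-IS: {address} would be destroyed or replaced")
```

A check like this only narrows attention. It says nothing about whether an in-place update reverts an emergency fix, so the audit-log investigation and the snapshot checkpoint for stateful resources still apply.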