Dry-Run and Simulation: Test Automation Before It Touches Production

The first time automation bit me, it was a one-line Ansible change that looked harmless and rewrote the resolv.conf on forty hosts. It applied in eight seconds. The rollback took two hours. Ever since, I’ve held a simple rule: no automated action is allowed to touch production until it has already proven, on paper, exactly what it intends to do. Dry-run is not a nicety. It is the contract that lets me sleep while a machine does my job, because the machine has to show its work before it gets to act.

The pattern I’ll walk through here is promote-on-evidence: an action describes its effect, a diff gets reviewed, a replica runs it for real, and only then — with a human pressing the button — does it touch prod. AI fits into this beautifully, but only in the read-and-explain seat. It never holds the credentials.

Make every tool run in --check mode first

Most ops tooling already ships with a dry-run flag. The job is to make it mandatory, not optional. Ansible has --check, which walks the play and reports what would change without changing it, and --diff, which shows the before/after of files and templates.

# Predict, don't apply. --diff makes the changes legible.
ansible-playbook site.yml \
  --check --diff \
  --limit 'web_canary' \
  --tags 'nginx' \
  > /tmp/check.out 2>&1

The --limit here matters as much as --check. Blast-radius scoping is part of the dry-run: I’m not even simulating against the whole fleet, I’m simulating against a single canary host. If the check output is clean against web_canary, I widen the limit. If it’s noisy, I never typed the words “all hosts.”

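Pro Tip: A clean --check run is not proof of correctness. Modules that can’t predict their effect (shell, command) are skipped in check mode, so they report nothing at all. Treat any command/shell task in a check run as an unknown: give it an honest changed_when, and add a real check-mode guard (for example, a read-only probe marked check_mode: false, or a when: not ansible_check_mode condition) so the run still tells you something.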

terraform plan is a contract — parse it like one

terraform plan is the gold standard of dry-run because the diff is machine-readable. Don’t eyeball the colorized, human-oriented output for anything that matters. Emit JSON and parse it.

terraform plan -out=tfplan -lock-timeout=60s
terraform show -json tfplan > plan.json

Now you have a structured artifact you can gate on. Here’s the check I run before any plan is allowed to proceed to apply — it counts deletes and replacements, because those are the actions that lose data:

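import json, sys

# Plan path comes from the command line; falls back to plan.json.
path = sys.argv[1] if len(sys.argv) > 1 else "plan.json"
with open(path) as f:
    plan = json.load(f)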

destroy, replace = [], []
for rc in plan.get("resource_changes", []):
    actions = rc["change"]["actions"]
    if actions == ["delete"]:
        destroy.append(rc["address"])
    if "delete" in actions and "create" in actions:  # replace
        replace.append(rc["address"])

if destroy or replace:
    print("DESTRUCTIVE PLAN — human approval required")
    for a in destroy: print(f"  destroy:  {a}")
    for a in replace: print(f"  replace:  {a}")
    sys.exit(2)   # non-zero blocks the pipeline
print("plan is additive/in-place only — safe to auto-promote")

A pure create/update plan can flow through with light review. The moment a delete or a replace shows up, the pipeline stops and a human has to look. That exit code is the gate. I cover the broader pattern in confidence-gated auto-remediation, but the principle is the same: the risky actions are a different class.

kubectl --dry-run=server: validate against the real API

Client-side dry-run builds the object locally and prints it without submitting it for admission, so it catches little beyond malformed YAML. The useful one is server-side: it sends the object to the API server, runs it through admission controllers, validation webhooks, and defaulting, then throws it away. You find out it would be rejected before you apply it.

# Server-side: real validation, no mutation.
kubectl apply -f deployment.yaml \
  --dry-run=server \
  --validate=true \
  -o yaml > predicted.yaml

# Diff predicted state against what is live right now.
kubectl diff -f deployment.yaml

kubectl diff is the unsung hero. It shows you the delta between your manifest and the running object — not what you think is deployed, but what actually is. That gap is where drift hides. A manifest that looks like a no-op often isn’t, because someone hand-patched the live resource at 3am and never told the repo.
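To use kubectl diff as a pipeline step rather than something you read by eye, its exit status is the useful signal: 0 means no differences, 1 means differences were found, and anything greater means kubectl or the diff program failed. The sketch below wraps that in Python. The function names and the decision labels are illustrative assumptions.

```python
import subprocess


def classify_diff_exit(returncode: int) -> str:
    """Map kubectl diff's exit status to a pipeline decision."""
    if returncode == 0:
        return "no-op"    # manifest matches the live object
    if returncode == 1:
        return "differs"  # a delta exists: show it to a human
    return "error"        # kubectl or the diff program itself failed


def diff_manifest(path: str) -> tuple[str, str]:
    """Run kubectl diff on a manifest and return (decision, diff_text).

    check=False on purpose: exit status 1 is a normal result here,
    not a crash, so it must not raise.
    """
    proc = subprocess.run(
        ["kubectl", "diff", "-f", path],
        capture_output=True,
        text=True,
        check=False,
    )
    return classify_diff_exit(proc.returncode), proc.stdout
```

Keeping "differs" and "error" separate matters: a shell pipeline running under set -e would treat both as failure, and a failed diff is not evidence of anything.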

Diff predicted state against current — and have AI summarize it

A raw diff is honest but exhausting. A 400-line terraform plan against forty resources is technically complete and practically unreadable, which is how a destructive change sneaks past a tired reviewer. This is where AI earns its seat: not deciding, summarizing. Feed it the structured diff and ask for a plain-English risk summary that a human then reads.

import json, anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY, NOT cloud creds

with open("plan.json") as f:
    plan = json.load(f)

# Send only the diff, never credentials or state secrets.
changes = [
    {"address": rc["address"], "actions": rc["change"]["actions"]}
    for rc in plan.get("resource_changes", [])
    if rc["change"]["actions"] != ["no-op"]
]

resp = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    system=(
        "You are a release reviewer. Summarize this Terraform plan diff. "
        "List destructive or stateful changes FIRST and flag them clearly. "
        "Do not approve anything. End with: HUMAN REVIEW REQUIRED."
    ),
    messages=[{"role": "user", "content": json.dumps(changes)}],
)
print(resp.content[0].text)

The model is a fast junior engineer here. It reads the diff faster than I do and writes a tidy summary: “3 in-place updates to tags, 1 replacement of aws_db_instance.primary (this destroys and recreates the database — data loss unless a snapshot exists).” That last sentence is the whole point. The model surfaced the landmine; a human decides what to do about it. I keep these review prompts in a shared library — see /prompts/ and the prompt workspace for the templates I actually use.

Pro Tip: Never let the AI’s summary be the gate. The deterministic Python check (counting deletes) is the gate; the AI summary is a reading aid for the human standing at the gate. If the model and the parser disagree, the parser wins.
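A minimal sketch of "the parser wins", assuming the plan.json structure used earlier. The gate decision is computed only from the parsed plan; the model's text is carried along for the reviewer and checked for omissions. The function names and the returned keys are hypothetical.

```python
def destructive_addresses(plan: dict) -> list[str]:
    """Addresses whose planned actions include a delete (destroy or replace)."""
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if "delete" in rc["change"]["actions"]
    ]


def gate(plan: dict, ai_summary: str) -> dict:
    """Decide from the parser alone; use the summary only as reading material."""
    destructive = destructive_addresses(plan)
    # Destructive resources the summary never names: a sign the reading aid
    # cannot be trusted for this plan.
    missed = [addr for addr in destructive if addr not in ai_summary]
    return {
        "blocked": bool(destructive),  # deterministic, never taken from the model
        "destructive": destructive,
        "summary_missed": missed,
        "summary": ai_summary,
    }
```

If the summary says "looks safe" while the plan replaces a database, the result is still blocked, and summary_missed tells the reviewer that the summary left something out.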

Promote through a staging replica, not straight to prod

A dry-run predicts; a replica proves. The strongest evidence isn’t a plan — it’s the same change applied for real against an environment that looks like prod. So the promotion ladder is: --check on a canary, parse the plan, apply to a staging replica, observe, then promote.

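# .github/workflows/promote.yml: evidence gates between stages.
# Sketch only: triggers, checkout, and backend setup are omitted, and the
# workspace names (staging, prod) are placeholders.
jobs:
  predict:
    runs-on: ubuntu-latest
    steps:
      - run: terraform workspace select prod
      - run: terraform plan -out=tfplan && terraform show -json tfplan > plan.json
      - run: python gate_destructive.py plan.json   # exits non-zero on delete/replace

  staging-prove:
    needs: predict
    runs-on: ubuntu-latest
    steps:
      # A saved plan only applies to the state it was computed from,
      # so the replica workspace gets its own plan of the same config.
      - run: terraform workspace select staging
      - run: terraform plan -out=staging.tfplan
      - run: terraform apply -auto-approve staging.tfplan   # real apply, REPLICA workspace
      - run: ./smoke_tests.sh https://staging.internal

  prod-apply:
    needs: staging-prove
    runs-on: ubuntu-latest
    environment:
      name: production          # GitHub required-reviewers gate lives here
    steps:
      - run: terraform workspace select prod
      - run: terraform plan -out=tfplan && terraform show -json tfplan > plan.json
      - run: python gate_destructive.py plan.json   # re-check the plan that will actually run
      # Applying a saved plan never prompts, so the human gate is the
      # environment approval above, not a missing -auto-approve.
      - run: terraform apply tfplan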

The environment: production block ties the real apply to a required human reviewer in GitHub. Staging ran the change for real and the smoke tests passed — that’s the evidence. A person looks at the evidence and approves the prod run. The machine never promotes itself.

Always carry the back-out path with you

Dry-run tells you what will happen. The back-out tells you what to do when it happens differently than predicted — because eventually it will. Generate the rollback artifact before you apply, not after the pager goes off.

# Make sure the rollback directory exists before writing into it.
mkdir -p rollback

# Snapshot the current state so you can restore it.
kubectl get deployment api -o yaml > "rollback/api.$(date +%s).yaml"

# Terraform: the previous state is your back-out. Keep it.
# `state pull` works for local and remote backends alike.
terraform state pull > "rollback/tfstate.$(date +%s)"

# The one-liner you want to have ready, not have to invent:
echo "kubectl rollout undo deployment/api" > rollback/UNDO.sh

If the apply goes sideways, recovery is a known command sitting in a file, not improvisation under pressure. This is the same discipline that makes self-healing infrastructure trustworthy: every automated forward action has a defined reverse. You’ll find more of this thinking across the automation category.
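If the pipeline is already driven from Python, the same discipline can be a function that must succeed before any apply step is allowed to run. This is a sketch under assumptions: the directory layout mirrors the shell example, and write_rollback and its parameters are hypothetical names.

```python
import stat
import time
from pathlib import Path


def write_rollback(directory: str, name: str, snapshot: str, undo_command: str) -> Path:
    """Save a pre-change snapshot plus a ready-to-run undo script.

    Returns the path of the snapshot file that was written.
    """
    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)

    # Timestamped so repeated runs never overwrite an earlier snapshot.
    snapshot_path = root / f"{name}.{int(time.time())}.yaml"
    snapshot_path.write_text(snapshot)

    undo_path = root / "UNDO.sh"
    undo_path.write_text(f"#!/bin/sh\nset -eu\n{undo_command}\n")
    # Mark the script executable for its owner.
    undo_path.chmod(undo_path.stat().st_mode | stat.S_IXUSR)
    return snapshot_path
```

Calling this first and letting any exception abort the run means an apply can never start without its reverse already on disk.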

Dry-run, diff, simulate, then promote on evidence — that ladder is what separates automation you trust from automation that scares you. Let AI read the diffs and flag the landmines; let the parser hold the hard gate; let a human own the apply. Production credentials never go near the model, and every forward step keeps its back-out close. That’s the whole contract, and it’s cheap insurance against the eight-second mistake that takes two hours to undo.