Dry-Running Destructive Scripts with AI Before They Touch Prod

The worst thirty seconds of my career were watching a cleanup script I wrote delete the wrong S3 prefix, in production, with no preview and no undo. It did exactly what I told it to. The bug wasn’t the deletion logic — that worked perfectly. The bug was that the script acted before I ever got to see what it was about to do. Every destructive automation deserves a dry-run mode that shows its work first, and retrofitting one onto an existing script is a tidy, well-shaped task that an AI assistant handles fast. The judgment about what counts as destructive stays with you.

The assistant is the quick junior who adds the --dry-run plumbing. You’re the senior who refuses to run anything against prod that can’t first tell you what it’ll do.

Dry-run is a first-class feature, not an afterthought

A real dry-run mode doesn’t just skip the dangerous calls — it prints exactly what would happen, in enough detail that you can sign off. The pattern is a single flag threaded through every side-effecting operation.

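import argparse

import boto3

s3 = boto3.client("s3")  # credentials come from the default boto3 chain

def delete_object(bucket, key, *, dry_run):
    # Gate the mutating call: in dry-run mode, report and return early.
    if dry_run:
        print(f"[DRY-RUN] would delete s3://{bucket}/{key}")
        return
    s3.delete_object(Bucket=bucket, Key=key)

parser = argparse.ArgumentParser()
parser.add_argument("--dry-run", action="store_true",
                    help="Show actions without performing them")
args = parser.parse_args()  # pass args.dry_run into every side-effecting call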

When you ask an AI to “add a dry-run mode to this script,” the review check is simple but vital: does the flag gate every mutating call, or did one slip through? A dry-run that previews four deletes and silently performs a fifth is worse than no dry-run at all, because it builds false confidence. Trace every side effect.
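One way to make that trace mechanical is to run the dry-run path against a client that blows up on any call. The sketch below is an assumption-laden toy: ExplodingClient, cleanup, and check_dry_run_is_inert are hypothetical names, and the stub fails on every call, so a real version would need to let read-only calls through.

```python
class ExplodingClient:
    """Hypothetical stand-in for an S3 client: every method call fails loudly."""

    def __getattr__(self, name):
        def _fail(*args, **kwargs):
            raise AssertionError(f"dry-run performed a real call: {name}")
        return _fail


def cleanup(client, bucket, keys, *, dry_run):
    """Toy cleanup routine; returns the list of actions it previewed."""
    previewed = []
    for key in keys:
        if dry_run:
            previewed.append(f"[DRY-RUN] would delete s3://{bucket}/{key}")
            continue
        client.delete_object(Bucket=bucket, Key=key)
    return previewed


def check_dry_run_is_inert():
    # If any code path reaches the client while dry_run=True, this raises.
    return cleanup(ExplodingClient(), "example-bucket", ["a.log", "b.log"], dry_run=True)
```

If the AI's refactor missed a call, the check fails with the name of the method that slipped through, which is faster than finding out from a missing object.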

Pro Tip: Make --dry-run the default and require an explicit --execute (or --no-dry-run) to actually act. Inverting the default means a forgotten flag results in a harmless preview, not an accidental deletion. The one time someone runs the script half-asleep, the worst case is “it printed what it would do.” That inversion has saved me more than once.
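A minimal sketch of the inverted default, assuming a hypothetical parse_args helper and an --execute flag name of your choosing. The rest of the script keeps reading a single dry_run value, so the gating code doesn't change.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Cleanup script (preview by default)")
    parser.add_argument(
        "--execute",
        action="store_true",
        help="Actually perform the actions; without this flag the script only previews",
    )
    args = parser.parse_args(argv)
    # Derive dry_run so the rest of the script keeps using one flag.
    args.dry_run = not args.execute
    return args
```

Calling parse_args([]) gives dry_run=True, and only parse_args(["--execute"]) arms the script.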

Show a diff, not just a list

For anything that modifies rather than deletes — config edits, in-place updates — the gold standard is a real diff of before-and-after. Ask the AI to render one.

import difflib

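def preview_change(path, new_content, *, dry_run):
    # Read the current content; the context manager closes the file promptly.
    with open(path) as f:
        old = f.read()
    if dry_run:
        diff = difflib.unified_diff(
            old.splitlines(keepends=True),
            new_content.splitlines(keepends=True),
            fromfile=f"{path} (current)",
            tofile=f"{path} (proposed)",
        )
        print("".join(diff))
        return
    # Not a dry run: overwrite the file with the proposed content.
    with open(path, "w") as f:
        f.write(new_content)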

A unified diff turns “trust me, it’s a small change” into something you can actually read line by line. This is the same instinct that makes terraform plan indispensable — see the change, then approve the change. When a script edits files, a diff preview should be non-negotiable, and the AI is good at wiring difflib in once you ask.

Bash gets a dry-run too

The shell version is just as important, since so much destructive automation is Bash. The clean pattern is a run wrapper that either echoes or executes.

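#!/usr/bin/env bash
set -euo pipefail
DRY_RUN="${DRY_RUN:-true}"   # safe default
# Assumption for this sketch: the version to clean up is the first argument.
# The :? expansion aborts if it is missing or empty.
old_version="${1:?usage: $0 <old_version>}"

# Echo the command in dry-run mode; execute it otherwise.
run() {
  if [[ "$DRY_RUN" == "true" ]]; then
    echo "[DRY-RUN] $*"
  else
    "$@"
  fi
}

run rm -rf "/var/cache/app/$old_version"
run systemctl restart app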

Every dangerous command goes through run, and by default the script narrates instead of acting. Set DRY_RUN=false to arm it. When an AI refactors a Bash script for safety, confirm that every destructive line is wrapped — a bare rm outside the wrapper defeats the whole mechanism. Reading for the un-wrapped command is the review.
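Reading for the un-wrapped command can be partly automated. This is a rough heuristic sketch, not a Bash parser: find_unwrapped and the DESTRUCTIVE list are hypothetical, and it only catches listed commands that start a line, so anything after && or inside a pipeline, function, or subshell still needs a human read.

```python
import re

# Assumed list of commands worth flagging; extend it for your own scripts.
DESTRUCTIVE = ("rm", "mv", "dd", "truncate", "systemctl", "kubectl", "aws")
_PATTERN = re.compile(r"^\s*(?:sudo\s+)?(" + "|".join(DESTRUCTIVE) + r")\b")


def find_unwrapped(script_text):
    """Return (line_number, line) pairs where a listed command starts a line
    instead of going through the run wrapper."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if _PATTERN.match(line):
            hits.append((number, line.strip()))
    return hits
```

Lines that begin with run never match, so an empty result means no listed command starts a line bare. It does not mean the script is safe.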

Confirmation gates for the truly dangerous

Below dry-run sits a second layer for irreversible actions: an interactive confirmation that forces a human to acknowledge the blast radius.

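def confirm(action: str) -> bool:
    # current_context() is a placeholder: have it return the cluster,
    # account, or environment the script is currently pointed at.
    print(f"About to: {action}")
    print(f"Target context: {current_context()}")  # show WHERE
    return input("Type 'yes' to proceed: ").strip() == "yes"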

The detail that matters is echoing the target context (which cluster, which account, which environment) right before the prompt. A confirmation that doesn’t tell you where you’re pointed is theater; the whole point is to catch the “wait, I’m on prod?” moment. Ask the AI to include the context echo explicitly, because it often won’t add one unprompted, and that line is the one that prevents the worst mistakes.
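Here is one possible shape for the context lookup, as a sketch only. APP_ENV and APP_ACCOUNT are made-up environment variable names, and the env and ask parameters are additions of mine so the gate can be exercised without a terminal. In a real script you would read whatever your tooling treats as the source of truth.

```python
import os


def current_context(env=None):
    """Hypothetical context lookup: reads made-up APP_ENV / APP_ACCOUNT variables."""
    env = os.environ if env is None else env
    environment = env.get("APP_ENV", "UNKNOWN")
    account = env.get("APP_ACCOUNT", "UNKNOWN")
    return f"env={environment} account={account}"


def confirm(action, *, env=None, ask=input):
    """Show what and where, then require the literal word 'yes'."""
    print(f"About to: {action}")
    print(f"Target context: {current_context(env)}")
    return ask("Type 'yes' to proceed: ").strip() == "yes"
```

Printing UNKNOWN when the variables are missing is deliberate: a gate that can't say where it is pointed should make that obvious rather than guess.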

Have AI red-team its own script

Beyond adding dry-run, the AI is genuinely useful as an adversarial reviewer. Once a destructive script exists, paste it back and ask the hard question:

Walk through this script and tell me, line by line, what the worst-case outcome is if it runs against production with bad input. Where can it cause irreversible damage?

This reframes the model from “make it work” to “find how it breaks,” and it surfaces the unset or empty variable that turns rm -rf /$DIR into rm -rf /, the missing -- before a filename argument, and the glob that matches more than intended. It’s the same instinct as routing the change through a structured code review before it merges: get a second read focused on damage, not just function. For the moments when a script has gone wrong, the patterns in our incident response workflow are where you head next.
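The empty-variable failure has a Python cousin: building a delete target from a name that turns out to be blank, absolute, or full of "..". A small guard like the sketch below refuses those before anything is removed. The safe_target name and the choice to raise ValueError are mine, and it assumes POSIX-style paths.

```python
from pathlib import Path


def safe_target(root, name):
    """Return root/name as a resolved path, or raise if name is empty or escapes root."""
    if not name or not name.strip():
        raise ValueError("refusing to build a delete target from an empty name")
    root_path = Path(root).resolve()
    target = (root_path / name).resolve()
    # The target must sit strictly below root: not root itself, not a sibling.
    if target == root_path or root_path not in target.parents:
        raise ValueError(f"{target} is not strictly inside {root_path}")
    return target
```

Calling safe_target("/var/cache/app", "") raises instead of quietly handing back the cache root, which is the kind of case the red-team prompt tends to flag.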

The reviewer never delegates judgment

Here’s the line that doesn’t move: the AI can add --dry-run, render diffs, and red-team the logic, but it cannot know which of your operations is genuinely irreversible in your environment. A plain delete from a bucket with versioning enabled only adds a delete marker and is recoverable; the same delete from a bucket without versioning is not. Only you know which yours is. So the human reviews every destructive script before prod, every time, and treats AI as the fast junior who builds the safety rails rather than the authority on when they’re enough. And the perennial rule holds here too: never hand the model your real credentials or production identifiers to “make the dry-run realistic,” because sanitized values produce identical plumbing.
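The versioning question is one you can at least ask the bucket directly. This sketch assumes you pass in a boto3 S3 client (for example boto3.client("s3")); get_bucket_versioning is the real client call, while is_versioned and describe_delete_risk are hypothetical helpers. It reports the setting and nothing more. Deciding whether that makes a delete acceptable is still your call.

```python
def is_versioned(s3_client, bucket):
    """True only when the bucket reports versioning Status == 'Enabled'."""
    response = s3_client.get_bucket_versioning(Bucket=bucket)
    # Buckets that never had versioning configured return no Status key at all.
    return response.get("Status") == "Enabled"


def describe_delete_risk(s3_client, bucket):
    """One-line summary suitable for printing in a dry-run or confirmation gate."""
    if is_versioned(s3_client, bucket):
        return f"{bucket}: versioning enabled, a plain delete leaves a recoverable prior version"
    return f"{bucket}: versioning NOT enabled, deletes are permanent"
```

Note that a Suspended bucket counts as not versioned here, which errs on the side of caution.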

Where this fits

Dry-run mode is the safety layer underneath everything in the Bash and Python automation category that mutates real systems. I draft the plumbing with Claude or Cursor, keep the wrapper patterns in a prompt workspace, and run dangerous scripts past the red-team prompt every time. Those prompts live in our prompt library and the prompt packs.

The rule

A destructive script that acts before it previews is one tired engineer away from an outage. So let the AI add the dry-run flag, the diff preview, and the confirmation gate fast — it’s good at all three — but you own what they’re for: every mutating call gated, the default safe, the target context shown, and the call on what’s truly irreversible. Quick junior builds the rails; the human decides whether they’re enough before it touches prod.
