Catching Risky Shell Commands Before They Run with AI

I have personally taken down production with a single command. It was a find with a -delete flag and a path variable that expanded to an empty string. The command ran exactly as written. The problem was that what I wrote and what I meant were two different things, and nothing stood between the two.

After 25 years on the keyboard I’ve stopped trusting my own pattern-matching at 2am. These days I run risky commands past an AI reviewer first — not to write them for me, but to act as a second pair of eyes that has read every man page and never gets tired. This is defensive work: catching mistakes before they execute, not finding clever ways to break things. Here’s how I do it.

The class of commands worth reviewing

You don’t need to review ls. The commands worth a pre-flight check share a few traits: they’re destructive, irreversible, run with elevated privileges, or operate on a wide blast radius. A rough mental filter:

Anything with rm -rf, dd, mkfs, > file, or --delete
Anything piping curl ... | bash from a URL
Anything with chmod -R, chown -R, or find ... -exec
Anything touching iptables -F, systemctl, partition tables, or LVM
Anything where a shell variable expands into a path

The common thread is that a small typo produces a catastrophically different result. Those are the commands I paste into a model before running.

A pre-flight review prompt that actually works

Vague prompts get vague answers. I give the model a tight, defensive framing:

You are a senior SRE reviewing a shell command before I run it on a
production Linux host. Do NOT rewrite it for me. Instead:
1. State in one sentence what this command actually does.
2. List every irreversible or destructive side effect.
3. Flag any variable that could expand unexpectedly or any glob
   that could match more than intended.
4. Tell me the single safest way to dry-run or scope-test it first.

Command:
find "$BACKUP_DIR"/ -type f -mtime +30 -delete

The “state what it actually does” step is the valuable one. When the model’s plain-English summary doesn’t match my intent, that mismatch is the bug. In the example above, a good reviewer immediately notes that if $BACKUP_DIR is unset, the command becomes find // -type f -mtime +30 -delete and starts walking the entire filesystem.

Pro Tip: Always ask the model to identify what happens when each variable is empty or unset. Empty-variable expansion is the single most common cause of “the script deleted everything” incidents, and it’s the easiest thing to miss reading your own code.

Make the dry-run the default, not the afterthought

The best outcome of a review is a safer way to test. For destructive find, that means swapping -delete for -print first:

# Review first — just list what WOULD be deleted
find "${BACKUP_DIR:?BACKUP_DIR is required}"/ -type f -mtime +30 -print

# Only after eyeballing the list do you swap -print for -delete

Notice the ${BACKUP_DIR:?...} syntax — that’s a guard that makes the shell abort with an error if the variable is unset, rather than expanding to nothing. A good AI review will suggest exactly this kind of defensive guard. The point isn’t that the model knows secrets you don’t; it’s that it reliably remembers the safety idioms you forget under pressure.

Build a wrapper that nudges you to review

Discipline fails. Tooling doesn’t. I keep a tiny shell function that intercepts commands matching dangerous patterns and forces a pause:

# ~/.bashrc — a deliberately annoying speed bump
confirm_risky() {
  local cmd="$*"
  if echo "$cmd" | grep -Eq 'rm -rf|mkfs|dd if=|--delete| > /'; then
    echo "⚠️  This looks destructive. Review it before running."
    echo "    $cmd"
    read -rp "Type 'yes' to proceed: " ans
    [ "$ans" = "yes" ] || return 1
  fi
  eval "$cmd"
}

It’s crude, but a half-second of friction on the right commands has saved me more than once. You can pair this with terminal tooling that has AI built in — Warp’s Agent Mode can explain a command inline before you run it, which removes the copy-paste step entirely.

Reviewing scripts, not just one-liners

The same approach scales to scripts. Before running a deploy or cleanup script someone handed me, I ask the model to do a focused safety pass:

Review this bash script for DESTRUCTIVE or IRREVERSIBLE operations only.
For each one, tell me: the line number, what it destroys, and whether
there's a guard preventing it from running with bad inputs. Ignore style.

Constraining the review to “destructive operations only” keeps the model from drowning you in style nitpicks and keeps it focused on what can actually hurt you. I run this kind of automated safety pass continuously now — our code review dashboard wires the same prompt into a pipeline so risky shell diffs get flagged before they merge.

Keep the human in the loop, and keep secrets out

Two rules I never break. First, the AI is a fast junior engineer doing review — it catches obvious mistakes brilliantly, but it can hallucinate man-page flags and miss subtle logic. I verify every flagged item against the actual documentation before acting, and I never let the model’s output run unreviewed. Second, I never paste real secrets into a prompt. If a command contains an API token, a database password, or a private key, I redact it to a placeholder like $DB_PASSWORD before the command ever leaves my terminal. The model doesn’t need the value to review the structure.

If you want a starting library of these review prompts, I keep a tuned set in the prompt library and a deeper hardening-focused bundle in the prompt packs.

Conclusion

The command line gives you enormous power and zero undo. An AI pre-flight reviewer doesn’t make you slower — it makes you deliberate at exactly the moments that matter, and it remembers the safety idioms you forget when you’re tired. Use it as a second set of eyes, verify what it tells you, keep your secrets out of the prompt, and you’ll catch the empty-variable disaster while it’s still just a draft. For more in this vein, browse the rest of security and hardening.