Securing AI-Generated Bash Scripts Before You Run Them

Bash is the easiest language for AI to write and the easiest language to get devastating output from. A 20-line script that “just cleans up old files” can recursively delete a home directory because the model assumed a variable would always be set. A “simple log shipper” can write your secrets to a remote server because the model used set -x for debugging and forgot to remove it.

I have run AI-generated bash that I should not have. Most engineers I know have too. After enough close calls, there’s a short checklist that catches the worst of it. This is that checklist.

The five things to check before running any AI-generated bash

1. Does it start with a strict pragma?

The first lines of any non-trivial bash script should be:

#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'

What each does:

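set -e: exit when a command fails. Without this, a failure on line 5 doesn’t stop the script from happily running lines 6-50. It does not fire for commands tested by if, while, && or ||, so treat it as a safety net, not a guarantee.
set -u: error on unset variables. This is the one that saves you from rm -rf $UNDEFINED/.
set -o pipefail: propagate failures through pipes. Without it, failing-command | grep something succeeds whenever grep succeeds, because a pipeline reports only the status of its last command.
IFS=$'\n\t': split fields only on newlines and tabs, not on spaces. This blunts word-splitting bugs in filenames that contain spaces; it is not a substitute for quoting.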

If the AI-generated script doesn’t have these, add them and re-read the script. You’ll often discover bugs the pragma now flags.
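
A quick way to convince yourself of what these flags do is to run them against commands that fail on purpose. The sketch below is only an illustration: it runs each experiment in a throwaway bash -c child shell so nothing leaks into the calling script, and UNDEFINED is just a placeholder variable name.

```shell
#!/usr/bin/env bash
# Demonstration only: each experiment runs in its own child shell.

# Without pipefail, a pipeline reports the status of its last command (cat).
without_pipefail=$(bash -c 'false | cat; echo "$?"')

# With pipefail, the failure of `false` becomes the pipeline's status.
with_pipefail=$(bash -c 'set -o pipefail; false | cat; echo "$?"')

echo "pipeline status without pipefail: $without_pipefail"
echo "pipeline status with pipefail:    $with_pipefail"

# With set -u, expanding an unset variable aborts the child shell before the
# command line is built, so the echo inside it never runs.
if bash -c 'set -u; unset UNDEFINED; echo "would run: rm -rf $UNDEFINED/"' 2>/dev/null; then
    nounset_result="command ran"
else
    nounset_result="aborted"
fi
echo "set -u on an unset variable: $nounset_result"
```

The first pipeline reports 0 and the second reports 1, and the set -u experiment aborts instead of printing the rm command.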

2. Is every variable expansion quoted?

# Wrong
rm -rf $TARGET_DIR

# Right
rm -rf "$TARGET_DIR"

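The wrong version is what causes the “I deleted the root directory” stories. If $TARGET_DIR contains a space, the command becomes rm -rf foo bar and deletes two unintended things; if it contains a glob character such as *, the shell expands that first. If it is empty, a bare rm -rf $TARGET_DIR does nothing, but the common variants rm -rf $TARGET_DIR/ and rm -rf $TARGET_DIR/* become rm -rf / and rm -rf /*. Quotes fix the splitting and globbing. They do not fix the empty case: set -u catches an unset variable, and a guard such as "${TARGET_DIR:?}" catches an empty one as well.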

Models default to the wrong version about half the time because the right version is harder to write in chat (“escape the quotes!”) and the wrong version is what most blogs show.

Fix: When reading AI bash, mentally check every $VAR for quotes. Add them if missing. This is the single biggest source of bash disasters.
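
To see what the shell actually hands to a command, it helps to count arguments. This is a minimal sketch, assuming the default IFS; count_args and EMPTY_DIR are hypothetical names used only for the demonstration.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

TARGET_DIR="old backups"
unquoted_space=$(count_args $TARGET_DIR)     # split on the space: two arguments
quoted_space=$(count_args "$TARGET_DIR")     # kept together: one argument

EMPTY_DIR=""
unquoted_empty=$(count_args $EMPTY_DIR)      # the expansion vanishes entirely
quoted_empty=$(count_args "$EMPTY_DIR")      # one empty-string argument

echo "space, unquoted: $unquoted_space  quoted: $quoted_space"
echo "empty, unquoted: $unquoted_empty  quoted: $quoted_empty"

# ${VAR:?message} refuses to expand an unset or empty variable. It runs in a
# subshell here so that the refusal does not end the demonstration itself.
if ( : "${EMPTY_DIR:?must not be empty}" ) 2>/dev/null; then
    guard_result="passed"
else
    guard_result="refused"
fi
echo "guard on empty variable: $guard_result"
```

Applied to the example above, a guarded form would look like rm -rf -- "${TARGET_DIR:?}", which stops the script instead of running rm with an empty path.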

3. What happens if a step fails partway through?

The AI will cheerfully write:

mkdir -p /opt/new-app
cd /opt/new-app
tar xzf $TARBALL
rm $TARBALL

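What happens if tar xzf fails (corrupt tarball, full disk)? With set -e, the script stops. Good. Without set -e, it continues to rm $TARBALL and deletes your tarball with no backup. The same applies one line earlier: if cd fails and the script keeps going, tar extracts into whatever directory the script started in.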

For any state-changing script, ask yourself: at each step, what’s the recovery path if the step fails? If the answer is “nothing automated,” the script should at least not delete data before verifying the previous step succeeded.

The AI almost never thinks about this on its own.
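
One way to write the extract-then-delete step so that the delete only happens after everything before it succeeded is sketched below. install_tarball is a hypothetical helper, not something from the original script, and it uses explicit return checks because set -e is ignored inside a function that is called from an if or a || list.

```shell
#!/usr/bin/env bash
# install_tarball TARBALL DEST (hypothetical helper): extract TARBALL into DEST
# and delete TARBALL only after every earlier step has succeeded.
install_tarball() {
    local tarball=$1 dest=$2

    mkdir -p -- "$dest" || return 1

    # Listing the archive is a cheap integrity check before touching DEST.
    if ! tar -tzf "$tarball" >/dev/null 2>&1; then
        echo "refusing to extract: $tarball is unreadable or corrupt" >&2
        return 1
    fi

    # -C extracts into DEST without a cd, so a failed cd cannot misdirect the files.
    if ! tar -xzf "$tarball" -C "$dest"; then
        echo "extraction failed; keeping $tarball" >&2
        return 1
    fi

    rm -- "$tarball"
}
```

Called as install_tarball "$TARBALL" /opt/new-app, it leaves the tarball in place on any failure, so the recovery path is simply to fix the problem and run it again.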

4. Are secrets visible in logs?

The most common way AI-generated bash leaks secrets is via set -x:

set -x  # debugging
curl -H "Authorization: Bearer $API_TOKEN" https://api.example.com/...

With set -x, every command is printed including the expanded variables. Your API token is now in the script’s output, which is in your CI logs, which are visible to anyone with project access.

The fix is selective:

set +x  # disable trace
curl -H "Authorization: Bearer $API_TOKEN" https://api.example.com/...
set -x  # re-enable

Or simply remove set -x once debugging is done. The model frequently leaves it in.
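
If tracing has to stay on for the rest of the script, a small wrapper can pause it around the sensitive call and restore whatever state it found. The sketch below is one possible shape, not a standard idiom: with_bearer_token is a hypothetical name, and it assumes the token lives in API_TOKEN as in the example above. The wrapper adds the header itself, so the caller’s traced command line never contains the token.

```shell
#!/usr/bin/env bash
# with_bearer_token CMD [ARGS...] (hypothetical wrapper): run CMD with an
# Authorization header appended, keeping the token out of set -x output.
with_bearer_token() {
    local was_tracing=0 status=0
    case $- in *x*) was_tracing=1 ;; esac   # remember whether set -x was active
    set +x                                  # nothing below this line is traced
    "$@" -H "Authorization: Bearer ${API_TOKEN:?API_TOKEN is not set}" || status=$?
    if [ "$was_tracing" -eq 1 ]; then set -x; fi   # restore the caller's state
    return "$status"
}

# Usage (placeholder URL from the example above, left commented out):
# with_bearer_token curl -fsS https://api.example.com/...
```

Unlike a bare set +x / set -x pair, this does not switch tracing on in a script that never had it enabled.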

5. Does it run as root unnecessarily?

The AI will sometimes write sudo into every command, even ones that don’t need it. Or it’ll assume the script runs as root and use absolute paths that require root to write.

The principle: if a command can run as a non-root user, it should. The smaller the privileged surface, the smaller the blast radius.

This is especially important for scripts that download and execute code. A common pattern:

# Dangerous: privileged download + execute
sudo bash -c 'curl https://example.com/install.sh | bash'

# Safer: download, review, then run
# -f makes curl fail on HTTP errors instead of saving the error page as a script
curl -fsSL https://example.com/install.sh -o install.sh
# READ install.sh
sudo bash install.sh

If the model generates the first pattern, replace it with the second. Always.
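
If the publisher of the installer also publishes a SHA-256 checksum (an assumption; many do not), comparing it before running adds one more check on top of reading the file. In this sketch verify_sha256 and EXPECTED_SHA256 are hypothetical names, and the helper relies on sha256sum from GNU coreutils.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED_HEX (hypothetical helper): succeed only if FILE
# exists and its SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
    local file=$1 expected=$2 actual
    # Reading from stdin keeps the file name out of sha256sum's output.
    actual=$(sha256sum < "$file" | cut -d' ' -f1)
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Usage, with EXPECTED_SHA256 copied by hand from the publisher's site:
# verify_sha256 install.sh "$EXPECTED_SHA256" && sudo bash install.sh
```

A matching checksum only shows the file is the one the publisher intended to ship; it is not a replacement for reading the script.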

A real example

Last month I asked Claude to write a script that cleans up Docker images older than 30 days on a CI runner host. The first draft was:

#!/bin/bash

DOCKER_IMAGES=$(docker images --format '{{.ID}} {{.CreatedAt}}')
CUTOFF=$(date -d '30 days ago' +%s)

echo "$DOCKER_IMAGES" | while read ID DATE; do
    CREATED=$(date -d "$DATE" +%s)
    if [ $CREATED -lt $CUTOFF ]; then
        docker rmi $ID
    fi
done

Walking the checklist:

No strict pragma. Missing set -euo pipefail.
Unquoted $CREATED, $CUTOFF, and $ID, plus read without -r. Each one is a potential bug.
Failure handling. docker rmi fails if an image is in use. The script continues, marches through, and silently fails on every in-use image. We never know which were cleaned and which weren’t.
No secrets (docker doesn’t expose them here), but the script also doesn’t log what it’s doing, so you can’t audit afterward.
No sudo, good — assumes the user has Docker socket access, which is reasonable.

The hardened version:

#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'

CUTOFF=$(date -d '30 days ago' +%s)
REMOVED=0
SKIPPED=0

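# Capture the listing first so that a docker failure stops the script under set -e
IMAGES=$(docker images --format '{{.ID}}|{{.CreatedAt}}')

# Read from a here-string rather than a pipe: a piped while loop runs in a
# subshell, and REMOVED/SKIPPED would still be 0 when the summary prints
while IFS='|' read -r ID DATE; do
    [ -n "$ID" ] || continue  # an empty listing still yields one blank line
    CREATED=$(date -d "$DATE" +%s)
    if [ "$CREATED" -lt "$CUTOFF" ]; then
        if docker rmi "$ID" 2>/dev/null; then
            echo "Removed: $ID"
            REMOVED=$((REMOVED + 1))
        else
            echo "Skipped (in use): $ID"
            SKIPPED=$((SKIPPED + 1))
        fi
    fi
done <<< "$IMAGES"

echo "Cleanup complete. Removed: $REMOVED, Skipped: $SKIPPED."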

This took two minutes of editing. Without the checklist, I might have run the original and noticed days later that disk usage hadn’t really dropped because half the images were in use.
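
One detail worth checking in any script that keeps counters like REMOVED and SKIPPED: in bash, a while loop on the right-hand side of a pipe runs in a subshell, so variables it changes are lost when the loop ends. Feeding the loop from a here-string or process substitution keeps it in the current shell. A minimal demonstration, with made-up input and variable names:

```shell
#!/usr/bin/env bash
# Demonstration only: the same loop, fed two different ways.

piped=0
printf 'a\nb\nc\n' | while read -r line; do
    piped=$((piped + 1))          # increments a copy inside the subshell
done
echo "counter after piped loop:       $piped"

lines=$(printf 'a\nb\nc\n')
redirected=0
while read -r line; do
    redirected=$((redirected + 1))   # increments the variable in this shell
done <<< "$lines"
echo "counter after here-string loop: $redirected"
```

The piped loop leaves its counter at 0 while the here-string loop reaches 3, which is the difference between a summary line that always reports zero and one that reports what happened.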

A small note on bash linting

shellcheck catches most of these issues automatically. If you adopt one tool from this article, make it shellcheck:

shellcheck cleanup-images.sh

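It will flag unquoted variables, read without -r, variables modified inside a pipeline subshell, and a dozen other patterns. It does not, by default, complain about a missing set -euo pipefail, so that check stays manual. AI-generated bash usually has at least one shellcheck warning.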

I now run shellcheck on every script before I run the script itself. It’s two seconds and catches things I’d miss.
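
That habit can be turned into a small gate. The sketch below is one way to do it, assuming shellcheck is on the PATH; run_if_clean is a hypothetical wrapper name. It refuses to run the script if shellcheck reports anything, or if shellcheck is not installed at all.

```shell
#!/usr/bin/env bash
# run_if_clean SCRIPT [ARGS...] (hypothetical wrapper): run SCRIPT with bash
# only when shellcheck reports no findings for it.
run_if_clean() {
    local script=$1
    shift
    if ! command -v shellcheck >/dev/null 2>&1; then
        echo "shellcheck not found; refusing to run $script" >&2
        return 127
    fi
    # shellcheck exits non-zero when it has findings to report.
    shellcheck "$script" || return 1
    bash "$script" "$@"
}

# Usage:
# run_if_clean cleanup-images.sh
```

This is deliberately strict: any finding blocks the run, so you either fix the script or silence a specific finding on purpose.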

When the AI gets it right

To be fair: the model is often perfectly capable of producing safe bash. If you prompt it explicitly — “write this with set -euo pipefail, quote every variable, fail loudly on errors” — you’ll get a clean script.

The problem is that “write me a script that does X” without that prompt gets you the common form of the script, which is the unsafe form. So the rule of thumb:

Always include the safety requirements in the prompt. Or: always treat the output as a draft that needs hardening. Don’t run any bash the AI wrote without one of those two disciplines.

The bottom line

Bash from AI is fast to produce and easy to read incorrectly. The checklist is short (strict pragma, quoted expansions, failure paths, secrets in logs, unnecessary privilege), and applying it takes a couple of minutes per script. The cost of skipping it ranges from “minor cleanup mistake” to “career incident.” There’s no excuse not to do the check.

For our prompts on bash specifically, see bash-script-code-review and the linux-server-hardening prompt, which cover adjacent territory.