Refactoring a Monolithic Bash Script into Functions with AI

The worst script I maintain is 500 lines of top-to-bottom bash with no functions, where variables set on line 30 are read on line 410. Every change is a guess. I used an AI assistant to refactor it into discrete functions, and the result was genuinely cleaner — but the AI also broke variable scoping in ways that only showed up at runtime. It’s a fast junior engineer doing the tedious extraction work; the behavior-preservation guarantee comes from a human reviewing every extracted function before it runs on prod.

Refactor with a safety net, not without one

The cardinal rule: never refactor untested code blind. Before touching the monolith I asked the AI to write a few end-to-end smoke tests that capture the script’s observable behavior — what files it creates, what it prints, its exit codes. These are the guardrails. If a refactor changes them, I know immediately.

@test "produces a report with the expected header" {
  run ./report.sh sample-input.csv
  [ "$status" -eq 0 ]
  [[ "${lines[0]}" == "REGION,COUNT" ]]
}

This is the same characterization-test discipline that applies to Python — see bats and pytest testing. Refactoring without it is just rewriting and hoping.

Extract one logical unit at a time

I don’t ask for “refactor this into functions” in one shot — that produces a sweeping diff I can’t review. Instead: “Extract just the report-generation block (lines 200–260) into a function with explicit parameters. Change nothing else.” Small, reviewable steps.

generate_report() {
    local input_file="$1"
    local output_file="$2"

    awk -F, 'NR>1 {count[$3]++} END {for (r in count) print r","count[r]}' \
        "$input_file" | sort > "$output_file"
}

Note local on every variable. This is where the AI’s first drafts repeatedly failed — it extracted the logic but left variables global, so two functions sharing a loop variable i silently clobbered each other. Bash defaults to global scope, and the AI forgets local constantly. Every extracted function gets a manual scope review.

Pro Tip: tell the AI “every variable inside a function must be declared local unless it’s deliberately shared.” Then grep the result for assignments without local. Unscoped variables in bash functions are the number-one source of refactoring regressions, and they never show up until a specific call order triggers them.

Pass data in, return data out

Monolithic bash communicates through globals. Clean functions take arguments and emit output. I prompt the AI to make data flow explicit, returning via stdout rather than setting a global.

get_region_count() {
    local input="$1"
    awk -F, 'NR>1 {c++} END {print c+0}' "$input"
}

count="$(get_region_count "$file")"

Capturing via $(...) instead of reading a global makes each function independently testable. The AI understands this pattern when asked, but its default is to keep mutating a global because that’s what the original did — I have to push it toward stdout returns and verify it didn’t leave a hidden global behind.

Preserve exit-code propagation

Functions must propagate failure, or set -e won’t catch it. The AI sometimes adds a trailing echo or return 0 that masks a failed command inside the function.

upload_artifact() {
    local file="$1"
    aws s3 cp "$file" "s3://bucket/path/" || return 1
    return 0
}

Here the explicit || return 1 ensures a failed upload propagates. I review every function’s last statement, because a stray successful command at the end (like a logging echo) silently swallows the real exit code. This connects to the hardening patterns in hardening a bash script with AI.

Keep behavior identical — diff the outputs

After each extraction I run the smoke tests and, more thoroughly, diff the full output of the refactored script against the original on real fixture inputs.

./report.sh fixture.csv > /tmp/new.out
git show HEAD:report.sh | bash /dev/stdin fixture.csv > /tmp/old.out
diff /tmp/old.out /tmp/new.out && echo "behavior preserved"

Two regressions in my refactor only surfaced here, not in code review — a sort order change and a dropped trailing newline. The AI’s extraction looked correct line by line; the behavior differed. Diffing output is the only real proof of a behavior-preserving refactor.

Know when bash is the wrong language

Partway through, the AI flagged that the data-aggregation logic would be far cleaner in Python. It was right. If your refactor keeps fighting bash’s lack of real data structures, the better move may be a port — covered in translating a bash script to Python. The AI is useful precisely because it’ll tell you when the language is the problem, not the structure.

Keep secrets out, run the review gate

The original script had an embedded API key. Before sharing it for refactoring I replaced it with __API_KEY__. The AI restructured the credential handling — pulling it from the environment — without ever seeing the real value. The rule holds: AI writes the plumbing, you supply the secret at runtime.

Before merging the refactor, I run it through the code review dashboard, which statically flags unscoped variables and swallowed exit codes — exactly the regressions the AI tends to introduce.

Conclusion

AI does the tedious work of carving a 500-line monolith into functions at junior-engineer speed, but it forgets local, masks exit codes, and leaves hidden globals behind. The clean, behavior-preserving result comes from refactoring in small steps, declaring every variable local, returning via stdout, propagating exit codes, and diffing output against the original after every change. Use the speed for the grunt work; keep the judgment for the scoping and the behavior guarantee. More refactoring prompts live in the bash and Python automation category, the prompts library, and the prompt packs.