Writing Safe sed and awk Bulk Edits With AI Review

A few years ago I ran a “quick” find-and-replace across a config repo. The intent was harmless: swap an old internal hostname for a new one. I piped find into sed -i from memory, hit enter, and felt that little glow of efficiency. Then the build failed. Then the next build failed. My pattern had matched a substring inside a longer hostname, and sed had cheerfully rewritten 140 files, including some YAML where the new value broke indentation-sensitive parsing. There was no backup. There was no diff. There was just a wall of red in CI and a sinking feeling.

I got lucky that time because the repo was under version control and I could git checkout my way out. But the lesson stuck: sed and awk are power tools, and power tools do exactly what you tell them, instantly, to every file you point them at. Today I lean on AI to help me write these one-liners faster than ever. That speed makes the discipline around them more important, not less.

Treat the AI Like a Fast Junior Engineer

The single most useful mental model I have for working with AI on shell edits: it is a fast, tireless junior engineer who has read every man page but has never seen your repo. It will produce a syntactically perfect sed command in two seconds. It will not know that three of your files use CRLF line endings, that one “config” file is actually a checked-in binary, or that the string you want to replace also appears inside a comment you care about.

So I use the model the way I would delegate to a junior: I let it draft, I review, and I own the decision to run. I never hand it the keys to production. The model writes the command; a human reads the diff and pulls the trigger. If you want a deeper take on why automated actions need approval gates, I wrote about that in blast radius scoping for AI-driven automation.

The prompt I actually use looks like this:

Write a sed command to replace "api.old-host.internal" with
"api.new-host.internal" in *.yaml files under ./config.

Then explain each part of the command, what the regex matches,
and what could go wrong. Do NOT run it. I will review first.

That last sentence matters. I want the explanation as much as the command, because the explanation is where I catch the blast-radius problems before they happen.

Preview First: sed Without -i

The most important habit is to run sed without -i before you ever run it with -i. By default, sed writes to stdout and leaves the file untouched. That makes it a preview tool:

sed 's/api\.old-host\.internal/api.new-host.internal/g' config/app.yaml

You read the output. You confirm it changed what you expected and nothing else. Only then do you reach for in-place editing. The number of disasters I have avoided simply by looking at stdout once is embarrassingly high.

To preview only the lines that would change, ask sed to print just the matches:

sed -n 's/api\.old-host\.internal/api.new-host.internal/gp' config/app.yaml

The -n suppresses normal output and the p flag prints only modified lines. If that command prints fifty lines and you expected three, stop. Your pattern is too broad.

Pro Tip: Ask the AI to predict the match count. “Roughly how many lines should this match in a typical service config?” If the model says “a handful” and your preview shows hundreds, the regex is the problem, not the file.
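One way to turn "how many lines will this touch?" into a number is to count matching lines with grep before sed ever runs. The sketch below is only an illustration: it builds a throwaway directory with two made-up YAML files, so the paths and contents are assumptions, not anything from a real repo.

```shell
# Throwaway sandbox; file names and contents are hypothetical.
demo=$(mktemp -d)
printf 'host: api.old-host.internal\nport: 443\n' > "$demo/app.yaml"
printf 'host: db.internal\n' > "$demo/db.yaml"

pattern='api\.old-host\.internal'

# Per-file count of matching lines (grep -c counts lines, not occurrences).
grep -cE "$pattern" "$demo"/*.yaml

# Total matching lines across the whole set; compare with what you expected.
total=$(cat "$demo"/*.yaml | grep -cE "$pattern")
echo "total matching lines: $total"

rm -rf "$demo"
```

If the total is far from the number you had in mind, that is the cue to tighten the pattern before going any further.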

Anchor the Regex So It Means What You Think

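My original disaster happened because the old hostname was a substring of a longer one (think old-host inside old-host-replica). The fix is anchoring. Tell the AI to escape literal dots and to bound the match so it cannot bleed into neighboring text:

# Bad: matches "old-host" anywhere, including inside longer tokens
sed 's/old-host/new-host/g' file

# Better: dots escaped, word boundaries on both ends (\b is a GNU sed extension)
sed -E 's/\bapi\.old-host\.internal\b/api.new-host.internal/g' file

The -E flag turns on extended regex, so groups and alternation work without a pile of backslashes. The \b word boundary is separate from -E: it is a GNU extension, and other sed implementations may not accept it. It also has a limit worth knowing. A word boundary only separates word characters (letters, digits, underscore) from everything else, so a hyphen or a dot still counts as a boundary, and api.old-host.internal-replica would be rewritten by the command above. If longer names like that exist in your files, the pattern has to check the neighboring characters explicitly. When I ask the model for a one-liner, I explicitly request: “anchor this so it only matches the full token, and escape every dot.” Then I make it walk me through the regex character by character. If it cannot explain why a \b is there, I do not trust the command.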
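When a longer hostname can follow the one you are replacing, a stricter variant is to spell out which characters are not allowed on either side of the match. This is a minimal sketch, assuming hostnames in your files are made of letters, digits, dots, hyphens, and underscores; the sample lines and the edge variable are hypothetical.

```shell
# Hypothetical sample: one exact hostname, one longer hostname that must survive.
sample=$(mktemp)
printf 'primary: api.old-host.internal\nreplica: api.old-host.internal-replica\n' > "$sample"

# Any character that cannot be part of a hostname (assumption: hostnames
# here use only letters, digits, dots, hyphens, and underscores).
edge='[^[:alnum:]._-]'

# Match only when the hostname is bounded by start/end of line or by a
# non-hostname character, and put those neighbors back via \1 and \2.
strict=$(sed -E "s/(^|$edge)api\.old-host\.internal($edge|\$)/\1api.new-host.internal\2/g" "$sample")
printf '%s\n' "$strict"

rm -f "$sample"
```

The first line is rewritten and the -replica line is left alone. One caveat: because the neighboring character is consumed by the match, two occurrences separated by a single character on the same line would only have the first one replaced, so the preview step still matters.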

This is exactly the kind of careful, narrated reasoning that good prompting unlocks. If you build these review prompts often, a reusable library helps; I keep mine in the prompt workspace and pull from curated prompt packs so I am not rewriting “explain this regex” from scratch every time.

In-Place Edits With a Backup You Can Trust

When you are confident, edit in place but keep a back-out path. The -i flag accepts a suffix, and that suffix gives you a free backup of every file it touches:

sed -i.bak -E 's/\bapi\.old-host\.internal\b/api.new-host.internal/g' config/app.yaml

Now config/app.yaml.bak holds the original. If something is wrong, restoring is a one-liner:

# Undo every edit in this run (skips cleanly if no backups exist)
for f in config/*.yaml.bak; do [ -e "$f" ] || continue; mv "$f" "${f%.bak}"; done

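Note that GNU sed and BSD/macOS sed differ here. GNU sed treats the suffix as an optional part of -i itself, so a bare -i means “no backup.” BSD sed requires a suffix argument, so editing without a backup is spelled -i '' with an explicit empty string. The attached form used above, -i.bak, works on both. Tell the AI which platform you are on so it does not hand you a GNU-style sed -i -e '...' that, on macOS, quietly takes -e as the suffix and leaves backups named like app.yaml-e behind. This kind of environment mismatch is precisely what a junior engineer gets wrong, so I always state the target shell and OS in the prompt.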
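The backup is also a cheap review surface: diffing it against the edited file shows what sed changed. A minimal sketch on a throwaway file (the contents are made up), using only the attached-suffix form:

```shell
# Throwaway file with hypothetical contents.
work=$(mktemp -d)
printf 'host: api.old-host.internal\nport: 443\n' > "$work/app.yaml"

# Attached suffix form, accepted by both GNU and BSD sed.
sed -i.bak 's/api\.old-host\.internal/api.new-host.internal/g' "$work/app.yaml"

# Review: backup on the left, edited file on the right.
# diff exits 1 when the files differ, so keep that from aborting a script.
diff -u "$work/app.yaml.bak" "$work/app.yaml" || true
edited=$(cat "$work/app.yaml")

# Back out: move the original back into place.
mv "$work/app.yaml.bak" "$work/app.yaml"
restored=$(cat "$work/app.yaml")
```

Once the diff looks right and the change is committed, the .bak files have done their job and can be deleted.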

awk for Field-Level Edits

sed is line-oriented text surgery. When your data has structure (columns, CSV, key-value pairs), awk is the safer tool because it edits fields instead of guessing with a regex. Say you have a simple CSV, with no quoted fields that contain commas, and you only want to change the third column when it equals a specific value:

awk -F',' 'BEGIN{OFS=","} $3=="old-region"{$3="new-region"} 1' regions.csv

Reading it left to right: -F',' sets the input separator, OFS=',' sets the output separator so the row gets rebuilt with commas, the condition only touches rows where field three is exactly old-region, and the trailing 1 is awk shorthand for “print this line.” Because the match is scoped to a field and an exact value, it cannot accidentally rewrite a substring buried in column five.

This is the prompt pattern I rely on: “Use awk, not sed, and only modify the field that matches exactly. Print every row unchanged otherwise.” Asking for field-scoped logic is itself a form of blast-radius control. For more on automating these chores responsibly, the automation category collects the rest of this series.
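Like sed without -i, awk only prints to stdout, so the preview habit carries over for free. Writing the result back is the step that needs care. A common approach is a temp file followed by mv; the sketch below assumes a small made-up regions.csv in a throwaway directory.

```shell
# Throwaway CSV with hypothetical rows.
work=$(mktemp -d)
csv="$work/regions.csv"
printf 'id,name,region\n1,alpha,old-region\n2,beta,old-region-2\n' > "$csv"

# Same program as above, kept in a variable so preview and edit cannot drift.
prog='BEGIN{OFS=","} $3=="old-region"{$3="new-region"} 1'

# Preview: diff the original against what awk would produce.
awk -F',' "$prog" "$csv" | diff -u "$csv" - || true

# Write back: only replace the file if awk succeeded.
tmp=$(mktemp "$work/regions.XXXXXX")
awk -F',' "$prog" "$csv" > "$tmp" && mv "$tmp" "$csv"
result=$(cat "$csv")
```

Only the row whose third field is exactly old-region changes; old-region-2 is untouched. Note that mv swaps in the temp file, so the result carries the temp file's permissions; if that matters, cat "$tmp" > "$csv" keeps the original file's mode.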

Let Git Be the Safety Net

Backups are good; version control is better. Before any bulk edit, I make sure the working tree is clean so the only changes after the command are the ones the command made:

git status --porcelain   # should be empty before you start

After running the edit, the diff is your review surface:

git diff --stat          # how many files, how many lines? sanity check
git diff config/app.yaml # read the actual changes

If --stat says 12 files when you scoped the glob to one directory, something matched wider than intended. Reading the diff before you stage anything is the human approval gate. Nothing gets committed until I have looked at it with my own eyes. If it is wrong, git checkout -- . throws the whole thing away and you start over with a tighter pattern. I go deeper on rehearsing changes before they touch anything real in dry run and simulation before automated actions.
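The clean-tree check is easy to forget when typed by hand, so it can be worth turning into a guard. The function below is a sketch; the name require_clean_tree is my own invention, not a git command.

```shell
# Hypothetical helper: refuse to continue unless the working tree is clean.
require_clean_tree() {
  local changes
  # If git itself fails (for example, not inside a repository), treat that
  # as "not safe to proceed" rather than as a clean tree.
  changes=$(git status --porcelain) || return 2
  if [ -n "$changes" ]; then
    echo "working tree is dirty; commit or stash first" >&2
    return 1
  fi
}

# Typical use at the top of a bulk-edit script:
#   require_clean_tree || exit 1
```

Untracked files count as dirty here, which is deliberate: leftover .bak files from an earlier run will stop the next one until you deal with them.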

Pro Tip: Run the AI-generated edit on a throwaway branch. git switch -c bulk-edit-test, run the command, review the diff, and only merge if it is clean. The branch is your sandbox and your undo button at the same time.

A Loop Over Files With a Dry-Run Flag

For anything touching more than a couple of files, I wrap the edit in a small script with a real dry-run mode, because the scariest part of bulk editing is not the regex, it is the file list. Scope the glob tightly and prove the list is right before you mutate anything:

#!/usr/bin/env bash
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # default to safe; opt in to live edits
PATTERN='s/\bapi\.old-host\.internal\b/api.new-host.internal/g'

# Scope the glob TIGHTLY. Be specific about the directory and extension.
mapfile -t files < <(git ls-files 'config/*.yaml')

echo "Targeting ${#files[@]} files:"
printf '  %s\n' "${files[@]}"

for f in "${files[@]}"; do
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "--- would edit: $f"
    sed -E "$PATTERN" "$f" | diff --color=auto "$f" - || true
  else
    sed -i.bak -E "$PATTERN" "$f"
  fi
done

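Run it once as-is and it shows the diff for every file without changing a thing. Only DRY_RUN=0 ./edit.sh performs the real edits. Two details earn their keep: using git ls-files instead of find means I only ever touch tracked files (no node_modules, no build artifacts, no .git internals), and defaulting DRY_RUN to safe means a forgetful future me does not nuke the repo by running the script bare. Two caveats. In a git pathspec, * also matches /, so 'config/*.yaml' includes YAML files in subdirectories of config; write ':(glob)config/*.yaml' if you mean the top level only. And mapfile needs bash 4 or newer, so the stock bash 3.2 on macOS will not run this as written.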

When I ask the model to generate this scaffolding, I am explicit: “Default to dry-run. Print the file list before editing. Use git ls-files so we never wander outside the tracked tree.” A good one-liner is fast; a good loop with a dry-run flag is what lets you sleep at night. If you want help getting an assistant to reliably produce this shape of script, the prompts library has the templates I start from.

Conclusion

AI has genuinely changed how quickly I can write sed and awk commands, and it is great at explaining a gnarly regex when I ask it to narrate each piece. But none of that changes the rules of the road. Preview before you edit. Anchor your patterns. Keep a backup and a clean git tree. Scope the glob as narrowly as the task allows. Read the diff before you commit. The model is a fast junior engineer that drafts the command; the human is the one who reads the blast radius, approves the run, and keeps a back-out path within reach. Stay in that seat and bulk editing stops being scary and starts being boring, which, for production work, is exactly what you want.