sed and awk Mastery: The Two Tools That Replace 80% of Your Throwaway Scripts

There are two camps in operations: people who write a 40-line Python script to extract column three from a file, and people who type awk '{print $3}' and move on. I’ve been in both, and I can tell you the second group ships faster. sed and awk have a reputation for being write-only line noise, but you only need a small, learnable subset to handle the overwhelming majority of text wrangling a DevOps job throws at you.

This is the practical core — the commands I actually use weekly, not the academic full grammar.

sed: surgical edits on a stream

sed is for transformations: substitute, delete, insert. The one command everyone knows is substitution, but the details are where people slip.

# Replace the first occurrence per line
sed 's/old/new/' file

# Replace ALL occurrences per line (the g flag)
sed 's/old/new/g' file

# Edit the file in place, keeping a backup
sed -i.bak 's/8080/9090/g' config.ini

That -i.bak is worth a habit. On GNU sed, a bare -i edits in place with no backup; -i.bak saves the original as a .bak copy first. On macOS/BSD sed, -i requires a suffix argument (an empty one, -i '', means no backup), so the attached form -i.bak is the spelling that works on both. Getting burned by the GNU-vs-BSD difference during a Mac-vs-Linux CI mismatch is a rite of passage; -i.bak sidesteps it.
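
A minimal sketch of that backup behaviour, assuming a throwaway config.ini created in a temp directory (the file contents and the workdir, edited, and backup names are made up for illustration):

```shell
# Hypothetical demo file, created only for this example
workdir=$(mktemp -d)
printf 'port=8080\nadmin_port=8081\n' > "$workdir/config.ini"

# Edit in place; the original is kept as config.ini.bak
sed -i.bak 's/8080/9090/g' "$workdir/config.ini"

edited=$(cat "$workdir/config.ini")
backup=$(cat "$workdir/config.ini.bak")
printf 'edited:\n%s\n\nbackup:\n%s\n' "$edited" "$backup"

rm -rf "$workdir"
```

If the substitution turns out to be wrong, the .bak file is the undo button: copy it back over the edited file.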

Use a different delimiter when your pattern contains slashes — paths are the classic case:

sed 's#/var/log/old#/var/log/new#g' paths.conf
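
To see what the alternate delimiter buys you, here is the same substitution written both ways against a made-up line (the log_dir setting is hypothetical). Both should produce the same result; the second is just easier to read:

```shell
line='log_dir=/var/log/old/app'

# Default delimiter: every slash in the pattern must be escaped
with_escapes=$(printf '%s\n' "$line" | sed 's/\/var\/log\/old/\/var\/log\/new/g')

# '#' as the delimiter: slashes are ordinary characters
with_hash=$(printf '%s\n' "$line" | sed 's#/var/log/old#/var/log/new#g')

printf '%s\n%s\n' "$with_escapes" "$with_hash"
```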

sed can also delete and print by line address, which makes it a precision tool for config files:

sed '/^#/d' file            # delete comment lines
sed '/^$/d' file            # delete blank lines
sed -n '10,20p' file        # print only lines 10-20 (-n = quiet)
sed '/BEGIN/,/END/d' file   # delete a block between markers

That last one — range deletion between two markers — is how I strip generated blocks out of config files without a parser.
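
Here is a small sketch of that range deletion on an invented config (the directives and marker comments are placeholders, not from any real tool):

```shell
# Hypothetical config with a generated block between BEGIN/END markers
config='listen 80
# BEGIN generated
upstream a
upstream b
# END generated
server_name example.test'

# Delete from the first line matching BEGIN through the next line matching END
stripped=$(printf '%s\n' "$config" | sed '/BEGIN/,/END/d')
printf '%s\n' "$stripped"
```

One caveat worth knowing: if the END marker never appears, sed keeps deleting to the end of the input, so check that both markers exist before running this with -i.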

awk: when you need fields and logic

The moment your data has columns, switch to awk. It splits each line into fields ($1, $2, …, with $0 being the whole line) and lets you run logic per line. This is the tool for log files, CSVs, and the output of nearly every CLI.

# Print the first and last field of each line
awk '{print $1, $NF}' file       # NF = number of fields

# Filter: only lines where field 9 (HTTP status) is 500
awk '$9 == 500' access.log

# Filter + project: client IPs that got a 500
awk '$9 == 500 {print $1}' access.log
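
If the field numbers feel like magic, it helps to have awk print its own view of a line. The sketch below uses two made-up lines in a combined-log-style format (the IPs, timestamps, and paths are illustrative only) and lists each field with its index before applying the status filter:

```shell
# Two invented log lines; values are for illustration only
log='203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /api/users?id=1 HTTP/1.1" 500 512
198.51.100.7 - - [10/Oct/2024:13:55:37 +0000] "GET /health HTTP/1.1" 200 64'

# Show how awk splits the first line: field number, then field value
fields=$(printf '%s\n' "$log" | awk 'NR == 1 {for (i = 1; i <= NF; i++) print i ": " $i}')
printf '%s\n' "$fields"

# Client IPs that received a 500
failed_ips=$(printf '%s\n' "$log" | awk '$9 == 500 {print $1}')
printf 'failed: %s\n' "$failed_ips"
```

Running the field dump against one line of your real log is the quickest way to confirm that the status really is $9 in your format before trusting a filter.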

awk shines at aggregation. Summing a column or computing an average is a one-liner that would be a real script in most languages:

# Total bytes transferred (field 10 in a common log format)
awk '{sum += $10} END {print sum}' access.log

# Average response time from a 2-column file (the n guard avoids dividing by zero on empty input)
awk '{sum += $2; n++} END {if (n) printf "%.2f ms\n", sum/n}' timings.txt

The END block runs once after all lines — that’s where totals live. There’s a matching BEGIN block that runs first, perfect for headers or setting the field separator:

awk -F, 'BEGIN {print "host,count"} {print $1","$2}' data.csv

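-F, sets the input field separator to a comma. For tab-delimited data use -F'\t'. This is how awk eats simple CSV and TSV without any library. It is a plain split on the separator, so CSV with quoted fields that contain commas will be split in the wrong places.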
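
To put BEGIN, the per-line block, and END side by side, here is a sketch that turns a tiny made-up CSV of host,latency pairs into a tab-separated report with an average row. The data is invented, and setting OFS (awk's output field separator) in BEGIN is my choice for the example rather than something the commands above require:

```shell
# Hypothetical input: host,latency_ms
csv='web1,10
web2,20
web3,25'

report=$(printf '%s\n' "$csv" | awk -F, '
  BEGIN { OFS = "\t"; print "host", "ms" }      # runs once, before any input
  { print $1, $2; sum += $2; n++ }               # runs for every input line
  END { if (n) printf "avg\t%.2f\n", sum / n }   # runs once, after the last line
')
printf '%s\n' "$report"
```

With print, a comma between expressions inserts OFS, which is why the header and rows come out tab-separated without any explicit "\t" in the per-line block.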

The killer feature: associative arrays in awk

awk has built-in hash maps, and they make group-by trivially easy. Counting occurrences — the thing you’d otherwise pipe through sort | uniq -c — is cleaner and faster in one pass:

# Count requests per IP, sorted by count descending
awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' access.log \
  | sort -rn | head

Group-by-sum is the same shape and is the command I reach for when someone asks “how much traffic per endpoint?”:

awk '{bytes[$7] += $10} END {for (url in bytes) print bytes[url], url}' \
  access.log | sort -rn | head

One pass, no temp files, and memory grows with the number of distinct keys rather than the size of the file, so it copes with multi-gigabyte logs. This single pattern has retired dozens of my old Python scripts.
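
A worked example makes the two shapes concrete. The four log lines below are invented (field 1 is the client IP, field 7 the path, field 10 the byte count), small enough to check the totals by hand:

```shell
# Made-up sample log; t1..t4 stand in for real timestamps
log='10.0.0.1 - - [t1 +0000] "GET /a HTTP/1.1" 200 100
10.0.0.2 - - [t2 +0000] "GET /b HTTP/1.1" 200 50
10.0.0.1 - - [t3 +0000] "GET /a HTTP/1.1" 200 300
10.0.0.1 - - [t4 +0000] "GET /b HTTP/1.1" 500 25'

# Group-by-count: requests per IP
hits=$(printf '%s\n' "$log" \
  | awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' \
  | sort -rn)

# Group-by-sum: bytes per path
traffic=$(printf '%s\n' "$log" \
  | awk '{bytes[$7] += $10} END {for (url in bytes) print bytes[url], url}' \
  | sort -rn)

printf 'hits:\n%s\n\ntraffic:\n%s\n' "$hits" "$traffic"
```

The trailing sort matters: awk's for (key in array) loop visits keys in no guaranteed order, so any ordering you want has to be applied afterwards.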

sed and awk together in a pipeline

They compose. A realistic example — normalize a messy field, then aggregate it:

# Strip query strings from URLs, then count hits per path
awk '{print $7}' access.log \
  | sed 's/?.*//' \
  | sort | uniq -c | sort -rn | head -20

Here awk extracts the URL field, sed chops everything after the ?, and the classic sort | uniq -c tallies. You can do the whole thing in pure awk, but mixing tools where each is clearest is perfectly good engineering. Readability beats purity.
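
For comparison, this is roughly what the pure-awk version could look like. It is a sketch rather than the canonical way to do it: sub() strips the query string from a copy of the URL field, and an associative array does the counting. The sample log lines and the variable names path and hits are made up:

```shell
# Invented log lines with query strings on some URLs
log='10.0.0.1 - - [t1 +0000] "GET /api/users?id=1 HTTP/1.1" 200 10
10.0.0.2 - - [t2 +0000] "GET /api/users?id=2 HTTP/1.1" 200 10
10.0.0.3 - - [t3 +0000] "GET /health HTTP/1.1" 200 10'

per_path=$(printf '%s\n' "$log" | awk '
  {
    path = $7
    sub(/\?.*/, "", path)   # drop the ? and everything after it
    hits[path]++
  }
  END { for (p in hits) print hits[p], p }
' | sort -rn | head -20)

printf '%s\n' "$per_path"
```

Whether this is clearer than the awk-plus-sed pipeline is a matter of taste; the pipeline reads left to right, while this version saves a full sort of the raw input.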

When to stop and write real code

sed and awk are line-oriented and stateless-ish by design. The boundary is the same one as with Bash arrays: the moment you need multi-line records, nested structure, JSON, or anything you’d describe as “parsing” rather than “filtering,” reach for Python (or jq for JSON specifically). I’ve seen heroic 200-line awk programs that should have been 20 lines of Python — clever, but unmaintainable at 3am.

My rule of thumb: if it fits on one or two lines and a teammate can read it in ten seconds, sed/awk is the right call. If you’re reaching for getline, multi-line buffers, or your awk script has functions, you’ve crossed into “write a script” territory.

Building the muscle memory

You don’t learn these from a reference card — you learn them by reaching for them. Next time you’re about to open an editor to extract a column or do a find-and-replace across a stream, stop and try the one-liner first. A great accelerator: paste your sample input into an AI assistant and ask it to write the awk command, then read it back until you understand each field reference. That’s how I went from copy-pasting incantations to writing my own.

For more one-liners and the AI prompts I use to generate and explain them, see the Bash & Python automation guides and our prompt library.

Always test text-processing commands on a copy, or run them without -i and inspect the output, before editing files in place.
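
One way to preview an edit, sketched here on a throwaway file (the config contents are invented): run the sed command without -i and diff its output against the original. Nothing on disk changes until you are happy with the diff.

```shell
# Hypothetical file, created only for this example
workdir=$(mktemp -d)
printf 'port=8080\nhost=localhost\n' > "$workdir/config.ini"

# Preview only: no -i, so the file is untouched. diff exits non-zero when
# it finds differences, hence the "|| true" for scripts running under set -e.
preview=$(sed 's/8080/9090/g' "$workdir/config.ini" | diff "$workdir/config.ini" - || true)
printf '%s\n' "$preview"

# Confirm the original is unchanged
after=$(cat "$workdir/config.ini")

rm -rf "$workdir"
```

Lines starting with < are the current contents and lines starting with > are what the edit would write. Once the diff looks right, rerun the same command with -i.bak.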
