jq for JSON: Stop Grepping API Responses Like It's 2009

I once watched a colleague try to pull a single value out of an AWS CLI response with grep and cut, and it worked — until Amazon reordered the JSON fields and the whole pipeline produced garbage. JSON is structured, and structure-blind tools like grep, cut, and awk will eventually betray you on it. The right tool is jq, and like awk, you only need a small slice of it to cover almost everything you do.

Here’s the practical core for someone who lives in kubectl, aws, gh, and curl output.

The mental model: filters that flow

jq is a pipeline of filters, much like a shell pipeline. The simplest filter is ., the identity — it pretty-prints and color-highlights whatever you feed it:

curl -s https://api.example.com/status | jq .

That alone is reason enough to install it. From there you drill in with dots and brackets:

echo '{"app": {"version": "2.4.1"}}' | jq '.app.version'   # "2.4.1"
echo '{"app": {"version": "2.4.1"}}' | jq -r '.app.version' # 2.4.1

The -r flag is the one you’ll use most: raw output, stripping the JSON quotes so the value is usable in a shell variable. Without -r, you get "2.4.1" with literal quotes, which then breaks the next command. Burn -r into your fingers.

version=$(curl -s "$API/version" | jq -r '.version')
echo "Deploying $version"
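
One caveat with -r that is easy to miss: if the key is absent, jq prints the literal string null, and the shell variable happily holds it. The snippet below is a minimal sketch with a made-up response; it assumes you would rather fail than deploy "null", and uses jq's -e (--exit-status) flag, which makes jq exit non-zero when the last output is null or false.

```shell
# Hypothetical response that lacks the "version" key.
response='{"app": "billing"}'

# Without a guard, the variable ends up holding the string "null".
version=$(printf '%s' "$response" | jq -r '.version')
echo "unguarded: $version"

# With -e, jq exits with status 1 when the last output is null or false,
# so the if-branch only runs when a real value came back.
if guarded=$(printf '%s' "$response" | jq -e -r '.version'); then
  echo "Deploying $guarded"
else
  guarded=""
  echo "no version in response" >&2
fi
```

The same guard works on live curl output; the only change is where the JSON comes from.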

Arrays: the part that trips people up

Most real API responses are arrays of objects: a list of pods, instances, pull requests. .[] iterates the array and emits each element as its own output, so anything you chain after it runs once per element:

# Names of all items
kubectl get pods -o json | jq -r '.items[].metadata.name'

.items[] unwraps the array, and .metadata.name projects a field from each element. Chain filters to project multiple fields, and use string interpolation to format them:

kubectl get pods -o json \
  | jq -r '.items[] | "\(.metadata.name) \(.status.phase)"'

The \(...) syntax interpolates a value into a string — your formatting tool inside jq.
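
If you want to try this without a cluster, here is a small sketch that runs the same filter against inline sample data. The pod names and the JSON itself are invented stand-ins for real kubectl output.

```shell
# Made-up stand-in for `kubectl get pods -o json` output.
pods='{"items":[
  {"metadata":{"name":"api-7d9f"},"status":{"phase":"Running"}},
  {"metadata":{"name":"worker-x2k"},"status":{"phase":"Pending"}}
]}'

# .items[] emits each pod separately; the interpolated string
# is then built once per pod, giving one line of text per pod.
lines=$(printf '%s' "$pods" \
  | jq -r '.items[] | "\(.metadata.name) \(.status.phase)"')
printf '%s\n' "$lines"
```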

Filtering with select

The single most useful jq verb for ops is select(...), which keeps only elements matching a condition. This is your WHERE clause:

# Only pods that aren't Running
kubectl get pods -o json | jq -r '
  .items[]
  | select(.status.phase != "Running")
  | .metadata.name'

Combine select with comparisons, and/or, and string tests to build precise queries entirely in jq, no grep needed:

# EC2 instances that are running AND tagged env=prod
aws ec2 describe-instances --output json | jq -r '
  .Reservations[].Instances[]
  | select(.State.Name == "running")
  | select(.Tags[]? | select(.Key=="env" and .Value=="prod"))
  | .InstanceId'

The ? in .Tags[]? suppresses the error jq would otherwise raise when it tries to iterate a Tags field that is missing (and therefore null). That is essential, because in real API data not every object has every field.
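
To see what the ? buys you, here is a sketch with two invented instances, one of which has no Tags key at all. The instance IDs and the flattened array shape are assumptions for illustration, not real describe-instances output.

```shell
# Made-up instances: one tagged env=prod, one with no Tags key.
instances='[
  {"InstanceId":"i-aaa","Tags":[{"Key":"env","Value":"prod"}]},
  {"InstanceId":"i-bbb"}
]'

# Without ?, iterating the missing (null) Tags is an error,
# and jq exits non-zero partway through the list.
if printf '%s' "$instances" \
    | jq -r '.[] | select(.Tags[] | select(.Key=="env" and .Value=="prod")) | .InstanceId' \
    >/dev/null 2>&1; then
  strict="ok"
else
  strict="error"
fi

# With ?, the untagged instance is skipped quietly.
prod=$(printf '%s' "$instances" \
  | jq -r '.[] | select(.Tags[]? | select(.Key=="env" and .Value=="prod")) | .InstanceId')

echo "strict run: $strict, prod instances: $prod"
```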

Reshaping output for humans and machines

jq can build new JSON, which is how you trim a giant response down to what you care about:

gh pr list --json number,title,author --jq '
  .[] | {pr: .number, title: .title, who: .author.login}'
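
The same object construction works in plain jq. This sketch uses an invented pull request record rather than real gh output, and adds -c (compact output) so each reshaped object lands on a single line, which is handy when another tool reads the result line by line.

```shell
# Made-up stand-in for `gh pr list --json number,title,author` output.
prs='[{"number":42,"title":"Fix login","author":{"login":"octocat"}}]'

# Build a smaller object per PR; -c prints each one on a single line.
trimmed=$(printf '%s' "$prs" \
  | jq -c '.[] | {pr: .number, title: .title, who: .author.login}')
printf '%s\n' "$trimmed"
```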

For human-readable tables, @tsv plus column is a clean combo:

kubectl get pods -o json | jq -r '
  .items[] | [.metadata.name, .status.phase, .spec.nodeName] | @tsv' \
  | column -t

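[...] | @tsv formats an array as tab-separated values, and column -t aligns the columns. Note that column -t splits on any whitespace by default, so this works cleanly as long as the fields themselves contain no spaces. This turns raw JSON into something you’d actually want to read in a terminal.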
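
Here is a small sketch of what @tsv actually emits, using invented pod data where one pod has not been scheduled yet and so has no nodeName. As far as I can tell from jq's behavior, a null field comes out as an empty column rather than the word null.

```shell
# Made-up pods; the second one has no nodeName (not scheduled yet).
pods='{"items":[
  {"metadata":{"name":"api-7d9f"},"status":{"phase":"Running"},"spec":{"nodeName":"node-1"}},
  {"metadata":{"name":"worker-x2k"},"status":{"phase":"Pending"},"spec":{}}
]}'

# Each array becomes one tab-separated row.
rows=$(printf '%s' "$pods" | jq -r '
  .items[] | [.metadata.name, .status.phase, .spec.nodeName] | @tsv')
printf '%s\n' "$rows"
```

If your fields can contain spaces, piping the rows through column -t -s "$(printf '\t')" tells column to split on tabs only.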

Aggregation without leaving jq

Like awk, jq aggregates. length, group_by, map, and add cover most needs:

# Count pods by phase
kubectl get pods -o json | jq -r '
  [.items[].status.phase]
  | group_by(.)
  | map({phase: .[0], count: length})
  | .[] | "\(.phase): \(.count)"'

group_by takes an array and sorts it by the grouping key internally, so you never need to pre-sort. The reason we build a flat array first is that group_by operates on a single array, not on the stream of values that .items[] emits. This pattern (flatten, group, map to a count) is the jq equivalent of the awk associative-array tally.
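
A quick way to convince yourself about the sorting is to feed group_by a deliberately unsorted array. The phase list below is made up; the filter is the same tally as above, minus the kubectl step.

```shell
# Deliberately unsorted, made-up list of pod phases.
phases='["Running","Pending","Running","Failed","Running"]'

# group_by(.) sorts and groups equal values; each group becomes
# an object holding the phase and how many times it appeared.
counts=$(printf '%s' "$phases" | jq -r '
  group_by(.)
  | map({phase: .[0], count: length})
  | .[] | "\(.phase): \(.count)"')
printf '%s\n' "$counts"
```

The groups come back ordered by key, so Failed is listed before Pending and Running regardless of input order.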

Safe variable handling: jq’s --arg

Never interpolate shell variables into a jq program with string concatenation — it’s the JSON equivalent of SQL injection and it breaks on quotes. Pass values in with --arg:

target="prod"
jq --arg env "$target" '
  .items[] | select(.metadata.labels.env == $env)' data.json

--arg name value binds a shell value to a jq variable as a string; use --argjson when the value is itself JSON (a number, bool, or object). This keeps your data and your program cleanly separated.
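
The string-versus-JSON distinction bites hardest with numbers. This sketch uses invented deployment data and a hypothetical min threshold to show the difference, plus a value containing quotes passing through --arg untouched.

```shell
# Made-up deployments and a numeric threshold held in a shell variable.
data='[{"name":"api","replicas":3},{"name":"worker","replicas":1}]'
min=2

# --arg binds the string "2". jq orders every number before every
# string, so `.replicas >= $min` is false for all rows: no output.
as_string=$(printf '%s' "$data" \
  | jq -r --arg min "$min" '.[] | select(.replicas >= $min) | .name')

# --argjson parses the value as JSON, so $min is the number 2.
as_number=$(printf '%s' "$data" \
  | jq -r --argjson min "$min" '.[] | select(.replicas >= $min) | .name')

# A value with embedded quotes needs no escaping when passed via --arg.
tricky='he said "prod"'
echoed=$(jq -n -r --arg v "$tricky" '$v')

echo "string: [$as_string] number: [$as_number] echoed: [$echoed]"
```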

When jq isn’t the answer

jq is superb for filtering, projecting, and reshaping JSON in a pipeline. Where it gets painful is heavy procedural logic, joining multiple sources, or anything stateful across many files — at that point the program becomes its own little language you have to maintain. That’s the handoff point to Python and its json module, where you get real variables, functions, and a debugger. I draw the same line I draw with sed and awk: one expressive filter, use jq; a program, use Python.

A practical workflow I love: ask an AI assistant to draft the jq filter from a sample of your JSON, then run it and tweak. jq’s syntax is fiddly enough that generating a first draft and refining it beats writing it cold — just verify the output against the real data, because a wrong select fails silently by returning nothing.

For more JSON and API patterns, plus the prompts I use to generate jq filters, see the Bash & Python automation category and our prompt library.

Verify jq filters against real API responses: an incorrect path typically yields null, and a select that matches nothing yields no output at all, rather than an error.