Parsing YAML in Bash and Python: yq and PyYAML Without the Footguns

YAML is the connective tissue of modern ops — Kubernetes manifests, CI configs, Ansible playbooks, Helm values, your service’s own settings. And yet the instinct, when a script needs to pull one value out of a YAML file, is to reach for grep and awk. That works right up until someone indents differently, adds an anchor, or quotes a string, and then your “parser” returns the wrong line with total confidence. YAML is structured, and structure-blind tools betray you on it eventually. The right tools are yq in the shell and PyYAML/ruamel.yaml in Python, and an AI assistant is great at drafting the queries — which are fiddly enough that nobody writes them cold.
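Here is the anchor case in a small Python sketch. It assumes PyYAML is installed. The config, the defaults/prod keys, and both helper functions are invented for illustration. A line scanner finds no image: line under prod, because that value arrives through a merge key. A real parser resolves it.

```python
import re

import yaml  # PyYAML, assumed to be installed

# Invented config: prod inherits its image from defaults via an anchor and a merge key.
CONFIG = """\
defaults: &defaults
  image: nginx:1.25
  replicas: 2
prod:
  <<: *defaults
  replicas: 5
"""


def grep_prod_image(text):
    """Hypothetical structure-blind helper: look for an image: line inside prod."""
    in_prod = False
    for line in text.splitlines():
        if line.startswith("prod:"):
            in_prod = True
            continue
        if in_prod and not line.startswith(" "):
            break  # a new top-level key ends the prod block
        match = re.match(r"\s+image:\s*(\S+)", line)
        if in_prod and match:
            return match.group(1)
    return None  # prod's image only exists through the merge key


def parse_prod_image(text):
    """Hypothetical structure-aware helper: the parser resolves anchors and merges."""
    return yaml.safe_load(text)["prod"]["image"]


print(grep_prod_image(CONFIG))   # None
print(parse_prod_image(CONFIG))  # nginx:1.25
```

The scanner isn't buggy in any obvious way. It simply can't see structure that only exists after parsing.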

The assistant is the fast junior here. It writes the path expression; you verify it returns what you think it does against the real file.

yq: jq’s syntax for YAML

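The yq you want is Mike Farah’s Go implementation, which speaks jq-like expressions over YAML. A different Python tool, also named yq, wraps jq itself and takes different flags, so check yq --version to see which one is on your PATH. If you already know a little jq, you know most of yq.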

# Pull the image tag out of a deployment
yq '.spec.template.spec.containers[0].image' deploy.yaml

jq users habitually add -r to get raw strings. yq already prints scalars unquoted by default, so that particular quoting trap mostly disappears. Where yq shines for ops is editing in place:

# Bump the replica count, in place, structure-aware
yq -i '.spec.replicas = 5' deploy.yaml

Unlike a sed substitution, this understands the document, so it won’t accidentally match a replicas: nested somewhere else. That structural awareness is the whole reason to use it. But -i still rewrites the file, so the same discipline applies as with any in-place edit: run it to stdout first, confirm the diff, then commit.
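The stdout-first habit is easy to wrap in a script. Here is a minimal, standard-library-only Python sketch. You produce the candidate text however you like, for example by capturing the output of the same yq command run without -i. The script then prints a unified diff and overwrites the file only when explicitly approved. The helpers preview_diff and write_if_approved are hypothetical names, not part of yq.

```python
import difflib
from pathlib import Path


def preview_diff(old_text, new_text, name="file"):
    """Hypothetical helper: unified diff of old vs new text ("" if identical)."""
    lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(lines)


def write_if_approved(path, new_text, approve=False):
    """Hypothetical helper: always show the diff; only overwrite when approve=True."""
    path = Path(path)
    diff = preview_diff(path.read_text(), new_text, path.name)
    print(diff or "no changes")
    if approve and diff:
        path.write_text(new_text)
    return diff
```

Run it once with approve left at False, read the diff, then rerun with approve=True. This is the same review-then-commit loop as above, made explicit.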

Let AI draft the path expressions

The genuinely tedious part is figuring out the path into a deeply nested manifest. This is a perfect AI task — paste a snippet, describe the value you want.

Here’s a Kubernetes Deployment. Write a yq expression that lists the name and image of every container, one “name image” pair per line.

yq '.spec.template.spec.containers[] | .name + " " + .image' deploy.yaml

Ask for the explanation so you learn the path, and so you can spot if the AI guessed a field name that doesn’t exist in your actual schema. A yq query against a missing path returns null rather than erroring — silent, like a bad jq select — so verifying against the real file is non-negotiable.
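If you’d rather fail loudly, one option is to load the file and resolve the path yourself, raising the moment a key or index is missing. This is a minimal sketch that assumes the document has already been parsed, for example with yaml.safe_load. resolve_strict is a hypothetical helper, and the path is a list of keys and indices rather than yq syntax. A key that is explicitly set to null still returns None; only a missing key raises.

```python
def resolve_strict(doc, path):
    """Hypothetical helper: walk a parsed YAML document, raising on any missing step.

    path is a list of dict keys and list indices, e.g.
    ["spec", "template", "spec", "containers", 0, "image"].
    """
    node = doc
    walked = []
    for step in path:
        if isinstance(step, int):
            # List index: the node must be a list and the index must be in range.
            if not isinstance(node, list) or not 0 <= step < len(node):
                raise LookupError(f"no index {step} under {walked or 'root'}")
        elif not isinstance(node, dict) or step not in node:
            # Mapping key: the node must be a dict that actually has this key.
            raise LookupError(f"no key {step!r} under {walked or 'root'}")
        node = node[step]
        walked.append(step)
    return node


deployment = {
    "spec": {"template": {"spec": {"containers": [{"name": "web", "image": "nginx:1.25"}]}}}
}
print(resolve_strict(deployment, ["spec", "template", "spec", "containers", 0, "image"]))
```

A misspelled field such as "imgae" now raises a LookupError that names where the walk stopped, instead of quietly producing null.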

Pro Tip: yq can convert between formats, which is a hidden superpower. yq -o json '.' config.yaml turns YAML into JSON so you can pipe it into jq and the rest of your JSON tooling, and yq -p json -o yaml goes the other way. When a pipeline mixes both formats, those two flags save a Python detour.
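For comparison, here is roughly what the Python detour looks like. It assumes PyYAML is installed, and yaml_to_json is a hypothetical name. The sketch also exposes one wrinkle: PyYAML loads unquoted dates such as 2024-05-01 as datetime.date objects, which json.dumps can’t encode. To handle that, it falls back to str() via default=str.

```python
import json

import yaml  # PyYAML, assumed to be installed


def yaml_to_json(text):
    """Hypothetical helper: convert a single YAML document to a JSON string."""
    data = yaml.safe_load(text)
    # Unquoted dates load as datetime.date, which json can't encode natively,
    # so default=str serializes them as their ISO string form.
    return json.dumps(data, indent=2, default=str)


print(yaml_to_json("release: 2024-05-01\nreplicas: 3\n"))
```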

PyYAML in Python: always safe_load

When the logic outgrows a one-liner — you’re joining files, validating a schema, or transforming structure — move to Python. The single rule that matters: use safe_load, never load.

import yaml

with open("deploy.yaml") as fh:
    doc = yaml.safe_load(fh)

containers = doc["spec"]["template"]["spec"]["containers"]
for c in containers:
    print(c["name"], c["image"])

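Plain yaml.load() with a permissive loader (yaml.Loader or yaml.UnsafeLoader) can instantiate arbitrary Python objects from crafted YAML tags. That makes it a genuine remote-code-execution vector if the file is ever attacker-influenced. safe_load refuses those tags. This is the one place I won’t accept AI-generated YAML code that uses load(): if a draft does, reject it on sight. PyYAML 5.1 started warning about a bare load() call, and 6.0 made the Loader argument mandatory. But a draft can satisfy that requirement with Loader=yaml.Loader, so the habit is what protects you.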
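You can see the difference quickly, assuming PyYAML 5.1 or later (where yaml.UnsafeLoader exists). The payload below uses a python/object/apply tag to call a harmless built-in. safe_load raises a ConstructorError, while the unsafe loader actually makes the call.

```python
import yaml  # PyYAML 5.1+, assumed to be installed

# A python/object/apply tag asks the loader to call a Python callable while parsing.
# builtins.len is harmless; an attacker would choose something far worse.
PAYLOAD = "!!python/object/apply:builtins.len [[1, 2, 3]]"

try:
    yaml.safe_load(PAYLOAD)
except yaml.YAMLError as exc:  # ConstructorError is a YAMLError subclass
    print("safe_load refused it:", type(exc).__name__)

# Demonstration only: never use UnsafeLoader on input you don't fully control.
# It imports builtins, looks up len, and calls it, so this prints 3.
print(yaml.load(PAYLOAD, Loader=yaml.UnsafeLoader))
```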

When you need to write YAML back: ruamel

PyYAML can dump YAML, but comments are gone the moment safe_load parses the file, and yaml.dump sorts keys alphabetically unless you pass sort_keys=False. That’s fine for machine-to-machine output but hostile to a human-maintained config file. If you’re editing a file people read, use ruamel.yaml, which round-trips comments and key order.
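Here is a small demonstration of both losses. It assumes PyYAML 5.1 or later, where sort_keys exists, and the values file is made up. The comment disappears at load time, and the default dump reorders the keys.

```python
import yaml  # PyYAML 5.1+, assumed to be installed

# Invented values file with a comment a human left for the next reader.
SOURCE = """\
# keep at least 2 replicas for availability
replicas: 3
image: nginx
"""

data = yaml.safe_load(SOURCE)  # the comment is already gone here
print(yaml.safe_dump(data))  # keys come back sorted: image before replicas
print(yaml.safe_dump(data, sort_keys=False))  # original order, still no comment
```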

from ruamel.yaml import YAML

yaml = YAML()  # round-trip mode by default
with open("values.yaml") as fh:
    data = yaml.load(fh)

data["replicas"] = 5
with open("values.yaml", "w") as fh:
    yaml.dump(data, fh)

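Comments and key order survive. Most of the layout does too, but some details can shift: indentation width may change, and quotes on strings are dropped unless you set yaml.preserve_quotes = True. Diff the result before committing. When you ask an AI to “edit this Helm values file in Python,” specify that comments must be preserved. Otherwise it’s likely to default to PyYAML and quietly strip every explanatory comment your team relied on. That’s a review check: did the round-trip keep the comments?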
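That review check can be partly automated. The sketch below is a crude, standard-library-only heuristic, and comment_lines and missing_comments are hypothetical helpers. It collects full-line and trailing comments from the text before and after an edit, then reports any that vanished. Because it treats any # preceded by a space as the start of a comment, a quoted string containing one will confuse it.

```python
from collections import Counter


def comment_lines(text):
    """Hypothetical helper: collect comment text from YAML source (heuristic)."""
    found = Counter()
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            found[stripped] += 1  # full-line comment
        elif " #" in line:
            # Trailing comment; may misfire on a quoted string containing " #".
            found[line[line.index(" #") + 1:].strip()] += 1
    return found


def missing_comments(before, after):
    """Hypothetical helper: comments present in `before` but absent from `after`."""
    return comment_lines(before) - comment_lines(after)


before = "# pods\nreplicas: 3  # keep >= 2\n"
print(missing_comments(before, "replicas: 5\n"))
```

An empty Counter means every comment made it through the round-trip.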

The multi-document gotcha

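Kubernetes manifests routinely pack several documents into one file separated by ---. Code that calls safe_load instead of safe_load_all can’t handle that: PyYAML’s safe_load raises a ComposerError (“expected a single document in the stream”) when it reaches the second document. The quieter failures are worse. A draft that takes only the first item from safe_load_all reads the first resource and silently ignores the rest, and so do YAML libraries in some other languages that decode only the first document.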

import yaml

with open("manifests.yaml") as fh:
    for doc in yaml.safe_load_all(fh):
        if doc and doc.get("kind") == "Service":
            print(doc["metadata"]["name"])

Sometimes an AI-generated script dies with “expected a single document in the stream,” or its changes only seem to apply to the first resource. In either case, a single-document load is the likely culprit. Check load versus load_all whenever a file might hold multiple documents; generated code gets this wrong often.
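Here are the failure and the fix side by side, assuming PyYAML is installed. The two-resource manifest is invented for illustration. safe_load raises as soon as it reaches the second document, while safe_load_all yields every document in order.

```python
import yaml  # PyYAML, assumed to be installed

# Invented two-document manifest, separated by ---.
MANIFESTS = """\
kind: Deployment
metadata:
  name: web
---
kind: Service
metadata:
  name: web-svc
"""

# safe_load expects exactly one document and raises on the second.
try:
    yaml.safe_load(MANIFESTS)
except yaml.YAMLError as exc:  # a ComposerError in current PyYAML
    print("safe_load failed:", type(exc).__name__)

# safe_load_all is a generator over every document; skip empty (None) documents.
kinds = [doc["kind"] for doc in yaml.safe_load_all(MANIFESTS) if doc]
print(kinds)  # ['Deployment', 'Service']
```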

Keep secrets out of the file and the prompt

YAML config files are a magnet for hardcoded secrets — a database password sitting in values.yaml. The right pattern is a placeholder in the YAML and the real value injected from the environment or a secrets manager at runtime. When you paste a config into an AI prompt to get help with a yq query, scrub any real credentials first. The model needs the structure, not the password. Hand it password: REDACTED and it’ll write the same query.
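If you want a tool for that scrub, one approach is to mask values by key name before you dump the structure for a prompt. This is a minimal sketch that works on an already-parsed document, such as the output of yaml.safe_load. The key pattern and the redact helper are hypothetical, and they deliberately err toward over-masking. The sketch won’t catch secrets hidden under innocuous key names, such as a password embedded in a database URL, so still eyeball what you paste.

```python
import re

# Hypothetical key-name pattern; tune it for your own configs.
SECRET_KEY = re.compile(r"pass(word)?|secret|token|api[_-]?key|credential", re.IGNORECASE)


def redact(node):
    """Hypothetical helper: copy a parsed YAML document, masking secret-looking values."""
    if isinstance(node, dict):
        masked = {}
        for key, value in node.items():
            if isinstance(key, str) and SECRET_KEY.search(key):
                masked[key] = "REDACTED"  # hide the whole value, even if nested
            else:
                masked[key] = redact(value)
        return masked
    if isinstance(node, list):
        return [redact(item) for item in node]
    return node  # scalars under non-secret keys pass through unchanged


config = {"db": {"host": "db.internal", "password": "hunter2"}, "replicas": 3}
print(redact(config))
```

The original document is left untouched, so the same parsed config can still be used at runtime while only the redacted copy goes into the prompt.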

Tying it together

YAML parsing sits underneath a lot of the automation in the Bash and Python automation category: the manifests your deploy scripts read, the configs your services load. I draft the yq and PyYAML code with Claude (or Gemma for the lighter queries), verify it against real files, and stash the proven expressions in a prompt workspace. The reusable prompts are in our prompt library, with bundles in the prompt packs.

The rule

YAML’s whitespace-significant, multi-document, anchor-laden nature makes it deceptively easy to parse wrong and still get a plausible answer. So let the AI draft the yq paths and PyYAML loaders fast, but you enforce the four checks: safe_load not load, load_all for multi-doc files, ruamel when comments must survive, and no real secret in the file or the prompt. The quick junior writes it; you confirm it’s safe before it touches infra.
