Parsing YAML in Bash and Python: yq and PyYAML Without the Footguns
YAML runs your infra but bash can't parse it safely. Use yq in scripts and PyYAML in Python, with AI to draft the queries — and dodge the classic gotchas.
- #bash
- #python
- #yaml
- #yq
- #automation
YAML is the connective tissue of modern ops — Kubernetes manifests, CI configs, Ansible playbooks, Helm values, your service’s own settings. And yet the instinct, when a script needs to pull one value out of a YAML file, is to reach for grep and awk. That works right up until someone indents differently, adds an anchor, or quotes a string, and then your “parser” returns the wrong line with total confidence. YAML is structured, and structure-blind tools betray you on it eventually. The right tools are yq in the shell and PyYAML/ruamel.yaml in Python, and an AI assistant is great at drafting the queries — which are fiddly enough that nobody writes them cold.
The assistant is the fast junior here. It writes the path expression; you verify it returns what you think it does against the real file.
yq: jq’s syntax for YAML
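The yq you want is Mike Farah’s Go implementation (v4 or later; v3 used a different command syntax), which speaks jq-like expressions over YAML. A separate Python package also called yq wraps jq instead, so check yq --version before trusting a copied command. If you already know a little jq, you know most of yq.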
# Pull the image tag out of a deployment
yq '.spec.template.spec.containers[0].image' deploy.yaml
Coming from jq, you may reach for -r out of habit. yq already prints scalars unquoted by default (its -r/--unwrapScalar flag defaults to on), so quoting is less of a trap. Where it shines for ops is editing in place:
# Bump the replica count, in place, structure-aware
yq -i '.spec.replicas = 5' deploy.yaml
Unlike a sed substitution, this understands the document, so it won’t accidentally match a replicas: nested somewhere else. That structural awareness is the whole reason to use it. But -i still rewrites the file, so the same discipline applies as with any in-place edit: run it to stdout first, confirm the diff, then commit.
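The stdout-first discipline is easy to wrap in a script. What follows is a minimal Python sketch that assumes Mike Farah’s yq v4 is on PATH. It runs the expression without -i, uses difflib to show what would change, and writes the file only after an explicit yes. The helper names (yq_preview, unified_diff, edit_with_review) are hypothetical.

```python
import difflib
import shutil
import subprocess
from pathlib import Path


def yq_preview(expression: str, path: str) -> str:
    """Run yq WITHOUT -i, so the edited document goes to stdout and the file is untouched."""
    if shutil.which("yq") is None:
        raise RuntimeError("yq not found on PATH")
    result = subprocess.run(
        ["yq", expression, path], capture_output=True, text=True, check=True
    )
    return result.stdout


def unified_diff(original: str, proposed: str, name: str) -> str:
    """Unified diff of the proposed edit against what is on disk ('' if nothing changes)."""
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            proposed.splitlines(keepends=True),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )


def edit_with_review(expression: str, path: str) -> bool:
    """Preview the edit, print the diff, and write only after the user types 'y'."""
    original = Path(path).read_text()
    proposed = yq_preview(expression, path)
    diff = unified_diff(original, proposed, path)
    if not diff:
        print(f"{path}: no change")
        return False
    print(diff)
    if input("apply this diff? [y/N] ").strip().lower() != "y":
        return False
    # Write exactly the text that was reviewed instead of re-running yq -i.
    Path(path).write_text(proposed)
    return True
```

For example, call edit_with_review(".spec.replicas = 5", "deploy.yaml"). Writing back the exact text you reviewed, rather than re-running yq -i, guarantees the file matches the diff you approved. If yq normalizes anything else in the file along the way, that change shows up in the diff too.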
Let AI draft the path expressions
The genuinely tedious part is figuring out the path into a deeply nested manifest. This is a perfect AI task — paste a snippet, describe the value you want.
Here’s a Kubernetes Deployment. Write a yq expression that lists the name and image of every container, as “name image”, one per line.
yq '.spec.template.spec.containers[] | .name + " " + .image' deploy.yaml
Ask for the explanation so you learn the path, and so you can spot a field name the AI guessed that doesn’t exist in your actual schema. A yq query against a missing path returns null rather than erroring (as silent as a jq select that matches nothing), so verifying against the real file is non-negotiable. In scripts, add yq’s -e (--exit-status) flag so that a null or false result exits non-zero instead of flowing downstream.
Pro Tip: yq can convert between formats, which is a hidden superpower. yq -o json '.' config.yaml turns YAML into JSON so you can pipe it into jq and the rest of your JSON tooling. And yq -p json -o yaml goes the other way. When a pipeline mixes both formats, this one flag saves a Python detour.
PyYAML in Python: always safe_load
When the logic outgrows a one-liner — you’re joining files, validating a schema, or transforming structure — move to Python. The single rule that matters: use safe_load, never load.
import yaml
with open("deploy.yaml") as fh:
    doc = yaml.safe_load(fh)
containers = doc["spec"]["template"]["spec"]["containers"]
for c in containers:
    print(c["name"], c["image"])
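That loop trusts every key to exist. It also trusts safe_load to return a mapping, but safe_load returns None for an empty file. When a script has to report on files it didn’t write, a small hand-rolled check gives clearer errors than a KeyError traceback. The following is a minimal sketch built on the Kubernetes field names above. container_problems and _dig are hypothetical helpers, and a real project might prefer a proper schema validator.

```python
from typing import Any

BASE = "spec.template.spec.containers"


def _dig(node: Any, *keys: str) -> Any:
    """Walk nested mappings, returning None instead of raising on a missing or non-dict level."""
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def container_problems(doc: Any) -> list[str]:
    """List every missing or mistyped container name/image rather than stopping at the first."""
    if not isinstance(doc, dict):
        # safe_load gives None for an empty file, or a scalar/list for unexpected input.
        return [f"top level: expected a mapping, got {type(doc).__name__}"]
    containers = _dig(doc, "spec", "template", "spec", "containers")
    if not isinstance(containers, list) or not containers:
        return [f"{BASE}: expected a non-empty list"]
    problems = []
    for i, c in enumerate(containers):
        for field in ("name", "image"):
            value = c.get(field) if isinstance(c, dict) else None
            if not isinstance(value, str) or not value:
                problems.append(f"{BASE}[{i}].{field}: missing or not a string")
    return problems
```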
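With an unsafe loader (yaml.Loader, yaml.UnsafeLoader, or the yaml.unsafe_load() shortcut), PyYAML will instantiate arbitrary Python objects from crafted tags such as !!python/object/apply. That is a genuine remote-code-execution vector if the file is ever attacker-influenced. safe_load refuses those tags. This is one rule I won’t bend: if an AI draft calls load() instead of safe_load(), reject it on sight. PyYAML 5.1 started warning when load() is called without an explicit Loader, and 6.0 makes the Loader argument mandatory. A draft that passes Loader=yaml.Loader sails straight past both, so the habit is what protects you.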
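You can check the refusal yourself with a harmless payload. This minimal sketch feeds a python/object/apply tag to safe_load and shows that it is rejected, not executed. parse_untrusted is a hypothetical wrapper, and the code never calls an unsafe loader.

```python
import yaml

# A crafted document. Under an unsafe loader (yaml.unsafe_load, or Loader=yaml.Loader)
# this tag would call builtins.print; a real attacker would pick something worse.
CRAFTED = '!!python/object/apply:builtins.print ["this should never run"]\n'


def parse_untrusted(text: str):
    """Parse YAML with SafeLoader only, turning any YAML error into a clear refusal."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        raise ValueError(f"refusing to parse YAML: {exc}") from exc


try:
    parse_untrusted(CRAFTED)
except ValueError as err:
    # SafeLoader has no constructor for python/* tags, so PyYAML raises ConstructorError.
    print("blocked:", type(err.__cause__).__name__)
```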
When you need to write YAML back: ruamel
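PyYAML can dump YAML, but it discards comments and, by default, sorts keys alphabetically (sort_keys=False keeps the original order, but the comments are still gone). That’s fine for machine-to-machine data and hostile to a human-maintained config file. If you’re editing a file people read, use ruamel.yaml, which round-trips comments and key order.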
from ruamel.yaml import YAML
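yaml = YAML()  # round-trip mode by default: keeps comments and key order
yaml.preserve_quotes = True  # also keep each scalar's original quoting style
with open("values.yaml") as fh:
    data = yaml.load(fh)
data["replicas"] = 5
with open("values.yaml", "w") as fh:
    yaml.dump(data, fh)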
The comments and formatting survive. When you ask an AI to “edit this Helm values file in Python,” specify that comments must be preserved, or it’ll default to PyYAML and quietly strip every explanatory comment your team relied on. That’s a review check: did the round-trip keep the comments?
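That review check is easy to script crudely. The sketch below collects comment lines from the file before and after an edit and reports any that disappeared. It is a line-based heuristic and assumes a comment starts at the beginning of a line or after whitespace. A # inside a quoted string will be miscounted, so treat a hit as a prompt to look, not a verdict. The function names are hypothetical.

```python
import re

# A '#' at line start or after whitespace begins a comment; 'http://x#frag' is not one.
_COMMENT = re.compile(r"(?:^|\s)(#.*)$")


def comments(text: str) -> list[str]:
    """Collect full-line and trailing comments, in file order."""
    found = []
    for line in text.splitlines():
        m = _COMMENT.search(line)
        if m:
            found.append(m.group(1).strip())
    return found


def missing_comments(before: str, after: str) -> list[str]:
    """Comments present before the edit but gone afterwards (duplicates counted separately)."""
    remaining = comments(after)
    missing = []
    for c in comments(before):
        if c in remaining:
            remaining.remove(c)
        else:
            missing.append(c)
    return missing
```

Run it against a copy of values.yaml saved before the script touched it. An empty list means every comment made the round trip.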
The multi-document gotcha
Kubernetes manifests routinely pack several documents into one file, separated by ---. PyYAML’s safe_load expects exactly one document. Code that uses it instead of safe_load_all fails on such a file with a ComposerError (“expected a single document in the stream”).
import yaml
with open("manifests.yaml") as fh:
    for doc in yaml.safe_load_all(fh):
        if doc and doc.get("kind") == "Service":
            print(doc["metadata"]["name"])
When an AI-generated script dies on a manifest file with “expected a single document in the stream”, this is almost always why. Check load versus load_all whenever a file might hold multiple documents. It’s a top-three YAML bug in generated code.
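The write side has a matching trap. The sketch below contrasts safe_load with safe_load_all on a two-document manifest, then bumps the Deployment’s replicas and writes the stream back with safe_dump_all. Passing the list of documents to safe_dump instead would produce a single document containing a YAML list. The sample manifest and helper names are illustrative. This round trip also drops comments, so use ruamel for human-edited files.

```python
import yaml

MANIFESTS = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
---
apiVersion: v1
kind: Service
metadata:
  name: web-svc
---
"""
# The trailing '---' above starts an empty document, which safe_load_all yields as None.


def kinds(text: str) -> list[str]:
    """One entry per non-empty document in the stream."""
    return [d["kind"] for d in yaml.safe_load_all(text) if d]


def single_doc_fails(text: str) -> bool:
    """True if plain safe_load rejects the stream (ComposerError is a YAMLError)."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError:
        return True
    return False


def bump_replicas(text: str, n: int) -> str:
    docs = [d for d in yaml.safe_load_all(text) if d]
    for d in docs:
        if d.get("kind") == "Deployment":
            d.setdefault("spec", {})["replicas"] = n
    # safe_dump_all emits the documents separated by '---', keeping key order.
    return yaml.safe_dump_all(docs, sort_keys=False)
```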
Keep secrets out of the file and the prompt
YAML config files are a magnet for hardcoded secrets — a database password sitting in values.yaml. The right pattern is a placeholder in the YAML and the real value injected from the environment or a secrets manager at runtime. When you paste a config into an AI prompt to get help with a yq query, scrub any real credentials first. The model needs the structure, not the password. Hand it password: REDACTED and it’ll write the same query.
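Both halves of that advice fit in a few lines of Python. This is a minimal sketch under stated assumptions. The ${VAR} placeholder convention is hypothetical and applied by your own code after safe_load, not something PyYAML or Helm does for you. The redaction is a crude line-level regex over key names, so nested or multi-line secret values slip through.

```python
import os
import re
from typing import Any, Mapping

# Hypothetical convention: a value that is exactly "${NAME}" is filled from the environment.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

# Keys that look secret-ish; the value after the colon is replaced before pasting.
_SECRET_LINE = re.compile(
    r"^(\s*-?\s*[\w.-]*(?:password|passwd|secret|token|api[_-]?key)[\w.-]*\s*:\s*)(\S.*)$",
    re.IGNORECASE,
)


def inject_env(node: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Return a copy of a parsed YAML tree with ${NAME} placeholders filled from env.

    A missing variable raises KeyError, so an unset secret fails loudly
    instead of silently becoming an empty string.
    """
    if isinstance(node, dict):
        return {k: inject_env(v, env) for k, v in node.items()}
    if isinstance(node, list):
        return [inject_env(v, env) for v in node]
    if isinstance(node, str):
        m = _PLACEHOLDER.fullmatch(node)
        if m:
            name = m.group(1)
            if name not in env:
                raise KeyError(f"secret {name} is not set in the environment")
            return env[name]
    return node


def redact_for_prompt(yaml_text: str) -> str:
    """Crude scrub of raw YAML before it goes into an AI prompt; structure is kept."""
    return "\n".join(
        _SECRET_LINE.sub(r"\1REDACTED", line) for line in yaml_text.splitlines()
    )
```

With this, values.yaml can carry password: "${DB_PASSWORD}", and redact_for_prompt turns a pasted password: hunter2 into password: REDACTED.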
Tying it together
YAML parsing sits underneath a lot of the automation in the Bash and Python automation category: the manifests your deploy scripts read, the configs your services load. I draft the yq and PyYAML code with Claude (or Gemma for the lighter queries), verify it against real files, and stash the proven expressions in a prompt workspace. The reusable prompts are in our prompt library, with bundles in the prompt packs.
The rule
YAML’s whitespace-significant, multi-document, anchor-laden nature makes it deceptively easy to parse wrong and still get a plausible answer. So let the AI draft the yq paths and PyYAML loaders fast, but enforce the four checks yourself: safe_load not load, load_all for multi-doc files, ruamel when comments must survive, and no real secret in the file or the prompt. The quick junior writes it; you confirm it’s safe before it touches infra.