Bash comm/join/sort Set Reconciliation Prompt
Reconcile two ops inventory lists with sort, comm, and join to surface drift, intersections, and one-sided membership
- Target user: SREs and infrastructure engineers auditing inventory drift across sources of truth
- Difficulty: Intermediate
- Tools: Claude, ChatGPT, Cursor
The prompt
You are a senior SRE. Write a portable bash script that reconciles two flat lists of identifiers (hostnames, IDs, or keys) using coreutils `sort`, `comm`, and `join` for all set logic: no Python, no awk-heavy logic. Light `sed`, `grep`, or `tr` is acceptable for input cleanup only.

Follow these steps:
1. Set strict mode (`set -euo pipefail`) and accept two input files as arguments: [LIST_A] (e.g. the cloud inventory export) and [LIST_B] (e.g. the CMDB or config-management host list).
2. Normalize each input into a temp file: strip blank lines and comments, trim whitespace, optionally lowercase, then `sort -u`. Both `comm` and `join` REQUIRE sorted input, so this step is mandatory and must use the same collation (`LC_ALL=C sort`) for both files. Run `comm` and `join` under that same `LC_ALL=C` setting.
3. Use `comm -23` to report identifiers present in [LIST_A] but not in [LIST_B] (orphans / drift), and `comm -13` for those present in [LIST_B] but not in [LIST_A] (missing).
4. Use `comm -12` (or `join`) to report the intersection: identifiers present in both.
5. If a second column carries metadata (e.g. `host status`), use `join -j 1` on the files, sorted on field 1, to correlate that metadata across both sources for the intersection set.
6. Emit a summary line with the counts for each of the three sets, and write the three detail lists to clearly labeled sections or to [OUTPUT_DIR] as `only-a.txt`, `only-b.txt`, and `both.txt`.

Output format: a human-readable report to stdout with three labeled sections (Only in A / Only in B / In both) and a trailing counts summary; optionally machine-readable files when [OUTPUT_DIR] is set.

Guardrail: the script is strictly read-only. It must never modify, delete, or reconcile the actual inventory; it only reports the diff. Use `mktemp` for all intermediate files and clean them up with a trap so repeated runs leave no residue and are fully idempotent.
Why this prompt works
Set reconciliation is a daily reality in operations: the cloud provider says you have 412 instances, the CMDB lists 398, and config management is managing 405. Finding exactly which identifiers fall into each bucket is a classic set-difference problem, and coreutils solves it without a single line of application code. By naming sort, comm, and join explicitly, this prompt keeps the model from reinventing the wheel with associative arrays or a Python script, and produces something that runs on any host with no runtime to install.
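To make the three buckets concrete, here is a minimal sketch of the set-difference idea using `comm` alone. The host names and the `workdir` variable are made up for illustration; real exports would replace the two `printf` lines.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample data: two tiny inventories, for illustration only.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' web-01 web-02 db-01 cache-01 | LC_ALL=C sort -u > "$workdir/a.sorted"
printf '%s\n' web-02 db-01 db-02           | LC_ALL=C sort -u > "$workdir/b.sorted"

# comm prints three columns: lines only in file 1, only in file 2, and in both.
# Each digit in the flag suppresses that column, so -23 leaves only column 1.
only_a=$(LC_ALL=C comm -23 "$workdir/a.sorted" "$workdir/b.sorted")
only_b=$(LC_ALL=C comm -13 "$workdir/a.sorted" "$workdir/b.sorted")
both=$(LC_ALL=C comm -12 "$workdir/a.sorted" "$workdir/b.sorted")

printf 'Only in A:\n%s\n\nOnly in B:\n%s\n\nIn both:\n%s\n' "$only_a" "$only_b" "$both"
```

With this sample data, `cache-01` and `web-01` land in "Only in A", `db-02` in "Only in B", and `db-01` and `web-02` in "In both".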
The most common way these commands go wrong is sorting, and the prompt front-loads it. comm and join both assume their inputs are sorted, and sorted in the same collation order the tools themselves compare with. Feed them unsorted data and the output can be wrong: GNU coreutils will often flag the problem with a "not in sorted order" diagnostic, but other implementations may emit confidently incorrect results with no error at all. Forcing a normalization step with LC_ALL=C sort -u on both files (same collation, deduplicated, comments stripped) and running comm and join under that same locale eliminates the single biggest source of false drift. Mapping the three comm flag combinations (-23, -13, -12) directly onto the operational questions (drift, missing, intersection) makes the output immediately actionable, and the optional join step lets you carry status metadata across sources for the overlap.
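The normalization and join steps might look roughly like the sketch below. The `normalize` helper, the file names, and the sample `host status` rows are all hypothetical. The sketch also uses `join -v`, which the prompt does not name, as one possible way to keep metadata on the one-sided rows. Note that lowercasing here applies to the whole line, metadata included.

```bash
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # one collation for every sort and join call below

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical "host status" exports, deliberately messy.
cat > "$workdir/cloud.txt" <<'EOF'
# cloud export
Web-01 running

db-01  running
  cache-01 stopped
EOF
cat > "$workdir/cmdb.txt" <<'EOF'
web-01 active
db-01 retired
db-02 active
EOF

# normalize FILE: strip comments and blank lines, trim edges, lowercase,
# then sort on the first field (the join key), keeping one line per key.
normalize() {
  sed -e 's/#.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sort -u -k1,1
}

normalize "$workdir/cloud.txt" > "$workdir/a.sorted"
normalize "$workdir/cmdb.txt"  > "$workdir/b.sorted"

# Intersection with metadata from both sides: key, status in A, status in B.
joined=$(join "$workdir/a.sorted" "$workdir/b.sorted")

# One-sided rows, metadata intact: -v N prints unpairable lines from file N.
only_a=$(join -v 1 "$workdir/a.sorted" "$workdir/b.sorted")
only_b=$(join -v 2 "$workdir/a.sorted" "$workdir/b.sorted")

printf 'In both:\n%s\n\nOnly in A:\n%s\n\nOnly in B:\n%s\n' "$joined" "$only_a" "$only_b"
```

Sorting with `-k1,1` instead of on the whole line is a deliberate choice here: `join` compares only the key field, so the files should be ordered on that field specifically.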
The read-only guardrail matters because reconciliation reports are tempting to automate end-to-end. A script that both detects and deregisters “orphan” hosts is one bad export away from deleting live infrastructure. By constraining the tool to reporting only, with mktemp temp files cleaned up by a trap so reruns are idempotent and leave nothing behind, the prompt produces something safe to schedule under cron, while the actual remediation stays a deliberate human decision.
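One way to picture the guardrail is a reporting function that only reads its inputs and removes its own scratch space on exit. This is a sketch under assumptions, not the script the prompt will necessarily produce: `report_counts`, `sandbox`, and the sample hosts are hypothetical, and `TMPDIR` is redirected only so leftover temp files would be easy to spot.

```bash
#!/usr/bin/env bash
set -euo pipefail

# report_counts A B: print the size of each set. Inputs are only ever read.
# The body is a subshell "( ... )" so its EXIT trap fires on every return path.
report_counts() (
  export LC_ALL=C
  tmp=$(mktemp -d)
  trap 'rm -rf "$tmp"' EXIT
  sort -u "$1" > "$tmp/a"
  sort -u "$2" > "$tmp/b"
  only_a=$(comm -23 "$tmp/a" "$tmp/b" | wc -l)
  only_b=$(comm -13 "$tmp/a" "$tmp/b" | wc -l)
  both=$(comm -12 "$tmp/a" "$tmp/b" | wc -l)
  # $((...)) strips the padding some wc implementations add.
  printf 'only_a=%s only_b=%s both=%s\n' "$((only_a))" "$((only_b))" "$((both))"
)

# Hypothetical sandbox standing in for real inventory exports.
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
mkdir "$sandbox/tmp"
export TMPDIR="$sandbox/tmp"   # scratch files from mktemp land here

printf '%s\n' web-01 web-02 db-01       > "$sandbox/a.txt"
printf '%s\n' web-02 db-01 db-02 db-03  > "$sandbox/b.txt"

before=$(cksum "$sandbox/a.txt" "$sandbox/b.txt")
first=$(report_counts "$sandbox/a.txt" "$sandbox/b.txt")
second=$(report_counts "$sandbox/a.txt" "$sandbox/b.txt")
after=$(cksum "$sandbox/a.txt" "$sandbox/b.txt")

printf '%s\n' "$first"
```

Running the report twice should give identical output, leave the input checksums unchanged, and leave the scratch directory empty, which is what makes it reasonable to schedule unattended.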
Related prompts
- Bash Config File Diff and Safe Merge Prompt: Create a Bash script that compares a shipped default config against a live one, shows the drift, and merges new keys without overwriting local edits.
- Bash Script Code Review Prompt: Get a senior-engineer review of any Bash script, covering safety, idempotency, error handling, and portability.
- Python Directory Tree Snapshot and Compare Prompt: Build a Python tool that records a hashed manifest of a directory tree and later reports added, removed, and modified files for change auditing.