Building a Searchable Postmortem Knowledge Base and Trend Report With AI

Two years into a job I asked where the old postmortems lived. The answer was a Confluence space, a shared Drive folder, three Slack threads, and “I think some are in the wiki?” Nobody had read most of them since the day they were filed. The cost showed up six weeks later when we spent a full day debugging a connection-pool exhaustion that—I learned afterward—had been thoroughly diagnosed and documented eighteen months earlier by a team that had since reorganized. We re-derived a fix that was sitting in a folder no one could find. A postmortem you can’t search is a postmortem you wrote for nobody.

The value of a postmortem isn’t just in the review meeting. It’s in being findable the next time something rhymes with it, and in the patterns that only emerge when you read fifty of them together. Both of those require structure that raw documents don’t have, and both are exactly what AI is good at producing—as long as a human owns the conclusions.

The two problems: retrieval and trends

A pile of postmortems fails you in two distinct ways.

Retrieval failure. During an incident, nobody has time to skim forty documents to find the one that matches. They need to ask “have we seen connection-pool exhaustion in the payments path before?” and get an answer in seconds. That requires each postmortem to carry consistent, searchable metadata—affected services, failure category, contributing factors, the fix—not just prose.

Trend blindness. Any single postmortem looks like a one-off. Read across a quarter and you discover that six of your eleven incidents touched the same config service, or that “missing alert” shows up as a contributing factor in half of them. That’s the systemic finding that justifies a real investment, and it’s invisible at the single-document level.

Structuring each postmortem for search

The foundation is extracting consistent metadata from every postmortem so they become queryable. I run new and historical postmortems through an extraction pass that emits a structured record.

Extract a structured record from this postmortem. Output JSON only,
using ONLY information present in the document. For any field not
stated, use null; do not infer. The // notes in the template are
guidance only; omit them from your output so it stays valid JSON.

{
  "incident_id": "",
  "date": "",
  "severity": "",
  "duration_minutes": null,
  "affected_services": [],
  "failure_category": "",   // e.g. config-error, capacity, dependency,
                            // deploy, data-corruption, expiry, network
  "contributing_factors": [],  // short phrases
  "detection_gap_minutes": null,
  "root_cause_summary": "",    // one sentence, quote-grounded
  "fix_summary": "",
  "action_items_open": []
}

The null rule keeps the index honest—a missing duration shows up as a gap to fix in your postmortem template, not a fabricated number that pollutes every aggregate you compute later. Store these records alongside the documents and you’ve turned a folder into a searchable index. Now “connection-pool exhaustion in payments” returns the eighteen-month-old postmortem in seconds.
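A minimal sketch of a gate you could put between the extraction pass and the index, assuming the model's output arrives as a JSON string. The field names mirror the template above; the function name `validate_record` and the specific checks are illustrative assumptions, not part of any particular tool.

```python
import json

# Field names mirror the extraction template; the checks are assumptions.
REQUIRED_FIELDS = {
    "incident_id", "date", "severity", "duration_minutes",
    "affected_services", "failure_category", "contributing_factors",
    "detection_gap_minutes", "root_cause_summary", "fix_summary",
    "action_items_open",
}
LIST_FIELDS = {"affected_services", "contributing_factors", "action_items_open"}


def validate_record(raw_json: str) -> tuple[dict, list[str]]:
    """Parse one extracted record and report which fields were left unstated.

    Returns the record plus a sorted list of "gap" fields (null or empty
    string). Gaps are reported, never filled in.
    """
    record = json.loads(raw_json)  # raises ValueError on invalid JSON

    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"record is missing fields: {sorted(missing)}")

    # Collect gaps before normalizing, so a null list still counts as a gap.
    gaps = sorted(
        f for f in REQUIRED_FIELDS if record[f] is None or record[f] == ""
    )

    # Normalize list fields so downstream code can iterate safely.
    for field in LIST_FIELDS:
        if record[field] is None:
            record[field] = []
        elif not isinstance(record[field], list):
            raise ValueError(f"{field} must be a list or null")

    return record, gaps
```

Counting the gap fields across the whole archive gives a rough picture of which sections of the postmortem template people routinely leave out.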

A word of caution on retrieval: when someone asks the knowledge base a question, the AI should answer with links to source postmortems, not from its own synthesis. The model’s job is to find and cite the relevant documents; a human reads the actual postmortem before acting on it. A confident summary that misremembers the fix is how you ship the wrong remediation twice.
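One way to enforce the cite-don't-synthesize rule is to make the lookup incapable of returning anything but pointers. The sketch below is a simple keyword match over the structured records, assuming they are already loaded as dictionaries; `Citation`, `search_postmortems`, and the `base_url` value are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    incident_id: str
    url: str                        # where a human reads the real postmortem
    matched_terms: tuple[str, ...]  # why this record was returned


def search_postmortems(records, query_terms,
                       base_url="https://wiki.example.com/postmortems/"):
    """Return citations for records matching any query term, best match first.

    The result deliberately carries no fix or root-cause text: the caller
    gets a link and has to read the source document.
    """
    terms = [t.lower() for t in query_terms]
    hits = []
    for record in records:
        # Search only the structured fields, tolerating nulls.
        haystack = " ".join([
            *(record.get("affected_services") or []),
            record.get("failure_category") or "",
            *(record.get("contributing_factors") or []),
            record.get("root_cause_summary") or "",
            record.get("fix_summary") or "",
        ]).lower()
        matched = tuple(t for t in terms if t in haystack)
        if matched:
            hits.append(Citation(
                incident_id=record["incident_id"],
                url=base_url + record["incident_id"],
                matched_terms=matched,
            ))
    # Records matching more of the query terms rank higher.
    hits.sort(key=lambda c: len(c.matched_terms), reverse=True)
    return hits
```

A real deployment would likely swap the substring match for full-text or embedding search, but the return type is the point: links and match reasons, no generated summary.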

The quarterly trend report

Once you have structured records, the quarterly read-across becomes tractable. I feed the model the full set of records for the period—not the prose, the extracted JSON, which keeps it grounded—and ask for patterns with their evidence.

You have structured records for all incidents in <quarter>. Produce a
trend report. For every claim, cite the specific incident IDs that
support it. Make no claim you can't back with at least two records.

Report:
1. FAILURE CATEGORIES — counts by category, biggest movers vs. last
   quarter if data provided.
2. REPEAT OFFENDERS — services or contributing factors appearing in
   3+ incidents. List the incident IDs.
3. DETECTION — median detection gap; incidents where it exceeded 20
   minutes (list IDs).
4. RECURRING CONTRIBUTING FACTORS — phrases appearing across multiple
   incidents (e.g. "missing alert", "unwritten contract").
5. OPEN ACTION ITEMS aging across quarters — flag anything still open.

Do NOT speculate about causes beyond what the records show. End with
"QUESTIONS FOR HUMANS" — things the data hints at but can't confirm.

The “cite the incident IDs” and “at least two records” rules are what separate a trustworthy trend report from a horoscope. A trend nobody can trace back to specific incidents is one nobody will act on—and rightly so.
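Because the counts and medians in the report are plain arithmetic over the records, you can compute them directly and compare them with what the model wrote. The sketch below is one possible cross-check, assuming records shaped like the extraction template; `trend_baseline` and its parameter names are illustrative, and the default thresholds simply echo the prompt (3+ incidents, 20 minutes).

```python
from collections import Counter
from statistics import median


def trend_baseline(records, repeat_threshold=3, slow_detection_minutes=20):
    """Compute the countable parts of the trend report from the records."""
    # Counts by failure category, skipping records where it was not stated.
    categories = Counter(
        r["failure_category"] for r in records if r.get("failure_category")
    )

    # Map each service to the incident IDs that touched it.
    service_hits = {}
    for r in records:
        for service in set(r.get("affected_services") or []):
            service_hits.setdefault(service, []).append(r["incident_id"])
    repeat_offenders = {
        s: ids for s, ids in service_hits.items()
        if len(ids) >= repeat_threshold
    }

    # Null detection gaps are excluded from the median, not treated as zero.
    gaps = {
        r["incident_id"]: r["detection_gap_minutes"]
        for r in records
        if r.get("detection_gap_minutes") is not None
    }

    return {
        "category_counts": dict(categories),
        "repeat_offenders": repeat_offenders,
        "median_detection_gap": median(gaps.values()) if gaps else None,
        "slow_detection": {
            i: g for i, g in gaps.items() if g > slow_detection_minutes
        },
        "records_missing_detection_gap": len(records) - len(gaps),
    }
```

If the model's report says one thing and this baseline says another, treat the report as wrong until someone checks the cited incidents by hand.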

Q2 trend report (excerpt)

Failure categories: config-error (5), dependency (3), capacity (2), expiry (1). Config-error up from 2 last quarter.

Repeat offender: the shared config service appears in INC-2210, 2244, 2291, 2305 — four of eleven incidents. This is the single biggest systemic theme of the quarter.

Detection: median gap 14 min; three incidents exceeded 20 (INC-2244: 37, INC-2291: 37, INC-2260: 22). All three lacked burn-rate alerts.

Recurring factor: “unwritten cross-team contract” in INC-2291, 2305.

QUESTIONS FOR HUMANS: Is the config-service concentration a real fragility or just where we happen to deploy most often? Records can’t say.

That repeat-offender line is the entire reason to do this. No single postmortem flagged the config service as a systemic risk. It took reading four together to see it, and now it’s a funded redesign instead of a fifth incident.

Humans own the conclusions and the priorities

The AI builds the index and surfaces the candidate patterns; people decide what they mean and what to do. The “QUESTIONS FOR HUMANS” section is deliberate—the data can show that the config service appears in four incidents, but only an engineer who knows the architecture can say whether that’s genuine fragility or a sampling artifact of where you deploy most. Don’t let a trend report’s confidence substitute for that judgment.

The discipline that makes this work is boring and non-negotiable: structure every postmortem at write time, run the extraction as part of filing, and read the trend report as a team once a quarter with someone empowered to fund the systemic fixes it surfaces. The knowledge base saves you from re-deriving old fixes; the trend report saves you from treating systemic problems as a string of one-offs. AI does the indexing and the read-across no human will do by hand; you own what the patterns mean.

The extraction and trend prompts live with the rest of my incident set in the prompts library, and the systemic-themes thinking continues across the postmortems category. For the per-incident foundation these all build on, see the blameless postmortem guide.

A postmortem you can’t find is one you wrote for nobody. Structure them, search them, and read them together.
