Running a Monthly SEV Review Board That Catches Systemic Risk

The first time I sat in a SEV review board, it was three engineering managers and a director taking turns reading postmortem summaries out loud while everyone else checked Slack. Forty minutes in, nobody had learned anything that wasn’t already in the docs. The meeting existed because someone said incident review meetings were a best practice, but it had quietly turned into a status report read-aloud. The whole point of a review board is to see the patterns a single postmortem can’t — and we were missing every one of them.

A SEV review board is not a retrospective. A retro looks deep at one incident. A review board looks wide across many. Its job is to answer questions no single incident can: are we paging the same team every week for the same class of failure? Are action items from three months ago still open? Is one dependency showing up in the timeline of half our SEV1s? That cross-cutting view is where systemic risk hides, and it’s exactly the kind of synthesis work AI is genuinely good at — provided humans stay the ones who decide what to do about it.

What a review board is actually for

The board exists to convert a pile of individual incidents into organizational learning. Concretely, it should produce three things every cycle: a short list of recurring themes, a set of decisions about which themes to invest in fixing, and an honest accounting of whether last cycle’s commitments happened. Everything else — reading summaries, re-litigating root cause, assigning blame — is noise that crowds out the work only this meeting can do.

I run mine monthly for a mid-size org. Weekly is too frequent to see trends; quarterly is too slow to course-correct. The attendees are the people who can actually reallocate work: engineering leads, an SRE representative, and one product owner who can speak to customer impact. Keep it under eight people. A review board that needs a projector and a presenter has already failed.

Where AI earns its seat

Between meetings, someone has to read every postmortem from the period and find the threads connecting them. Historically that was a senior engineer’s weekend, and it got skipped the moment they got busy. This is the part to hand to a model.

Feed it the month’s postmortems and ask for clustering, not conclusions: which incidents share a contributing factor, which services appear repeatedly in timelines, which action items reference the same underlying gap. The free AI Incident Response Assistant is built for this kind of cross-document synthesis — pulling structured signal out of a dozen unstructured writeups so the humans walk into the meeting with a candidate list of themes instead of a blank page.

The prompt that works for me is blunt: “Here are nine postmortems from May. Group them by shared contributing factor. For each group, quote the specific timeline entries that justify the grouping. Do not propose fixes.” That last sentence matters. The moment you ask for fixes, you get plausible-sounding remediation that nobody validated, and the board starts rubber-stamping a model’s homework instead of doing its own thinking.
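If you assemble that prompt from files rather than pasting by hand, a small helper keeps the wording identical every month. This is a minimal sketch, not a prescribed implementation: the `Postmortem` class, the `build_clustering_prompt` function, the `===` delimiters, and the incident IDs are all hypothetical names chosen for illustration, and the sample timeline text is made up.

```python
from dataclasses import dataclass


@dataclass
class Postmortem:
    incident_id: str  # hypothetical identifier such as "INC-101"
    text: str         # full postmortem writeup, timeline included


def build_clustering_prompt(period: str, postmortems: list[Postmortem]) -> str:
    """Assemble a clustering-only prompt. It deliberately never asks for fixes."""
    header = (
        f"Here are {len(postmortems)} postmortems from {period}. "
        "Group them by shared contributing factor. "
        "For each group, quote the specific timeline entries that justify the grouping. "
        "Do not propose fixes."
    )
    # Delimit each document so every quote can be traced back to one incident.
    body = "\n\n".join(
        f"=== {pm.incident_id} ===\n{pm.text.strip()}" for pm in postmortems
    )
    return f"{header}\n\n{body}"


# Illustrative data only.
example = build_clustering_prompt(
    "May",
    [
        Postmortem("INC-101", "14:02 config service slow to respond"),
        Postmortem("INC-102", "09:15 batch job timed out"),
    ],
)
print(example)
```

Labeling each document with its incident ID is the design choice that matters here: it gives the model something concrete to cite, which makes the quotes checkable later.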

Pro Tip: Ask the AI to cite the exact timeline entry for every pattern it claims. Treat any theme you can’t trace back to a specific entry in a specific postmortem as a hallucination until proven otherwise. The citation requirement is what lets you catch it in ten seconds instead of committing a team to fixing a problem that doesn’t exist.
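The ten-second check can be partly mechanical. The sketch below assumes you have already parsed the model's output into `(incident_id, quote)` pairs and that each timeline is available as plain text; the function names and sample data are hypothetical. It only catches quotes that do not appear in the cited incident. Whether a real quote actually supports the theme is still a human call.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def untraceable_citations(citations, timelines):
    """Return the citations whose quote is absent from the cited incident's timeline.

    citations: list of (incident_id, quoted_text) pairs claimed by the model.
    timelines: dict mapping incident_id to the raw timeline text.
    """
    missing = []
    for incident_id, quote in citations:
        source = normalize(timelines.get(incident_id, ""))
        # An empty quote or an unknown incident ID is treated as untraceable.
        if not quote.strip() or normalize(quote) not in source:
            missing.append((incident_id, quote))
    return missing


# Illustrative data only.
timelines = {
    "INC-101": "14:02  Config service slow to respond\n14:10 rollback started",
}
citations = [
    ("INC-101", "config service slow to respond"),
    ("INC-101", "database failover triggered"),
    ("INC-999", "anything"),
]
flagged = untraceable_citations(citations, timelines)
print(flagged)
```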

Walking the agenda

Open with the accountability pass: what did we commit to last cycle, and what actually shipped? Put the open-versus-closed count on screen first, before any new analysis. If your closure rate is bad, every new theme you discuss today is a theme you probably won’t fix either, and the board needs to feel that before it adds to the pile. I’ve watched boards generate twelve new action items while sitting on forty open ones — that’s not a review board, it’s a backlog generator.
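The open-versus-closed count is simple arithmetic, and worth computing the same way every cycle. A minimal sketch, assuming action items can be exported as a title plus a closed flag (the `ActionItem` shape and the sample items are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class ActionItem:
    title: str
    closed: bool


def closure_summary(items):
    """Count open and closed action items and compute the closure rate."""
    closed = sum(1 for item in items if item.closed)
    total = len(items)
    return {
        "open": total - closed,
        "closed": closed,
        # Avoid division by zero when there were no commitments last cycle.
        "closure_rate": closed / total if total else 0.0,
    }


# Illustrative data only.
last_cycle = [
    ActionItem("Add timeout to config client", closed=True),
    ActionItem("Alert on batch job lag", closed=False),
    ActionItem("Document rollback runbook", closed=False),
    ActionItem("Load test checkout path", closed=False),
]
summary = closure_summary(last_cycle)
print(summary)
```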

Then move to themes. The AI-clustered list is a starting point, not a verdict. For each candidate theme, the board’s job is to decide: is this real, is it worth fixing now, and who owns it? Some themes the model surfaces will be coincidence — three unrelated incidents that happened to touch the same service. Humans kill those. Some real themes the model will miss entirely because they live in context that wasn’t in the documents — a vendor contract expiring, a team that just lost two people. Humans add those.

A theme the board caught that no single postmortem did

Here’s the value made concrete. One month, the AI clustering flagged that four separate SEV2s — a slow checkout, a failed batch job, an alerting gap, and a deploy rollback — all had timeline entries mentioning the same internal config service being slow to respond. No individual postmortem called this out, because in each one the config service was a minor footnote, not the headline cause. The incident owners had each routed around it and moved on.

The board looked at the clustered evidence and made a call a single retro never could: the config service was a systemic single point of weakness, and the right investment wasn’t four separate fixes but one reliability project on that service. That decision came from humans reading the synthesized pattern and applying judgment about cost and priority. The AI found the thread; the people decided it was worth pulling. That division of labor — AI for the wide read, humans for the call — is the entire model.
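A pattern like this one can also be cross-checked without a model, as a sanity check on the clustering. The sketch below is not how the AI found the thread; it is a plain substring count, assuming you maintain a list of known service names and have timelines as text. The incident IDs, service names, and timeline wording are invented for illustration.

```python
from collections import defaultdict


def services_across_incidents(timelines, service_names, min_incidents=3):
    """Map each service to the incidents whose timeline mentions it.

    Only services mentioned in at least min_incidents distinct incidents are kept.
    """
    hits = defaultdict(set)
    for incident_id, text in timelines.items():
        lowered = text.lower()
        for name in service_names:
            if name.lower() in lowered:
                hits[name].add(incident_id)
    return {
        name: sorted(ids) for name, ids in hits.items() if len(ids) >= min_incidents
    }


# Illustrative data only, loosely modeled on the four SEV2s above.
timelines = {
    "SEV2-checkout": "10:01 checkout latency rising; config service slow to respond",
    "SEV2-batch": "02:30 batch job failed after config service timeout",
    "SEV2-alerting": "alert delayed; config service slow during rule reload",
    "SEV2-rollback": "deploy rolled back; config service lagging on flag fetch",
}
recurring = services_across_incidents(timelines, ["config service", "payments api"])
print(recurring)
```

A count like this will surface coincidences as readily as real themes, which is the same reason the board, not the tooling, decides what is worth pulling.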

Keeping decisions and synthesis on the right sides of the line

It is tempting, once the model is good at clustering, to let it also rank themes by severity, recommend owners, and draft the action items. Resist most of that. Ranking by severity requires knowing business context the model doesn’t have. Recommending owners requires knowing team capacity and politics. Drafting action items is fine as a starting draft — but the board must rewrite them into commitments a named human accepts out loud, because an action item nobody verbally owned in the room is an action item that won’t get done.

The clean split: AI synthesizes the corpus and drafts language. Humans decide what’s real, what matters, who owns it, and what we’re explicitly choosing not to fix. And under no circumstances does any tool in this loop touch production — a review board deals in decisions and priorities, never in actions. If your synthesis tooling has write access to anything other than a doc, you’ve built the wrong tool.

Pro Tip: End every board with an explicit “not doing” list. The themes you consciously decline to fix this cycle are as valuable as the ones you commit to, because next cycle you can check whether declining hurt you. A board that only ever adds work is one that never learns to say no.

Making the output stick

The board’s decisions are worthless if they evaporate by Monday. Capture them in one place: theme, decision, owner, and a check-in date. I keep mine in a living doc and have the assistant draft a one-paragraph summary for the broader engineering channel — synthesis again, not decision. The summary tells the org what the board concluded and why, which builds trust that incident pain actually feeds back into priorities. If you want repeatable prompts for this kind of recurring synthesis, the prompt library and the curated prompt packs have templates worth adapting.
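If the living doc is backed by anything structured, the four fields map onto a small record. This is one possible shape under my own assumptions, not a required schema: the `BoardDecision` class, the `"fix"` and `"not doing"` labels, and the sample entries are hypothetical. Recording declined themes alongside commitments means next cycle's check-in covers both.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BoardDecision:
    theme: str
    decision: str         # "fix" or "not doing"; both are recorded
    owner: Optional[str]  # a named human for "fix"; may be None for "not doing"
    check_in: date        # when the board revisits this decision


def due_for_check_in(decisions, today):
    """Return the decisions whose check-in date has arrived or passed."""
    return [d for d in decisions if d.check_in <= today]


# Illustrative data only.
decisions = [
    BoardDecision("Config service latency", "fix", "hypothetical-owner", date(2024, 6, 15)),
    BoardDecision("Flaky batch retries", "not doing", None, date(2024, 7, 1)),
]
due = due_for_check_in(decisions, today=date(2024, 6, 20))
print([d.theme for d in due])
```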

The review board is the slowest feedback loop in your incident program and often the most important one. A single postmortem makes you fix one bug. A good review board makes you fix the reason that class of bug keeps happening. Let the model do the reading so the humans have energy left for the deciding.
