The Incident Commander Role Explained for Engineering Teams
The incident commander coordinates, doesn't fix. A veteran SRE breaks down the role, the first five minutes, common mistakes, and where AI lightens the load.
- #incident-response
- #incident-commander
- #sre
- #on-call
- #leadership
- #coordination
The first time I was handed the incident commander role, I made the classic mistake: I dove into the logs and started debugging. Twenty minutes later we had three people fixing, nobody coordinating, two duplicate efforts, and a stakeholder in the channel asking questions no one was answering. That’s the lesson every IC learns once: the commander’s job is to coordinate, not to fix.
After running major incidents for 25 years, here’s what the role actually is and how to do it well.
What an incident commander does — and doesn’t
The IC owns the response, not the repair. Concretely, the IC:
- Maintains the single source of truth on what’s happening.
- Assigns clear ownership for each workstream.
- Decides severity and when to escalate.
- Keeps communication flowing to stakeholders and customers.
- Protects the responders’ focus so they can actually work.
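One way to picture the "single source of truth" and "clear ownership" duties is as a small record the IC keeps current. This is a minimal sketch, not any particular tool's schema: `IncidentState`, `Workstream`, and every field name are hypothetical, and the incident values are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Workstream:
    name: str
    owner: Optional[str] = None  # one named owner per workstream; None means unassigned


@dataclass
class IncidentState:
    """Hypothetical record the IC keeps as the single source of truth."""
    commander: str
    severity: str
    symptom: str
    impact: str
    workstreams: List[Workstream] = field(default_factory=list)

    def unowned(self) -> List[str]:
        # Workstreams with nobody's name on them are where duplicate effort starts.
        return [w.name for w in self.workstreams if w.owner is None]

    def summary(self) -> str:
        # Render the current picture for a channel update or an IC handoff.
        lines = [
            f"IC: {self.commander} | Severity: {self.severity}",
            f"Symptom: {self.symptom}",
            f"Impact: {self.impact}",
        ]
        for w in self.workstreams:
            lines.append(f"- {w.name}: {w.owner or 'UNASSIGNED'}")
        return "\n".join(lines)


# Illustrative values only.
state = IncidentState(
    commander="alice",
    severity="SEV2",
    symptom="checkout requests returning errors",
    impact="some customers cannot complete checkout",
    workstreams=[Workstream("database latency", "bob"), Workstream("recent deploy")],
)
print(state.summary())
print("Needs an owner:", state.unowned())
```

The useful part is `unowned()`: at a glance, the IC sees which work has nobody's name on it and assigns it before two people pick it up independently.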
The IC explicitly does not put their hands on the keyboard to fix the system. The moment the IC starts debugging, coordination stops, and coordination is the thing only the IC is doing.
This trips up senior engineers hardest. You’re the best debugger in the room, and you have to not debug. Your value as IC is keeping five people coordinated, which is worth more than your individual fix.
The first five minutes
The opening of an incident sets its tone. A good IC, on taking command:
- States that they’re IC, out loud, in the channel. “I’m taking IC for this.” No ambiguity about who’s coordinating.
- Establishes the current picture. What’s the symptom, what’s the impact, what’s the working severity.
- Assigns roles. Who’s investigating, who’s handling comms, who’s the scribe.
- Sets a comms cadence. “I’ll post an update every 15 minutes even if nothing’s changed.” Silence breeds panic and back-channel pings.
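The four opening steps fit in a single first post. The sketch below assembles one and refuses to post if a role is still unassigned. The function name, the role keys (`investigation`, `comms`, `scribe`), and the 15-minute default are assumptions drawn from the list above, not a standard format.

```python
from typing import Dict

# Assumed minimum roles, taken from the "assigns roles" step.
REQUIRED_ROLES = ("investigation", "comms", "scribe")


def opening_message(ic: str, symptom: str, impact: str, severity: str,
                    roles: Dict[str, str], cadence_minutes: int = 15) -> str:
    """Hypothetical first post: IC claim, current picture, roles, cadence."""
    missing = [r for r in REQUIRED_ROLES if r not in roles]
    if missing:
        # Don't post an opening with gaps; unassigned roles are how crowds form.
        raise ValueError(f"unassigned roles: {', '.join(missing)}")
    role_lines = "\n".join(f"  {role}: {person}" for role, person in roles.items())
    return (
        f"{ic}: I'm taking IC for this.\n"
        f"Symptom: {symptom}\n"
        f"Impact: {impact}\n"
        f"Working severity: {severity}\n"
        f"Roles:\n{role_lines}\n"
        f"Next update in {cadence_minutes} minutes, even if nothing has changed."
    )


msg = opening_message(
    "alice", "checkout errors", "some customers cannot check out", "SEV2",
    {"investigation": "bob", "comms": "carol", "scribe": "dave"},
)
print(msg)
```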
The discipline of naming the IC out loud matters more than it sounds. Incidents with no clear commander turn into a crowd, and crowds are slow.
The roles around the IC
For anything past a small incident, the IC delegates:
- Operations / subject-matter leads do the actual investigation and fixing.
- Communications lead owns status-page updates and stakeholder messages, freeing the IC.
- Scribe keeps the timeline as events happen — gold for the postmortem and impossible to reconstruct later.
On a small team one person may wear several hats, but the IC should always be distinct from the person elbow-deep in the fix.
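The rule that hats can double up but the IC stays out of the fix is easy to check mechanically. This is a hypothetical sketch: the role map shape and the `operations` key are assumptions, and the check only flags problems. The IC still makes the call.

```python
from typing import Dict, List


def role_conflicts(ic: str, assignments: Dict[str, List[str]]) -> List[str]:
    """Return warnings for a hypothetical role map (role name -> people).

    Doubling up is allowed on small teams, except that the IC should never
    also be listed under operations, i.e. doing the fix.
    """
    warnings = []
    if ic in assignments.get("operations", []):
        warnings.append(f"{ic} is IC and also on operations; hand the fix to someone else")
    if not assignments.get("operations"):
        warnings.append("nobody is assigned to operations")
    return warnings


# Small team: the IC also acts as scribe, which is fine.
ok = role_conflicts("alice", {"operations": ["bob"], "comms": ["carol"], "scribe": ["alice"]})
bad = role_conflicts("alice", {"operations": ["alice", "bob"]})
print(ok)
print(bad)
```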
Common incident commander mistakes
- Becoming the fixer. Covered above, and worth repeating because it’s the number-one failure.
- No comms cadence. Stakeholders fill silence with anxiety and start pinging responders directly, which destroys focus. A predictable drumbeat of updates buys you quiet.
- Letting the channel sprawl. Ten people freelancing means duplicated effort and missed signals. The IC keeps work assigned and visible.
- Never declaring resolution. An incident that just fizzles out leaves people unsure if they’re still on the hook. Call it: “Incident resolved, standing down, postmortem to follow.”
- Skipping the handoff. Long incidents outlast one human’s focus. Hand off IC explicitly, with a status summary, before you’re too fried to be useful.
Where AI helps the IC
The IC’s load is mostly cognitive: holding the whole picture, drafting comms, and tracking what’s been tried. AI lifts a real chunk of that.
Comms drafting. The IC needs a customer-facing update and an internal one, in the right register, now, without breaking concentration. Hand it off:
“Write a status-page update for a degraded-checkout incident: customer-facing, no jargon, no root-cause speculation, ~3 sentences. Then a one-line internal update with current severity and what we’re checking.”
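Turning that prompt into a fill-in template keeps the constraints (no jargon, no root-cause speculation) from being dropped when you're typing fast under pressure. A minimal sketch: `status_prompt` and its parameters are hypothetical, the example values are illustrative, and the actual model call is left out because it depends on whichever tool you use.

```python
def status_prompt(incident: str, severity: str, checking: str, sentences: int = 3) -> str:
    """Fill a hypothetical reusable version of the comms prompt above."""
    # The constraints are fixed text so they can't be dropped under pressure.
    return (
        f"Write a status-page update for a {incident} incident: customer-facing, "
        f"no jargon, no root-cause speculation, ~{sentences} sentences. "
        f"Then a one-line internal update with current severity ({severity}) "
        f"and what we're checking ({checking})."
    )


prompt = status_prompt("degraded-checkout", "SEV2", "payment-provider latency")
print(prompt)
```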
Timeline keeping. If you don’t have a dedicated scribe, the model can structure the channel scrollback into a timeline you sanity-check, so the postmortem isn’t a reconstruction from memory.
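Whether a model or a script does the first pass, the output you want is the same: timestamped rows in order. This sketch does the mechanical part without a model, under a loudly stated assumption that the export format is `HH:MM name: message`, one message per line, all on one day. Real chat exports differ, so adjust the pattern to yours.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Assumed export format: "HH:MM name: message", one message per line, all on one day.
LINE = re.compile(r"^(\d{2}:\d{2}) ([^:\s]+): (.+)$")


def build_timeline(scrollback: str) -> List[Tuple[str, str, str]]:
    """Turn raw channel text into (time, author, event) rows, oldest first."""
    rows = []
    for raw in scrollback.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # lines outside the assumed format are skipped; check them by hand
        hhmm, author, text = m.groups()
        rows.append((datetime.strptime(hhmm, "%H:%M").time(), author, text))
    rows.sort(key=lambda r: r[0])  # chronological; breaks if the incident crosses midnight
    return [(t.strftime("%H:%M"), author, text) for t, author, text in rows]


log = """14:07 bob: rolled back the last deploy
14:02 alice: I'm taking IC for this.
(carol joined the channel)
14:05 carol: status page updated"""

for row in build_timeline(log):
    print(*row)
```

Either way, someone still sanity-checks the rows: the timeline is only as good as what was said in the channel.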
Investigation structure. While the IC stays off the keyboard, they can still feed symptoms to a model to get a ranked set of hypotheses and read-only diagnostics to hand to the operations lead. That way the IC coordinates the investigation without doing it.
We keep incident-response prompts for comms and triage, and the Incident Response tool produces the structured assessment and comms drafts an IC needs without leaving the coordination seat.
When you’re a team of one IC
On small teams the IC is also the only available engineer, and “don’t touch the keyboard” feels impossible. The compromise: separate the two jobs in time, not people. Spend the first two minutes purely as IC — assess, declare severity, post one comms update, set a cadence — then drop into the fix, but resurface on a timer to re-coordinate and re-communicate. Set an actual alarm for your comms cadence so the investigation doesn’t swallow it. It’s not as clean as a dedicated IC, but the discipline of periodically stepping back out of the fix to coordinate is what keeps a solo response from tunneling.
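The alarm itself should come from your phone or paging tool. The arithmetic behind it is just a fixed cadence from the incident start, plus an "am I overdue?" check for when you surface. A minimal sketch with hypothetical function names; the 15-minute cadence and the start time are illustrative.

```python
from datetime import datetime, timedelta
from typing import List


def comms_reminders(start: datetime, cadence_minutes: int, count: int) -> List[datetime]:
    """The next `count` times to surface from the fix and post an update."""
    return [start + timedelta(minutes=cadence_minutes * i) for i in range(1, count + 1)]


def update_overdue(last_update: datetime, now: datetime, cadence_minutes: int) -> bool:
    # True once the promised cadence has lapsed without a post.
    return now - last_update > timedelta(minutes=cadence_minutes)


start = datetime(2024, 1, 1, 14, 0)  # illustrative incident start
print([t.strftime("%H:%M") for t in comms_reminders(start, 15, 4)])
print(update_overdue(start, datetime(2024, 1, 1, 14, 20), 15))
```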
Knowing when to stand down
Declaring resolution well is part of the job. Don’t call it the instant the graph recovers — confirm the fix held for long enough that you trust it, verify the customer-facing impact is actually gone, and only then announce. State it explicitly in the channel: “Incident resolved as of [time], standing down, postmortem owner is [name].” That sentence releases the responders, tells stakeholders it’s over, and assigns the follow-up in one move. An incident that fizzles out without a clear “we’re done” leaves people wondering whether they’re still on the hook and lets the postmortem quietly never happen.
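"Long enough that you trust it" can be made concrete: require enough history to cover a stability window, and require every sample inside that window to be healthy. This sketch assumes an error-rate metric sampled as `(timestamp, fraction)` pairs. The threshold, window, and sample values are illustrative, not recommendations. It covers only the graph; verifying that the customer-facing impact is gone remains a human check.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Sample = Tuple[datetime, float]  # (timestamp, error rate as a fraction); assumed metric


def fix_has_held(samples: List[Sample], threshold: float,
                 window: timedelta, now: datetime) -> bool:
    """True only if the history spans `window` and every sample inside it is healthy."""
    if not samples or now - min(t for t, _ in samples) < window:
        return False  # not enough history yet to trust the recovery
    recent = [v for t, v in samples if now - t <= window]
    return bool(recent) and all(v < threshold for v in recent)


def stand_down_message(resolved_at: datetime, postmortem_owner: str) -> str:
    return (f"Incident resolved as of {resolved_at:%H:%M}, standing down, "
            f"postmortem owner is {postmortem_owner}.")


# Illustrative samples every 5 minutes from 15:00; the spike is at 15:00.
start = datetime(2024, 1, 1, 15, 0)
rates = [0.08, 0.002, 0.001, 0.0, 0.0, 0.0, 0.001]
samples = [(start + timedelta(minutes=5 * i), r) for i, r in enumerate(rates)]
now = start + timedelta(minutes=30)

if fix_has_held(samples, threshold=0.01, window=timedelta(minutes=20), now=now):
    print(stand_down_message(now, "bob"))
```

Widening the window to 30 minutes pulls the 15:00 spike back in and the check fails, which is the point: a graph that only just recovered does not qualify.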
You can learn this
Incident command is a skill, not a personality trait. Run gamedays where people rotate through the IC role with no production stakes. The first time someone commands an incident should not be during a real SEV1. Practice the first five minutes, practice the handoff, practice the stand-down, and practice staying off the keyboard.
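A simple way to make sure everyone actually rotates through IC is to generate the schedule instead of asking for volunteers. A hypothetical sketch: the drill names come from the practice list above, and one round per engineer per drill is an assumption about how you'd run it.

```python
from typing import List, Sequence, Tuple

# Drills taken from the practice list above; staying off the keyboard applies to every one.
DRILLS = ("first five minutes", "handoff", "stand-down")


def gameday_rotation(engineers: Sequence[str],
                     drills: Sequence[str] = DRILLS) -> List[Tuple[str, str]]:
    """Schedule where every engineer takes IC once for every drill."""
    return [(engineer, drill) for drill in drills for engineer in engineers]


for ic, drill in gameday_rotation(["alice", "bob", "carol"]):
    print(f"{drill}: {ic} is IC")
```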
AI-assisted comms and timelines are drafts. The incident commander owns every decision and message that goes out.