AI for Incident Response · By James Joyner IV · 8 min read

Customer Communication During Outages: What to Say and When

How you talk to customers during an outage shapes whether they trust you after. Here's a practical framework for honest, well-timed outage communication.

  • #incident-response
  • #communication
  • #customer-success
  • #sre
  • #trust
  • #on-call

Customers forgive outages. Every serious company has them, and most users understand that. What customers don’t forgive is silence: finding out from a Twitter pile-on that something’s broken while your status page cheerfully reports all systems operational. In my experience, the lasting damage from an outage depends more on how you communicate during it than on how long it lasts.

This guide isn’t about the internal incident channel or status-page mechanics. It’s about the harder discipline of talking to people who are paying you while their thing is down. Here’s how I think about it.

The two audiences you’re juggling

During an outage you’re writing for two very different groups at once:

  • Affected customers who want to know: is this you or me, are you aware, and when will it be fixed?
  • Internal stakeholders — support, sales, leadership — who need to answer the first group without making things worse.

The mistake is writing one message and blasting it everywhere. Customers need calm, jargon-free reassurance. Inside the second group, support needs detail and a holding line, while leadership needs honest severity and business impact. Same facts, three registers.

The cardinal rules

A few principles that have never steered me wrong:

  1. Acknowledge fast, even with nothing to say. “We’re investigating reports of errors” within 10 minutes beats a perfect explanation an hour later. Speed of acknowledgment is the trust signal.
  2. Never speculate on root cause publicly. “We believe this is a database issue” becomes a lie the moment it turns out to be DNS, and now you’ve published a wrong root cause to customers. Describe impact, not cause.
  3. Be specific about impact, vague about ETAs you don’t have. “Checkout is failing for some EU customers” is honest and useful. “Back up in 30 minutes” when you have no idea is how you lose the room.
  4. Over-communicate frequency, not detail. Customers calm down when updates arrive predictably. Commit to a cadence (“next update in 30 minutes”) and hit it, even if the update is “still working on it.”
  5. Match the tone to the severity. A degraded non-critical feature gets a measured note. A full payment outage gets visible urgency and a named human.
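The cadence rule is easy to break at 3am, so it can help to let a small helper track the promise for you. This is a minimal sketch, assuming a 30-minute cadence and timezone-aware timestamps; the function names are hypothetical and not part of any tool mentioned here.

```python
from datetime import datetime, timedelta, timezone


def next_update_due(last_update: datetime, cadence_minutes: int = 30) -> datetime:
    """Return the time by which the next customer update was promised."""
    return last_update + timedelta(minutes=cadence_minutes)


def is_overdue(last_update: datetime, now: datetime, cadence_minutes: int = 30) -> bool:
    """True once the promised update time has passed without a new post."""
    return now > next_update_due(last_update, cadence_minutes)


def commitment_line(last_update: datetime, cadence_minutes: int = 30) -> str:
    """Build the closing sentence of an update, with the due time in UTC."""
    due = next_update_due(last_update, cadence_minutes).astimezone(timezone.utc)
    return f"Next update by {due:%H:%M} UTC."


if __name__ == "__main__":
    posted = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
    print(commitment_line(posted))
```

Wiring `is_overdue` into whatever reminder or chat bot you already use is left to you; the point is only that the commitment becomes something a machine can nag about.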

What to actually say at each stage

Initial acknowledgment (first 10–15 minutes):

We’re aware of an issue affecting [feature] and are actively investigating. Some customers may experience [observable symptom]. We’ll post an update within 30 minutes.

Mid-incident update (on cadence):

We’ve identified the source of the issue affecting [feature] and are working on a fix. [Feature] remains [degraded/unavailable] for [scope of customers]. Next update by [time].

Note what’s not there: no root cause, no blame, no internal jargon. Say “the source,” not “the payments-gateway connection pool was exhausted.”

Resolution:

This issue is now resolved. [Feature] has been fully restored as of [time]. We’re sorry for the disruption and will follow up with more detail on what happened.

The follow-up (hours to a day later):

This is where you earn long-term trust — a plain-language summary of what broke, the impact, and what you’re doing so it doesn’t recur. Honest, specific, no excuses. This is the customer-facing cousin of your internal postmortem, and it’s worth writing.
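The acknowledgment, update, and resolution messages above are fixed enough to keep as fill-in templates. Here is one possible sketch using Python’s standard `string.Template`; the stage names, field names, and the `render` helper are my own illustrative choices, not an established format.

```python
from string import Template

# Wording follows the stage messages above; field names are hypothetical.
TEMPLATES = {
    "acknowledgment": Template(
        "We're aware of an issue affecting $feature and are actively "
        "investigating. Some customers may experience $symptom. "
        "We'll post an update within $cadence_minutes minutes."
    ),
    "update": Template(
        "We've identified the source of the issue affecting $feature and "
        "are working on a fix. $feature_cap remains $state for $scope. "
        "Next update by $next_update."
    ),
    "resolution": Template(
        "This issue is now resolved. $feature_cap has been fully restored "
        "as of $restored_at. We're sorry for the disruption and will "
        "follow up with more detail on what happened."
    ),
}


def render(stage: str, **fields: str) -> str:
    """Fill in one stage message; raises KeyError if a field is missing."""
    feature = fields.get("feature", "")
    # Capitalized form for sentences that start with the feature name.
    fields["feature_cap"] = feature[:1].upper() + feature[1:]
    return TEMPLATES[stage].substitute(fields)


if __name__ == "__main__":
    print(render("acknowledgment", feature="checkout",
                 symptom="failed card payments", cadence_minutes="30"))
```

`substitute` is used rather than `safe_substitute` on purpose: a missing field raises an error instead of letting a message go out with a raw placeholder still in it.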

The scope question that decides everything

Before any message goes out, answer one question: who is affected? Vague scope is the most common comms failure. “We’re having issues” panics customers who are fine. “Customers in the EU region using card payments” lets unaffected users relax and affected ones plan.

If you’ve done blast-radius mapping, you already know the downstream impact of the failing component — turn that directly into the scope statement.

Using AI to draft comms without stopping the investigation

Comms is a tax on the attention you most need for fixing the thing. This is exactly where AI earns its place: writing in the right register is something it’s genuinely good at, and drafting text never requires it to touch production.

Keep a prompt ready so you’re not composing at 3am:

“Active incident: [feature] is [degraded/down] for [scope of customers]. We don’t have a confirmed root cause or ETA. Write three things: (1) a customer-facing status update, calm, no jargon, no root-cause speculation, ~2 sentences with a commitment to update in 30 minutes; (2) a holding line for the support team to send to tickets; (3) a one-line internal update for leadership with honest severity and business impact.”

You skim three drafts in the right tone, adjust a word, send. The model is also good at the resolution note and the next-day follow-up — paste the timeline and ask for a plain-language customer summary that explains impact without over-sharing internals. We keep a library of outage-communication templates for exactly this.
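If you would rather not fill in the brackets by hand, the prompt above can live in a small script. The following is a rough sketch; `build_prompt` and its parameters are hypothetical names, and how you send the result to a model is deliberately left out.

```python
# The prompt text mirrors the one above; only the placeholders are parameterized.
PROMPT_TEMPLATE = (
    "Active incident: {feature} is {state} for {scope}. "
    "We don't have a confirmed root cause or ETA. Write three things: "
    "(1) a customer-facing status update, calm, no jargon, no root-cause "
    "speculation, ~2 sentences with a commitment to update in "
    "{cadence_minutes} minutes; "
    "(2) a holding line for the support team to send to tickets; "
    "(3) a one-line internal update for leadership with honest severity "
    "and business impact."
)


def build_prompt(feature: str, state: str, scope: str,
                 cadence_minutes: int = 30) -> str:
    """Fill in the incident facts; reject states the prompt does not cover."""
    if state not in ("degraded", "down"):
        raise ValueError("state must be 'degraded' or 'down'")
    return PROMPT_TEMPLATE.format(
        feature=feature,
        state=state,
        scope=scope,
        cadence_minutes=cadence_minutes,
    )


if __name__ == "__main__":
    print(build_prompt("checkout", "down",
                       "EU customers using card payments"))
```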

One guardrail: scrub internal hostnames, customer names, and ticket IDs before pasting, and never let the model invent an ETA you can’t commit to.
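The scrubbing step can be partly automated before anything is pasted into a model. This is a sketch under loud assumptions: the hostname suffixes (`.internal`, `.corp`, `.local`) and the `ABC-123` ticket format are guesses at common conventions, so swap in your own patterns, and still read the result yourself before pasting.

```python
import re

# Assumed conventions; replace with your organization's real patterns.
HOSTNAME_RE = re.compile(
    r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:internal|corp|local)\b",
    re.IGNORECASE,
)
TICKET_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def scrub(text: str, customer_names: list[str]) -> str:
    """Replace internal hostnames, ticket IDs, and known customer names."""
    text = HOSTNAME_RE.sub("[host]", text)
    text = TICKET_RE.sub("[ticket]", text)
    for name in customer_names:
        # Customer names come from a list you maintain; matched case-insensitively.
        text = re.sub(re.escape(name), "[customer]", text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    raw = "db-7.prod.internal timed out for Acme Corp, see INC-1234"
    print(scrub(raw, ["Acme Corp"]))
```

Pattern matching only catches what it knows about, so treat this as a first pass rather than a guarantee.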

A simple comms ownership model

Decide this before you need it: during an incident, the incident commander does not write comms. A separate communications lead owns it, so investigation and messaging run in parallel. On smaller teams, the IC can delegate to whoever is least needed on the technical side. The point is that comms is a named job, not something that happens when someone remembers.

The long game

Every outage is a deposit or a withdrawal from customer trust. Handled with silence and spin, even a short outage burns trust you spent years building. Handled with fast acknowledgment, honest scope, predictable cadence, and a real follow-up, an outage can actually increase trust — because customers see how you behave under pressure.

If you want the structured version — paste your incident facts and get audience-appropriate drafts in seconds — that’s part of what the AI Incident Response Assistant is built for.

Generated communications are drafts, not policy. Always review scope and tone against your real customer impact before publishing.
