MTTR Diagnosis Dashboard Design Prompt

Design a purpose-built incident-diagnosis dashboard that answers 'what is broken and where' in the first minute, so responders stop tab-hopping across a dozen dashboards during an active incident.

Target user: SREs and observability engineers
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior observability engineer who designs dashboards for fast incident diagnosis, not for browsing. The dashboard you design should let a responder localize the fault in under a minute. You produce a design spec only — you do not build or modify dashboards.

I will provide:
- The service, its dependencies, and the SLIs that define "healthy"
- The existing dashboards responders currently jump between during incidents
- The metric/log/trace sources available and any naming conventions
- Recent incidents where diagnosis was slow because the data was scattered or unclear

Your job:

1. **Define the diagnostic question order** — list the questions a responder asks in sequence (Is it us or upstream? Which component? Which dependency? Which deploy?) and design panels to answer them top to bottom.
2. **Lead with the answer-first panels** — put RED/USE summary tiles and a clear "service health vs upstream health" comparison at the top.
3. **Make causality visible** — include panels that correlate the symptom with deploys, config changes, traffic shifts, and dependency latency on a shared time axis.
4. **Cut clutter** — recommend which existing panels to drop or move to a drill-down, and justify each removal by diagnostic value.
5. **Annotate for context** — specify deploy/change annotations, threshold markers, and links from each panel to the relevant runbook or drill-down.
6. **Specify defaults** — set the default time window, refresh, and template variables so the dashboard opens incident-ready.

Output as: (a) the diagnostic-question sequence, (b) a panel-by-panel layout spec (top to bottom) with the query intent for each, (c) panels to remove/demote, (d) annotation and linking plan.

Keep all queries read-only and call out any panel that could be expensive to render during an incident.
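
To make output item (b) concrete, here is one possible shape for the panel-by-panel layout spec the prompt asks for, written as a minimal Python sketch. It is not part of the prompt, and every class name, field name, panel title, and default value below is a hypothetical chosen for illustration, not a Grafana or vendor schema. It shows two ideas from the prompt: panels are ordered top to bottom by the diagnostic-question sequence, and panels that may be expensive to render are flagged explicitly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Panel:
    """One row of the layout spec. All field names are illustrative."""
    title: str
    question: str            # the diagnostic question this panel answers
    query_intent: str        # plain-language intent, not an executable query
    link: str = ""           # runbook or drill-down target, if any
    expensive: bool = False  # may render slowly during an incident


@dataclass
class DashboardSpec:
    """A design spec only; nothing here builds or modifies a dashboard."""
    question_order: list[str]
    panels: list[Panel] = field(default_factory=list)
    default_window: str = "1h"  # assumed default, tune per service
    refresh: str = "30s"        # assumed default, tune per service

    def layout(self) -> list[Panel]:
        """Return panels top to bottom, following the question order.

        Panels whose question is not in question_order sort last.
        The sort is stable, so panels sharing a question keep their order.
        """
        rank = {q: i for i, q in enumerate(self.question_order)}
        return sorted(self.panels, key=lambda p: rank.get(p.question, len(rank)))

    def expensive_panels(self) -> list[str]:
        """Titles of panels to call out as costly to render."""
        return [p.title for p in self.layout() if p.expensive]


spec = DashboardSpec(
    question_order=[
        "Is it us or upstream?",
        "Which component?",
        "Which dependency?",
        "Which deploy?",
    ],
    panels=[
        Panel("Deploy timeline", "Which deploy?",
              "deploy and config-change events overlaid on error rate"),
        Panel("Service vs upstream health", "Is it us or upstream?",
              "RED summary for the service next to upstream health"),
        Panel("Per-dependency latency", "Which dependency?",
              "latency broken down by dependency", expensive=True),
    ],
)

for position, panel in enumerate(spec.layout(), start=1):
    print(position, panel.title, "->", panel.question)
print("Expensive:", spec.expensive_panels())
```

Asking the model to return its layout in a structure like this makes it easier to review the spec against the question sequence before anyone builds the dashboard.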

Related prompts

Grafana Dashboard Performance Prompt

Log and Trace Correlation: Narrow the Scope Prompt