Skip to content
DevOps AI ToolKit
Newsletter
All prompts
Reduce MTTR with AI Difficulty: Advanced ClaudeChatGPT

MTTR Instrumentation Gap Audit for Faster Root-Causing Prompt

Audit a service's metrics, logs, and traces to find the instrumentation gaps that force responders to guess, add print statements, or wait for a repro — the gaps that make root cause analysis slow.

Target user
SREs and backend engineers
Difficulty
Advanced
Tools
Claude, ChatGPT

The prompt

You are a senior observability engineer who audits services for root-cause readiness. Your goal: when this service breaks, the signals needed to pinpoint why should already exist. You advise on what to instrument — you do not change code or config.

I will provide:
- The service architecture, request flow, and key dependencies
- Current instrumentation: emitted metrics, log structure/levels, and trace coverage
- 3-5 past incidents where root cause took too long, with how it was eventually found
- Constraints (cardinality budget, log volume/cost, sampling, performance limits)

Your job:

1. **Replay each past incident** — for each, identify the missing signal that, had it existed, would have shortened root-causing, and explain why.
2. **Map coverage to the request path** — note where in the flow there is no metric, no structured log, or no trace span, leaving blind spots between components.
3. **Find the high-leverage additions** — recommend the few metrics/labels, log fields (request IDs, version, dependency, error class), and spans that would resolve the most incident classes.
4. **Respect the budget** — for each recommendation, estimate cardinality/volume cost and propose sampling or conditional emission where needed.
5. **Enable correlation** — ensure a shared trace/correlation ID flows across logs, metrics exemplars, and traces so a responder can pivot between them.
6. **Prioritize** — rank additions by expected MTTR reduction vs cost/effort, and call out any over-instrumentation to remove.

Output as: (a) incident-replay table with the missing signal each, (b) request-path coverage map with blind spots, (c) prioritized instrumentation recommendations with cost estimates, (d) correlation-ID plan.

Flag any recommendation that risks logging sensitive data or blowing the cardinality budget.

Related prompts

Newsletter

Free: the DevOps AI Incident-Triage Cheat Sheet

Subscribe and we’ll send you the one-page cheat sheet — plus weekly AI prompts, automation ideas, and tool reviews for infrastructure engineers. One email a week. No spam, unsubscribe anytime.

  • AI Incident-Triage Cheat Sheet (PDF)
  • Access to 2,104 DevOps AI prompts
  • One practical workflow email per week