
Intelligent Runbook Selection Engine Prompt

Design an AI-assisted layer that maps an incoming alert or incident to the right runbook (or flags that none exists), ranks candidates by signal match, and hands the on-call a confidence-scored shortlist instead of a wiki search.

Target user: SREs and platform engineers building incident-response automation
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a staff SRE who has built incident automation for a fleet where on-call engineers waste the first ten minutes of every page hunting for the right runbook. Your job is to design a runbook selection engine that picks the correct procedure with explainable confidence and never silently executes the wrong fix.

I will provide:
- Our runbook inventory (titles, trigger conditions, tags, services covered)
- Sample alert payloads (labels, annotations, severity, source)
- Incident history (alert → runbook actually used, time-to-resolution)
- Metadata systems available (CMDB, service catalog, ownership map)
- Constraints (latency budget, where this runs, audit requirements)

Your tasks:

1. **Matching model** — define the signals (alert name, service, symptom keywords, affected component, recent deploys) and how you weight them. Be explicit about which are high-trust vs noisy.

2. **Candidate ranking** — produce a top-3 shortlist with a confidence score and a one-line "why this matched" rationale per candidate.

3. **The "no good match" path** — when the top candidate's confidence falls below the threshold, say so loudly and route the alert to a gap-analysis backlog instead of guessing.

4. **Freshness guard** — penalize runbooks that have not been validated recently or that reference decommissioned systems, and surface the staleness in the rationale.

5. **Feedback loop** — capture which runbook the human actually used so the mapping improves; define what you log.

6. **Failure modes** — for ambiguous multi-service alerts, brand-new alert types, and near-duplicate runbooks, specify how the engine behaves.
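The six tasks above can be sketched as a small scoring function. This is a minimal illustration under stated assumptions, not a reference design: the signal names, the trust weights (which sum to 1.0), the 0.6 match threshold, the 180-day validation window, the penalty multipliers, and the `Runbook`, `Alert`, and `rank` names are all hypothetical. It covers weighted signals, the freshness penalty, a top-3 shortlist with a one-line rationale, and the no-match route.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical trust weights: high-trust signals dominate, noisy ones barely move the score.
SIGNAL_WEIGHTS = {
    "alert_name": 0.40,     # alert name listed in the runbook's triggers (high trust)
    "service": 0.30,        # alerting service covered by the runbook (high trust)
    "component": 0.15,      # affected component (medium trust)
    "keywords": 0.10,       # symptom keywords from annotations (noisy)
    "recent_deploy": 0.05,  # recent deploy on the service (noisy, context only)
}
MATCH_THRESHOLD = 0.6   # hypothetical: below this, take the no-match path
STALE_AFTER_DAYS = 180  # hypothetical validation window


@dataclass
class Runbook:
    title: str
    alert_names: set[str]
    services: set[str]
    components: set[str]
    keywords: set[str]
    last_validated: date
    references_decommissioned: bool = False


@dataclass
class Alert:
    name: str
    service: str
    component: str
    keywords: set[str]
    recent_deploy: bool = False


def signal_scores(alert: Alert, rb: Runbook) -> dict[str, float]:
    """Return a 0..1 match strength for each signal."""
    kw = len(alert.keywords & rb.keywords) / len(alert.keywords) if alert.keywords else 0.0
    return {
        "alert_name": 1.0 if alert.name in rb.alert_names else 0.0,
        "service": 1.0 if alert.service in rb.services else 0.0,
        "component": 1.0 if alert.component in rb.components else 0.0,
        "keywords": kw,
        # A deploy only counts when the runbook also covers the alerting service.
        "recent_deploy": 1.0 if alert.recent_deploy and alert.service in rb.services else 0.0,
    }


def freshness_factor(rb: Runbook, today: date) -> tuple[float, list[str]]:
    """Multiplicative penalty (<= 1.0) plus human-readable staleness notes."""
    factor, notes = 1.0, []
    age = (today - rb.last_validated).days
    if age > STALE_AFTER_DAYS:
        factor *= 0.7
        notes.append(f"not validated in {age} days")
    if rb.references_decommissioned:
        factor *= 0.5
        notes.append("references decommissioned systems")
    return factor, notes


def rank(alert: Alert, runbooks: list[Runbook], today: date) -> dict:
    """Score every runbook, keep the top 3, and flag a no-match below threshold."""
    candidates = []
    for rb in runbooks:
        scores = signal_scores(alert, rb)
        raw = sum(SIGNAL_WEIGHTS[s] * v for s, v in scores.items())
        factor, notes = freshness_factor(rb, today)
        matched = [s for s, v in scores.items() if v > 0]
        why = "matched " + (", ".join(matched) or "nothing")
        if notes:
            why += "; STALE: " + "; ".join(notes)
        candidates.append({"title": rb.title, "confidence": round(raw * factor, 3), "why": why})
    candidates.sort(key=lambda c: c["confidence"], reverse=True)
    shortlist = candidates[:3]
    if not shortlist or shortlist[0]["confidence"] < MATCH_THRESHOLD:
        return {"status": "no_match", "shortlist": shortlist, "route": "gap-analysis backlog"}
    return {"status": "match", "shortlist": shortlist}
```

Applying freshness as a multiplier means a stale runbook can still appear on the shortlist with its staleness spelled out in the rationale, but it can never outrank a fresh runbook with the same signal match. In practice you would likely tune the weights against the incident history rather than hand-set them.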

Output as: (a) the scoring schema with weights and thresholds, (b) a worked example showing a real alert → ranked shortlist with rationales, (c) the "no match" escalation flow, (d) the feedback-capture data model, (e) metrics to prove it beats manual search (time-to-runbook, wrong-runbook rate).
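Output items (d) and (e) can be sketched together: one feedback record per page, plus metrics computed from those records. The `FeedbackRecord` fields and the metric functions below are hypothetical, a minimal sketch assuming the engine logs its shortlist and the on-call's actual choice for every incident.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class FeedbackRecord:
    """One row per paged incident (hypothetical schema)."""
    incident_id: str
    alert_name: str
    shortlist: list[str]         # runbook titles in the order the engine showed them
    top_confidence: float
    runbook_used: Optional[str]  # what the on-call actually used; None if no runbook applied
    seconds_to_runbook: float    # page fired -> runbook opened
    engine_status: str           # "match" or "no_match"


def wrong_runbook_rate(records: list[FeedbackRecord]) -> float:
    """Share of 'match' recommendations whose top pick was not the runbook actually used."""
    matched = [r for r in records if r.engine_status == "match" and r.runbook_used]
    if not matched:
        return 0.0
    wrong = sum(1 for r in matched if not r.shortlist or r.shortlist[0] != r.runbook_used)
    return wrong / len(matched)


def shortlist_hit_rate(records: list[FeedbackRecord]) -> float:
    """Share of incidents where the runbook used appeared anywhere in the top 3."""
    used = [r for r in records if r.runbook_used]
    if not used:
        return 0.0
    return sum(1 for r in used if r.runbook_used in r.shortlist[:3]) / len(used)


def median_time_to_runbook(records: list[FeedbackRecord]) -> float:
    """Median seconds from page to runbook opened (raises on an empty list)."""
    return median(r.seconds_to_runbook for r in records)
```

Comparing `median_time_to_runbook` against the same figure from pre-rollout incident history gives the time-to-runbook delta, and `shortlist_hit_rate` separates near-misses (right runbook at #2 or #3) from true gaps.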

Anti-patterns to reject: auto-executing the top match without a human, hiding low confidence behind a single number, treating every tag as equally trustworthy, and recommending stale runbooks because they keyword-matched.
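Two of these anti-patterns, auto-executing the top match and hiding low confidence behind one number, can be guarded against in the output layer. The sketch below assumes a shortlist of `{"title", "confidence", "why"}` dicts; the 0.1 ambiguity margin and the `Recommendation`, `build_recommendation`, `render`, and `approve` names are hypothetical. It flags near-ties between the top two candidates and records a named human's choice; it deliberately has no execute step.

```python
from dataclasses import dataclass
from typing import Optional

AMBIGUITY_MARGIN = 0.1  # hypothetical: a top-2 gap this small means "ambiguous", not "confident"


@dataclass
class Recommendation:
    shortlist: list[dict]  # each: {"title": str, "confidence": float, "why": str}
    ambiguous: bool
    approved_by: Optional[str] = None
    chosen: Optional[str] = None


def build_recommendation(shortlist: list[dict]) -> Recommendation:
    """Flag a near-tie instead of reporting only the top score."""
    ambiguous = (
        len(shortlist) >= 2
        and shortlist[0]["confidence"] - shortlist[1]["confidence"] < AMBIGUITY_MARGIN
    )
    return Recommendation(shortlist=shortlist, ambiguous=ambiguous)


def render(rec: Recommendation) -> str:
    """Show every candidate with its score and rationale, ambiguity first."""
    header = "AMBIGUOUS: top candidates are close; read every rationale" if rec.ambiguous else "Ranked candidates:"
    lines = [header]
    for i, c in enumerate(rec.shortlist, 1):
        lines.append(f"{i}. {c['title']} ({c['confidence']:.2f}) - {c['why']}")
    return "\n".join(lines)


def approve(rec: Recommendation, engineer: str, title: str) -> Recommendation:
    """Record a named human's pick from the shortlist; nothing runs automatically."""
    if title not in [c["title"] for c in rec.shortlist]:
        raise ValueError(f"{title!r} is not on the shortlist")
    rec.approved_by, rec.chosen = engineer, title
    return rec
```

Keeping approval as an explicit call with the engineer's name also gives the feedback loop its ground truth for which runbook was actually used.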
