Intelligent Runbook Selection Engine Prompt
Design an AI-assisted layer that maps an incoming alert or incident to the right runbook (or flags that none exists), ranks the candidates by signal match, and hands the on-call engineer a confidence-scored shortlist instead of leaving them to search the wiki.
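The ranking behavior described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation. The `Runbook` and `Alert` shapes, the `score` and `shortlist` helpers, every weight and threshold, the rule that a recent deploy only credits runbooks tagged `rollback`, and the multiplicative staleness penalty are all hypothetical choices made for the example. The sketch shows per-signal trust weights (structured fields outweigh free-text keywords), a freshness guard that both lowers the score and says why, a top-3 shortlist with a one-line rationale per candidate, and an explicit "no good match" result below threshold. It never executes anything; it only returns candidates for a human.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical engine sketch: names, weights, and thresholds are illustrative.
# Structured, high-trust signals dominate; free-text symptom keywords are noisy.
SIGNAL_WEIGHTS = {
    "alert_name": 0.40,
    "service": 0.25,
    "component": 0.15,
    "recent_deploy": 0.10,
    "symptom_keywords": 0.10,
}
NO_MATCH_THRESHOLD = 0.50  # hypothetical; calibrate against incident history
STALE_AFTER_DAYS = 180     # hypothetical validation window
STALE_PENALTY = 0.5        # multiplier applied to stale runbooks


@dataclass
class Runbook:
    title: str
    alert_names: set[str]
    services: set[str]
    components: set[str]
    keywords: set[str]
    tags: set[str]
    last_validated: date
    references_decommissioned: bool = False


@dataclass
class Alert:
    name: str
    service: str
    component: str
    keywords: set[str]
    recent_deploy: bool = False  # a deploy touched this service in the lookback window


def score(alert: Alert, rb: Runbook, today: date) -> tuple[float, str]:
    """Weighted signal match plus freshness penalty; returns (confidence, rationale)."""
    hits = []
    if alert.name in rb.alert_names:
        hits.append("alert_name")
    if alert.service in rb.services:
        hits.append("service")
    if alert.component in rb.components:
        hits.append("component")
    # Hypothetical rule: a recent deploy only counts toward runbooks tagged "rollback".
    if alert.recent_deploy and "rollback" in rb.tags:
        hits.append("recent_deploy")
    total = sum(SIGNAL_WEIGHTS[h] for h in hits)
    # Keyword credit is the fraction of the alert's keywords the runbook mentions.
    if alert.keywords:
        overlap = len(alert.keywords & rb.keywords) / len(alert.keywords)
        total += SIGNAL_WEIGHTS["symptom_keywords"] * overlap
    notes = ["matched " + ", ".join(hits) if hits else "no structured signal matched"]
    # Freshness guard: penalize stale runbooks and say so in the rationale.
    age_days = (today - rb.last_validated).days
    if age_days > STALE_AFTER_DAYS or rb.references_decommissioned:
        total *= STALE_PENALTY
        reason = f"last validated {age_days}d ago"
        if rb.references_decommissioned:
            reason += ", references decommissioned systems"
        notes.append(f"STALE ({reason})")
    return round(total, 3), "; ".join(notes)


def shortlist(alert: Alert, runbooks: list[Runbook], today: date, k: int = 3) -> dict:
    """Top-k candidates for a human to choose from; this never executes a runbook."""
    scored = [(*score(alert, rb, today), rb.title) for rb in runbooks]
    scored.sort(key=lambda c: c[0], reverse=True)
    candidates = [{"runbook": t, "confidence": c, "why": w} for c, w, t in scored[:k]]
    top = candidates[0]["confidence"] if candidates else 0.0
    if top < NO_MATCH_THRESHOLD:
        # Say so loudly and route to the gap-analysis backlog instead of guessing.
        return {"status": "NO_GOOD_MATCH", "route": "gap-analysis backlog",
                "top_confidence": top, "candidates": candidates}
    return {"status": "SHORTLIST", "candidates": candidates}


# Illustrative data only; not taken from any real inventory.
today = date(2024, 6, 1)
books = [
    Runbook("Postgres replica lag", {"PGReplicationLag"}, {"orders-db"}, {"postgres"},
            {"lag", "replica"}, set(), date(2024, 4, 1)),
    Runbook("Postgres replica lag (old wiki copy)", {"PGReplicationLag"}, {"orders-db"},
            {"postgres"}, {"lag", "replica"}, set(), date(2023, 1, 1)),
    Runbook("Roll back orders-api deploy", {"HighErrorRate"}, {"orders-api"}, {"api"},
            {"5xx", "errors"}, {"rollback"}, date(2024, 5, 1)),
]
result = shortlist(Alert("PGReplicationLag", "orders-db", "postgres", {"lag", "replica", "wal"}), books, today)
gap = shortlist(Alert("DiskPressure", "search-index", "disk", {"disk", "inode"}), books, today)
for c in result["candidates"]:
    print(f'{c["confidence"]:.3f}  {c["runbook"]}  ({c["why"]})')
print(gap["status"], "->", gap["route"])
```

In this toy inventory the stale wiki copy of the Postgres runbook matches exactly the same signals as the fresh one, but it lands second with half the confidence and a `STALE` note, which is the near-duplicate behavior the prompt asks about. The disk-pressure alert matches nothing and comes back as `NO_GOOD_MATCH` rather than a guessed runbook. Real weights should be derived from incident history, not from these placeholder values.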
- Target user: SREs and platform engineers building incident-response automation
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a staff SRE who has built incident automation for a fleet where on-call engineers waste the first ten minutes of every page hunting for the right runbook. Your job is to design a runbook selection engine that picks the correct procedure with explainable confidence and never recommends one that would silently execute the wrong fix.
I will provide:
- Our runbook inventory (titles, trigger conditions, tags, services covered)
- Sample alert payloads (labels, annotations, severity, source)
- Incident history (alert → runbook actually used, time-to-resolution)
- Metadata systems available (CMDB, service catalog, ownership map)
- Constraints (latency budget, where this runs, audit requirements)
Your tasks:
1. **Matching model**: define the signals (alert name, service, symptom keywords, affected component, recent deploys) and how you weight them. Be explicit about which signals are high-trust and which are noisy.
2. **Candidate ranking**: produce a top-3 shortlist with a confidence score and a one-line "why this matched" rationale per candidate.
3. **The "no good match" path**: when the top confidence is below threshold, say so loudly and route the alert to a gap-analysis backlog instead of guessing.
4. **Freshness guard**: penalize runbooks that have not been validated recently or that reference decommissioned systems; surface the staleness in the rationale.
5. **Feedback loop**: capture which runbook the human actually used so the mapping improves; define what you log.
6. **Failure modes**: ambiguous multi-service alerts, brand-new alert types, and near-duplicate runbooks. Describe the engine's behavior for each.
Output as:
(a) the scoring schema with weights and thresholds,
(b) a worked example showing a real alert → ranked shortlist with rationales,
(c) the "no match" escalation flow,
(d) the feedback-capture data model,
(e) metrics to prove it beats manual search (time-to-runbook, wrong-runbook rate).
Anti-patterns to reject: auto-executing the top match without human approval, hiding low confidence behind a single number, treating every tag as equally trustworthy, and recommending stale runbooks just because they matched on keywords.
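The feedback loop (task 5) and the metrics in (e) can be grounded with a small data model. The sketch below assumes a hypothetical `SelectionFeedback` record, one per page, plus hypothetical helper functions; the field names are illustrative, not a required schema. It treats "the runbook the on-call used differs from the engine's top pick" as a proxy for the wrong-runbook rate. That is an assumption: a human may rightly choose the second candidate, so a real evaluation would also need a reviewed label.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional

# Hypothetical feedback schema and metric helpers; names are illustrative.


@dataclass(frozen=True)
class SelectionFeedback:
    """One row per page: what the engine offered vs. what the on-call actually used."""
    alert_id: str
    alert_name: str
    paged_at: datetime
    assisted: bool                        # True if the engine shortlist was shown
    suggested: tuple[str, ...]            # shortlist titles in rank order (empty if none)
    top_confidence: float
    chosen_runbook: Optional[str]         # None means no usable runbook was found
    runbook_opened_at: Optional[datetime]
    engine_version: str = "v0"


def wrong_runbook_rate(rows: list[SelectionFeedback]) -> float:
    """Proxy: share of assisted pages where the runbook used was not the top pick."""
    used = [r for r in rows if r.assisted and r.suggested and r.chosen_runbook is not None]
    if not used:
        return 0.0
    return sum(r.chosen_runbook != r.suggested[0] for r in used) / len(used)


def shortlist_miss_rate(rows: list[SelectionFeedback]) -> float:
    """Share of assisted pages where the runbook used was absent from the shortlist."""
    used = [r for r in rows if r.assisted and r.chosen_runbook is not None]
    if not used:
        return 0.0
    return sum(r.chosen_runbook not in r.suggested for r in used) / len(used)


def median_time_to_runbook(rows: list[SelectionFeedback], assisted: bool) -> Optional[float]:
    """Median seconds from page to runbook opened, for one cohort (assisted or manual)."""
    deltas = [(r.runbook_opened_at - r.paged_at).total_seconds()
              for r in rows if r.assisted == assisted and r.runbook_opened_at is not None]
    return median(deltas) if deltas else None


# Illustrative rows only; not real incident data.
t0 = datetime(2024, 6, 1, 12, 0)
rows = [
    SelectionFeedback("a1", "PGReplicationLag", t0, True, ("Postgres replica lag",), 0.867,
                      "Postgres replica lag", t0 + timedelta(seconds=45)),
    SelectionFeedback("a2", "HighErrorRate", t0, True, ("Roll back orders-api deploy",), 0.62,
                      "Scale orders-api", t0 + timedelta(seconds=120)),
    SelectionFeedback("a3", "DiskPressure", t0, True, (), 0.0, None, None),
    SelectionFeedback("a4", "PGReplicationLag", t0, False, (), 0.0,
                      "Postgres replica lag", t0 + timedelta(seconds=600)),
]
print("wrong-runbook rate (proxy):", wrong_runbook_rate(rows))
print("shortlist miss rate:", shortlist_miss_rate(rows))
print("median time-to-runbook, assisted vs manual:",
      median_time_to_runbook(rows, True), median_time_to_runbook(rows, False))
```

Rows like `a3`, where the engine offered nothing and the on-call found nothing, are the ones a gap-analysis backlog would collect. Showing that the engine beats manual search would mean comparing `median_time_to_runbook` and the wrong-runbook proxy across the assisted and manual cohorts over real incident history, not four toy rows.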