Guarded Runbook Execution: Propose-and-Approve Prompt
Turn a known runbook into a step-by-step execution plan where the AI drafts the exact commands and the human approves each one before it runs — cutting the keystroke-and-recall time of remediation without ever letting the model act on its own.
- Target user
- On-call SREs running remediation under time pressure
- Difficulty
- Advanced
- Tools
- Claude, ChatGPT, Cursor
The prompt
You are a senior SRE pairing with me on remediation. You draft the exact commands for each runbook step and explain the expected result; I review and run them. You never claim a step is done and you never chain steps without my go-ahead. Inputs: - The runbook or remediation intent: [PASTE RUNBOOK / DESCRIBE THE KNOWN FIX] - Current system state: [WHAT'S BROKEN / RELEVANT OUTPUT] - Environment specifics: [CLUSTER / NAMESPACE / SERVICE / REGION] - Constraints: [CHANGE FREEZE? PEAK TRAFFIC? BLAST-RADIUS LIMITS?] Produce a guarded execution plan, ONE STEP AT A TIME: 1. **Pre-flight read-only check** — before any mutating action, give the read-only command that confirms the system is in the state this runbook step expects. State what output means "safe to proceed" vs. "stop, the assumption is wrong." 2. **The action command** — the exact mutating command, with real placeholders filled from my inputs, plus a one-line statement of what it changes and its blast radius. 3. **The expected result** — what I should see if it worked, and the read-only command to verify it. 4. **The rollback** — the single command that undoes this step, ready to paste if verification fails. 5. **Stop-and-ask** — end every step with: "Confirm the pre-flight check passed and tell me to proceed." Do not output the next step until I do. Output format: per step — pre-flight | action | blast radius | verify | rollback | the explicit stop line. Propose only. Never assert a step succeeded, never auto-advance, and flag any step where the runbook's assumption looks stale against my current-state input. I run every command and decide.
Why this prompt works
Remediation time hides a surprising amount of recoverable MTTR: even with a known-good runbook, responders lose minutes recalling exact syntax, filling in the right namespace, and second-guessing whether a step is safe to run right now. This prompt offloads the drafting — the AI produces the exact command with placeholders resolved — while keeping every mutating action behind a human gate. The speed comes from never having to compose commands from memory mid-incident; the safety comes from the structure refusing to let the model act.
The pre-flight check is the heart of the design. Auto-execution is dangerous not because the runbook is wrong but because the world may not match the runbook’s assumptions — the resource was already deleted, the deploy already rolled back, the namespace is different in this region. Forcing a read-only confirmation before each mutating step turns a blind “run the runbook” into “verify the assumption, then act,” which is precisely the discipline that prevents a remediation from making the incident worse.
The stop-and-ask requirement and the paste-ready rollback enforce propose-and-approve at the step level rather than the plan level. By withholding the next step until the human confirms the pre-flight passed, the prompt makes it structurally impossible for the model to chain mutating actions or declare success. The human keeps a rollback in hand for every step, so the worst case of a wrong command is a fast, pre-staged undo rather than a compounding outage.
Related prompts
-
Post-Fix Verification Checklist Prompt
Build the queries and checks that confirm the fix actually resolved the incident — across the metric, the user, and the dependencies — before calling the all-clear, so you don't reopen later and inflate real MTTR.
-
Runbook and Next-Step Surfacer Prompt
Match the live symptom to the right runbook and surface the exact command to run now — so the responder acts from a known-good procedure instead of improvising, shortening time-to-mitigate.