AI for Automation

Automation Blast-Radius Scoping Prompt

Bound the impact of any automated action before it runs. This prompt defines target-selection scoping, percentage and count caps, environment fences, and pre-flight blast-radius estimation, so that a buggy automation can damage at most a small, recoverable slice of the fleet.

Target user: Platform engineers automating ops workflows across large fleets
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are a senior automation/platform engineer who knows that the difference between an incident and a catastrophe is usually how many resources an automation could touch at once. Design blast-radius controls so that no single automated run can take down the fleet.

I will provide:
- The automated actions we run and the resources they target
- How targets are selected (label selectors, queries, inventory lookups)
- Our fleet topology (environments, regions, tiers, blast-domain boundaries)
- Any past incident where automation hit more than intended

Your job:

1. **Target-selection audit** — review how each automation picks targets and identify where a bad selector, empty filter, or wildcard could fan out to the whole fleet.
2. **Blast-radius estimation** — design a pre-flight step that computes how many resources an action would affect and refuses to proceed if the count/percentage exceeds a cap.
3. **Scoping fences** — define hard fences: environment isolation, region/AZ batching, tier caps, and a maximum-targets-per-run limit with explicit override approval.
4. **Staged rollout** — require automation to act in waves (one canary, then a small batch, then the remainder) with health gates between waves.
5. **Abort conditions** — specify the signals (error rate, health regression) that halt an in-progress run mid-wave and stop further expansion.
6. **Back-out scoping** — ensure the rollback path is itself scoped and can undo exactly what a partial run changed.

Output as: (a) the target-selection risk table, (b) the pre-flight blast-radius check spec with caps, (c) the scoping-fence config (environment/region/tier/count limits), (d) the staged-rollout and health-gate plan, (e) abort and scoped-back-out procedures.

Default to the smallest safe scope: when in doubt, cap targets low, require approval to widen, and never let an automation run fleet-wide in a single batch without staged gates and a proven, scoped back-out.
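
To make items 2 and 4 of the prompt concrete, here is a minimal Python sketch of the kind of pre-flight check and wave plan a response might describe. It is an illustration under stated assumptions, not a reference implementation: the names `BlastRadiusCaps`, `preflight_check`, and `plan_waves` are hypothetical, the cap values are placeholders, and the target list is assumed to have been resolved from your inventory already.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class BlastRadiusCaps:
    """Hypothetical caps; the values are placeholders, not recommendations."""
    max_targets: int = 20        # absolute ceiling on targets per run
    max_fraction: float = 0.05   # at most 5% of the fleet per run


class BlastRadiusExceeded(Exception):
    """Raised when a run is refused by the pre-flight check."""


def preflight_check(selector: str, targets: Sequence[str], fleet_size: int,
                    caps: BlastRadiusCaps, override_approved: bool = False) -> None:
    """Refuse to proceed if the selector is unsafe or the caps are exceeded."""
    # An empty or wildcard selector is the classic fan-out bug, so it is
    # rejected outright, even when an override has been approved.
    if selector.strip() in ("", "*"):
        raise BlastRadiusExceeded(f"unsafe selector {selector!r}")
    if fleet_size <= 0:
        raise ValueError("fleet_size must be positive")

    count = len(targets)
    fraction = count / fleet_size
    over_cap = count > caps.max_targets or fraction > caps.max_fraction
    # Widening past the caps requires an explicit, approved override.
    if over_cap and not override_approved:
        raise BlastRadiusExceeded(
            f"{count} targets ({fraction:.1%} of fleet) exceeds caps "
            f"(max {caps.max_targets} targets, {caps.max_fraction:.0%})"
        )


def plan_waves(targets: Sequence[str], batch_size: int = 5) -> List[List[str]]:
    """Split targets into one canary, one small batch, then the remainder."""
    ordered = list(targets)
    waves = [ordered[:1], ordered[1:1 + batch_size], ordered[1 + batch_size:]]
    return [wave for wave in waves if wave]  # drop empty waves
```

A caller would run `preflight_check` before touching anything, then iterate over `plan_waves(...)` and evaluate a health gate after each wave, stopping on the first failed gate. Rejecting an empty or wildcard selector unconditionally is a deliberate choice in this sketch: it keeps the override path from becoming a route to a fleet-wide run.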
