AI for Automation

Automation Circuit Breaker Design Prompt

Design a circuit breaker around an ops automation so that when its actions start failing — or succeeding in ways that look like a runaway loop — it trips, stops acting, and escalates to a human instead of amplifying an outage.

Target user
Platform engineers hardening self-healing and remediation automation
Difficulty
Intermediate
Tools
Claude, ChatGPT

The prompt

You are a senior automation engineer who has watched a remediation loop fight a real outage and make it worse, and now puts a circuit breaker on anything that acts on its own.

I will provide:
- The automation and the actions it takes (and which are destructive or expensive)
- How often it runs or is triggered, and against how many targets
- The failure modes you have seen or fear (flapping, dependency down, bad input)
- The signals available to judge success vs failure of an action

Your job:

1. **Trip conditions** — define what trips the breaker: consecutive failures, error-rate over a window, action frequency exceeding a ceiling, or a target-count blast-radius cap.
2. **State model** — specify closed / open / half-open states, the counters and time windows for each, and where the state is stored so it survives restarts.
3. **Open behavior** — define what the automation does when open: stop acting, notify, and surface why it tripped, rather than silently doing nothing.
4. **Half-open recovery** — define the probe that lets a single trial action run before fully closing again, and what re-trips it.
5. **Manual override** — provide a documented way for a human to force-open (kill switch) and to reset, with who is allowed to do it.
6. **Observability** — list the metrics and log fields (trip count, current state, last reason) needed to alert on a tripped breaker and to debug it after.
7. **Per-target vs global** — decide whether the breaker is global or keyed per target/tenant so one bad target does not freeze remediation for everyone.

Output as: a state machine (states, transitions, thresholds), a config table (parameter | default | rationale), the kill-switch/reset procedure, and the metrics to emit.

Require that the breaker default to open (stop acting) when in doubt, expose a human kill switch, and never auto-reset into a destructive action without a successful half-open probe.
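
For reference, and not as part of the prompt itself, the state model the prompt asks for can be sketched roughly as below. This is a minimal, in-memory illustration under stated assumptions: the class name `CircuitBreaker`, its method names, and every threshold are hypothetical, state is not persisted across restarts, and only two trip conditions (consecutive failures and an action-frequency ceiling) are shown.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # automation may act
    OPEN = "open"            # automation must not act
    HALF_OPEN = "half_open"  # exactly one trial action is allowed


class CircuitBreaker:
    """Hypothetical breaker; all default thresholds are illustrative only."""

    def __init__(self, max_failures=3, max_actions=10, window_s=300.0,
                 cooldown_s=600.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before tripping
        self.max_actions = max_actions    # action ceiling per window
        self.window_s = window_s
        self.cooldown_s = cooldown_s      # minimum time spent open before a probe
        self.clock = clock                # injectable so the logic can be tested
        self.state = State.CLOSED
        self.last_reason = ""             # surfaced in alerts and logs
        self.trip_count = 0
        self.killed = False               # human kill switch
        self._failures = 0
        self._opened_at = 0.0
        self._action_times = []
        self._probe_in_flight = False

    def _trip(self, reason):
        self.state = State.OPEN
        self.last_reason = reason
        self.trip_count += 1
        self._opened_at = self.clock()
        self._probe_in_flight = False

    def allow(self):
        """Return True only if the automation may take one action now."""
        now = self.clock()
        if self.killed:
            return False
        if self.state is State.OPEN:
            if now - self._opened_at < self.cooldown_s:
                return False
            self.state = State.HALF_OPEN
        if self.state is State.HALF_OPEN:
            if self._probe_in_flight:
                return False  # only a single probe at a time
            self._probe_in_flight = True
            return True
        # CLOSED: enforce the action-frequency ceiling over a sliding window.
        self._action_times = [t for t in self._action_times
                              if now - t < self.window_s]
        if len(self._action_times) >= self.max_actions:
            self._trip("action frequency ceiling exceeded")
            return False
        self._action_times.append(now)
        return True

    def record(self, success):
        """Report the outcome of an action that allow() permitted."""
        if self.state is State.OPEN:
            return  # late results after a trip do not change state
        if self.state is State.HALF_OPEN:
            if success:
                self.state = State.CLOSED
                self._failures = 0
                self._action_times.clear()
                self._probe_in_flight = False
            else:
                self._trip("half-open probe failed")
            return
        if success:
            self._failures = 0
            return
        self._failures += 1
        if self._failures >= self.max_failures:
            self._trip("consecutive failures")

    def kill(self, who):
        """Force-open; stays open until a human calls reset()."""
        self.killed = True
        self._trip(f"kill switch engaged by {who}")

    def reset(self, who):
        """Manual reset goes to half-open, so a probe must still succeed."""
        self.killed = False
        self.state = State.HALF_OPEN
        self._probe_in_flight = False
        self.last_reason = f"manual reset by {who}"
```

In this sketch a manual reset lands in half-open rather than closed, which is one way to honor the "no reset without a successful probe" requirement. A per-target breaker could be approximated by keeping one instance per target key, for example in a `collections.defaultdict(CircuitBreaker)`; a real deployment would also need durable state storage and access control on `kill` and `reset`, which are omitted here.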