Dependency-Aware Remediation Ordering Prompt
Sequence multi-step, multi-service automated remediation correctly: build a dependency graph of services and actions, order remediation to respect startup/shutdown dependencies, and avoid fixes that trigger cascading failures or treat a downstream symptom while the upstream root cause is still failing.
- Target user: Platform engineers building orchestrated, multi-service auto-remediation
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior automation/platform engineer who has watched a well-meaning remediation restart services in the wrong order and turn one outage into five. Design dependency-aware ordering for multi-service automated remediation.

I will provide:
- The services/components and their dependency relationships
- The remediation actions available per component (restart, failover, scale, drain)
- The startup/shutdown ordering constraints and health signals
- Past incidents where remediation order made things worse

Your job:
1. **Dependency graph**: model the services as a directed dependency graph and identify ordering constraints, cycles, and shared/critical-path components.
2. **Action ordering rules**: derive correct sequences for common remediations (e.g. drain before restart, fix dependency before dependent, failover order) and the reverse order for recovery.
3. **Cascade-avoidance**: flag actions that, if mis-ordered, cause cascading failure, and define guards (wait-for-healthy gates between steps, partial-degradation tolerance).
4. **Root-cause vs symptom**: add logic to avoid remediating a downstream symptom while the upstream cause is still failing, including when to hold and escalate instead.
5. **Parallel vs serial**: decide which actions can safely run in parallel versus must serialize, respecting blast radius and shared dependencies.
6. **Back-out ordering**: define the reverse-ordered back-out so undoing a partial remediation doesn't itself cascade.

Output as: (a) the dependency graph and critical-path callouts, (b) ordered action sequences per remediation scenario, (c) cascade-avoidance guards and health gates, (d) root-cause-vs-symptom hold/escalate logic, (e) the back-out ordering plan.

Default to caution on ordering uncertainty: if the dependency graph is incomplete or an action's downstream impact is unclear, serialize with health gates between steps, hold and escalate to a human rather than guessing, and ensure the back-out sequence is itself dependency-aware and tested.
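
Example (illustration only, not part of the prompt text): to make the kind of output the prompt asks for more concrete, here is a minimal Python sketch of dependency-ordered remediation with health gates and a reverse-ordered back-out. It is one possible shape under stated assumptions, not a reference implementation. The service names in `DEPENDS_ON`, the `remediate` and `is_healthy` callables, and all function names are hypothetical; only `graphlib.TopologicalSorter` (Python 3.9+) is a real standard-library API.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical example graph: each service maps to the services it depends on.
DEPENDS_ON = {
    "database": set(),
    "cache": set(),
    "auth": {"database"},
    "api": {"auth", "cache"},
    "frontend": {"api"},
}


def remediation_waves(depends_on):
    """Group services into waves, dependencies first.

    Services in the same wave have no ordering constraint between them,
    so they are candidates for parallel remediation.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


def run_remediation(depends_on, remediate, is_healthy):
    """Remediate serially in dependency order with a health gate per step.

    Returns (applied, held_at). held_at is None on success, otherwise the
    point at which the run stopped so a human can be paged.
    """
    try:
        waves = remediation_waves(depends_on)
    except CycleError:
        # Ordering is undefined for a cyclic graph: hold and escalate.
        return [], "<dependency cycle>"
    applied = []
    for wave in waves:
        for service in wave:  # serialized by default, the cautious choice
            remediate(service)
            applied.append(service)
            if not is_healthy(service):
                # Health gate failed: do not touch dependents.
                return applied, service
    return applied, None


def back_out_order(applied):
    """Undo in reverse order so dependents are reverted before dependencies."""
    return list(reversed(applied))
```

Calling `run_remediation(DEPENDS_ON, restart_fn, health_fn)` with your own restart and health-check callables returns the list of services that were acted on and, if a gate failed, where the run stopped; passing that list to `back_out_order` gives the sequence for undoing the partial remediation. The sketch serializes every step even inside a wave, which matches the prompt's default of caution; running a wave in parallel is an optimization to adopt only once blast radius and shared dependencies are understood.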