Category: AI for Automation

Multi-Step Ops Workflow Checkpoint Orchestration Prompt

Orchestrate a long, multi-step operational workflow (migration, rollout, recovery) so it is restartable from durable checkpoints, compensates partial progress on failure, and never leaves the system in an unknown half-applied state when a step crashes mid-flight.

Target user: Platform engineers orchestrating long-running ops workflows
Difficulty: Advanced
Tools: Claude, ChatGPT

The prompt

You are a platform engineer who orchestrates long ops workflows that span many systems and run for minutes to hours of wall-clock time. Your obsession is the question: "if this dies at step 6 of 12, what state are we in and how do we recover?" You design so that the answer is always knowable.

I will provide:
- The workflow's ordered steps and which systems each touches
- Which steps are idempotent, which have side effects, and which are irreversible
- Expected duration, failure modes, and concurrency (can two run at once?)
- The orchestrator available (Temporal, Argo, Step Functions, custom)
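The inputs above can be easier to supply if they are written down in a structured form first. A minimal sketch, assuming hypothetical `StepSpec`/`WorkflowSpec` records and a three-way `Effect` tag that mirrors the idempotent / side-effect / irreversible split; none of these names belong to Temporal, Argo, or Step Functions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Effect(Enum):
    """How re-running or unwinding a step behaves (hypothetical tagging)."""
    IDEMPOTENT = "idempotent"      # safe to re-run: a repeat is a no-op
    SIDE_EFFECT = "side_effect"    # changes external state; needs a compensating action
    IRREVERSIBLE = "irreversible"  # cannot be undone once applied


@dataclass(frozen=True)
class StepSpec:
    name: str
    systems: tuple[str, ...]          # systems this step touches
    effect: Effect
    expected_seconds: int             # rough duration, useful for timeouts and lock TTLs
    failure_modes: tuple[str, ...] = ()


@dataclass(frozen=True)
class WorkflowSpec:
    name: str
    orchestrator: str                 # e.g. "Temporal", "Argo", "Step Functions", "custom"
    allow_concurrent_runs: bool
    steps: tuple[StepSpec, ...] = field(default_factory=tuple)

    def irreversible_steps(self) -> list[str]:
        return [s.name for s in self.steps if s.effect is Effect.IRREVERSIBLE]


# Hypothetical example: a database migration rollout.
migration = WorkflowSpec(
    name="orders-db-migration",
    orchestrator="Temporal",
    allow_concurrent_runs=False,
    steps=(
        StepSpec("snapshot_db", ("postgres",), Effect.IDEMPOTENT, 600),
        StepSpec("apply_schema", ("postgres",), Effect.SIDE_EFFECT, 120, ("lock timeout",)),
        StepSpec("switch_traffic", ("load-balancer",), Effect.SIDE_EFFECT, 30),
        StepSpec("drop_old_table", ("postgres",), Effect.IRREVERSIBLE, 60),
    ),
)
```

Listing `irreversible_steps()` up front makes it obvious which steps can only be gated or escalated, never compensated.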

Your tasks:

1. **Step contract** — for each step, define inputs, the durable state it records on success, and whether re-running it is safe (idempotency key or natural no-op).

2. **Checkpointing** — specify the durable checkpoint after each step so a crashed run resumes from the last completed step rather than restarting from zero or double-applying.

3. **Compensation** — for steps with side effects, define the compensating action to unwind them, and the order to apply compensations if the workflow aborts partway (saga-style).

4. **Failure policy per step** — retry-with-backoff, escalate-to-human, or compensate-and-abort; justify each choice by reversibility and blast radius.

5. **Concurrency and locking** — prevent two runs from touching the same target; specify the lock, its TTL, and what happens to an orphaned lock.

6. **Observability** — emit a per-step audit event (start, outcome, checkpoint, compensation) so an operator can see exactly where a run is and what it has changed.
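For tasks 1 and 2 above (step contract and checkpointing), here is a minimal resume-from-last-step sketch. It assumes a local JSON file as the checkpoint store; a real deployment would more likely rely on the orchestrator's own durable history or a database row. `CheckpointStore`, `run_workflow`, and the `run_id:step` idempotency-key format are hypothetical. The checkpoint is written to a temp file, fsynced, and swapped in with `os.replace`, so a crash mid-write leaves the previous checkpoint intact.

```python
import json
import os
import tempfile
from typing import Callable

# A step receives its idempotency key and returns a JSON-serialisable result,
# which is recorded as that step's durable state.
Step = tuple[str, Callable[[str], dict]]


class CheckpointStore:
    """Durable record of completed steps for one run (hypothetical, file-backed)."""

    def __init__(self, path: str):
        self.path = path

    def load(self) -> dict:
        if not os.path.exists(self.path):
            return {"completed": {}}
        with open(self.path) as f:
            return json.load(f)

    def save(self, state: dict) -> None:
        # Write a temp file in the same directory, fsync it, then atomically replace.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


def run_workflow(run_id: str, steps: list[Step], store: CheckpointStore) -> dict:
    state = store.load()
    for name, fn in steps:
        if name in state["completed"]:
            continue  # already checkpointed: skip instead of re-applying
        # Deterministic key: if we crash after the step succeeds but before the
        # checkpoint is saved, the retry reuses this key so the target can dedupe it.
        key = f"{run_id}:{name}"
        result = fn(key)
        state["completed"][name] = result
        store.save(state)  # checkpoint only after the step has succeeded
    return state
```

The gap between a step succeeding and its checkpoint landing is why every step still needs an idempotency key or a natural no-op: resuming may re-run that one step, but never the ones before it.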
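For task 3 above (compensation), a minimal saga-style sketch: each step optionally carries a compensating action, and on failure the completed steps are unwound in reverse order. `SagaStep`, `SagaAborted`, and `run_saga` are hypothetical names, and the list of completed steps is held in memory here for brevity; in practice it would come from the durable checkpoint so that compensation also works after a crash.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]
    # None means there is nothing to undo (read-only or naturally idempotent step).
    compensate: Optional[Callable[[], None]] = None


class SagaAborted(Exception):
    def __init__(self, failed_step: str, compensated: list[str], cause: Exception):
        super().__init__(f"step {failed_step!r} failed; compensated {compensated}")
        self.failed_step = failed_step
        self.compensated = compensated
        self.cause = cause


def run_saga(steps: list[SagaStep]) -> None:
    done: list[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception as exc:
            compensated = []
            # Unwind in reverse completion order: the most recent change is undone
            # first, so each compensation sees the state its forward step produced.
            for prior in reversed(done):
                if prior.compensate is not None:
                    prior.compensate()  # if this raises, it propagates: stop and escalate
                    compensated.append(prior.name)
            raise SagaAborted(step.name, compensated, exc) from exc
        done.append(step)
```

The step that failed is not in `done`, so any partial effect it left behind still needs its own handling (an idempotent retry or a compensation that tolerates a half-applied state).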
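For task 4 above (failure policy), one way to make the justification explicit is a small decision rule driven by reversibility and blast radius. This is a hypothetical rule of thumb, not a standard: `choose_policy`, its parameters, and the `"low"`/`"high"` blast-radius labels are illustrative assumptions, and the backoff schedule omits jitter for readability.

```python
from enum import Enum


class Policy(Enum):
    RETRY_WITH_BACKOFF = "retry_with_backoff"
    COMPENSATE_AND_ABORT = "compensate_and_abort"
    ESCALATE_TO_HUMAN = "escalate_to_human"


def choose_policy(idempotent: bool, reversible: bool, blast_radius: str, transient: bool) -> Policy:
    """Hypothetical decision rule; blast_radius is "low" or "high"."""
    if not reversible:
        # No compensation exists for an irreversible step, so a person decides.
        return Policy.ESCALATE_TO_HUMAN
    if transient and idempotent:
        # Repeating is safe and the error may clear by itself.
        return Policy.RETRY_WITH_BACKOFF
    if blast_radius == "high":
        # Reversible, but unwinding a wide change unattended is itself risky.
        return Policy.ESCALATE_TO_HUMAN
    return Policy.COMPENSATE_AND_ABORT


def backoff_delays(base: float = 1.0, factor: float = 2.0, cap: float = 60.0, attempts: int = 5) -> list[float]:
    """Capped exponential backoff schedule (seconds) for RETRY_WITH_BACKOFF steps."""
    return [min(cap, base * factor ** i) for i in range(attempts)]
```

With the defaults, `backoff_delays()` gives `[1.0, 2.0, 4.0, 8.0, 16.0]`; adding random jitter is common in practice so that many retrying runs do not hit a recovering system in lockstep.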
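For task 5 above (concurrency and locking), a minimal per-target lease sketch with a TTL and a fencing token, backed by Python's standard `sqlite3` module. `LeaseLock` and the `leases` table are hypothetical. The connection is assumed to be opened with `isolation_level=None` so the explicit `BEGIN IMMEDIATE` controls the transaction and the read-check-write is atomic. An orphaned lease is never cleaned up by hand: it expires, and the next run takes it over with a higher token, which downstream systems can use to reject writes from the stale holder.

```python
import sqlite3
import time
from typing import Callable, Optional


class LeaseLock:
    """Per-target lease with TTL and fencing token (hypothetical, SQLite-backed)."""

    def __init__(self, conn: sqlite3.Connection, clock: Callable[[], float] = time.time):
        self.conn = conn  # expected: sqlite3.connect(..., isolation_level=None)
        self.clock = clock
        conn.execute(
            "CREATE TABLE IF NOT EXISTS leases ("
            "target TEXT PRIMARY KEY, holder TEXT, expires_at REAL, token INTEGER)"
        )

    def acquire(self, target: str, holder: str, ttl: float) -> Optional[int]:
        """Return a fencing token, or None if another live run holds the target."""
        now = self.clock()
        self.conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
        try:
            row = self.conn.execute(
                "SELECT holder, expires_at, token FROM leases WHERE target = ?", (target,)
            ).fetchone()
            if row is None:
                token = 1
            elif row[1] > now:  # lease is still live
                if row[0] != holder:
                    self.conn.execute("ROLLBACK")
                    return None
                token = row[2]  # renewal by the same holder keeps its token
            else:
                token = row[2] + 1  # expired (orphaned) lease: take over with a higher token
            self.conn.execute(
                "INSERT OR REPLACE INTO leases VALUES (?, ?, ?, ?)",
                (target, holder, now + ttl, token),
            )
            self.conn.execute("COMMIT")
            return token
        except Exception:
            self.conn.execute("ROLLBACK")
            raise

    def release(self, target: str, holder: str) -> None:
        # Expire instead of deleting, so the fencing token keeps increasing.
        self.conn.execute(
            "UPDATE leases SET expires_at = 0 WHERE target = ? AND holder = ?", (target, holder)
        )
```

The TTL should comfortably exceed the longest step between renewals; a run that cannot renew before expiry should treat the lease as lost and stop, rather than keep changing the target.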
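For task 6 above (observability), a minimal audit-trail sketch that writes one JSON line per event and can reconstruct where a run stands from those events alone. The `AuditLog` class, the event kinds, and the field names are a hypothetical schema, not any orchestrator's built-in format.

```python
import json
import sys
import time
import uuid
from typing import Any, Optional, TextIO

# Hypothetical event kinds covering start, outcome, checkpoint, and compensation.
EVENT_KINDS = {"step_started", "step_succeeded", "step_failed", "checkpoint_written", "compensation_applied"}


class AuditLog:
    """Append-only JSON-lines audit trail for one workflow run (hypothetical schema)."""

    def __init__(self, run_id: str, out: TextIO = sys.stdout):
        self.run_id = run_id
        self.out = out
        self.seq = 0

    def emit(self, kind: str, step: str, *, changed: Optional[dict[str, Any]] = None,
             error: Optional[str] = None) -> dict:
        if kind not in EVENT_KINDS:
            raise ValueError(f"unknown audit event kind: {kind}")
        self.seq += 1
        event = {
            "event_id": str(uuid.uuid4()),
            "run_id": self.run_id,
            "seq": self.seq,           # total order of events within this run
            "ts": time.time(),
            "kind": kind,
            "step": step,
            "changed": changed or {},  # what the step altered, e.g. {"table": "orders_v2"}
            "error": error,
        }
        self.out.write(json.dumps(event) + "\n")
        self.out.flush()
        return event


def current_position(events: list[dict]) -> dict:
    """Derive the last checkpointed step and any step started but not yet finished."""
    last_checkpoint, in_flight = None, None
    for e in sorted(events, key=lambda e: e["seq"]):
        if e["kind"] == "step_started":
            in_flight = e["step"]
        elif e["kind"] in ("step_succeeded", "step_failed"):
            in_flight = None
        elif e["kind"] == "checkpoint_written":
            last_checkpoint = e["step"]
    return {"last_checkpoint": last_checkpoint, "in_flight": in_flight}
```

A non-null `in_flight` after a crash is exactly the "unknown half-applied" step an operator must inspect before resuming.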

Output as: (a) the step table with idempotency and side-effect tags, (b) the checkpoint and resume design, (c) the compensation/saga ordering, (d) the per-step failure policy, (e) the locking and audit design.

Reject any orchestration that restarts from zero on resume, that has no compensation for side-effecting steps, or that can run twice concurrently against the same target.
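The three rejection rules above can also be applied mechanically to a proposed design. A minimal sketch, assuming a hypothetical `OrchestrationDesign` summary with three yes/no facts; it checks only these three rules and is not a full design review.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StepDesign:
    name: str
    has_side_effect: bool
    compensation: Optional[str] = None  # name of the compensating action, if any


@dataclass
class OrchestrationDesign:
    resumes_from_last_checkpoint: bool
    lock_per_target: bool
    steps: list[StepDesign] = field(default_factory=list)


def rejection_reasons(design: OrchestrationDesign) -> list[str]:
    """Return reasons to reject the design; an empty list means all three checks pass."""
    reasons = []
    if not design.resumes_from_last_checkpoint:
        reasons.append("restarts from zero on resume")
    for step in design.steps:
        if step.has_side_effect and step.compensation is None:
            reasons.append(f"side-effecting step {step.name!r} has no compensation")
    if not design.lock_per_target:
        reasons.append("can run twice concurrently against the same target")
    return reasons
```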

Related prompts

Temporal Saga and Compensation Workflow Design Prompt

Workflow Orchestration with Temporal and Argo Workflows Prompt