AI for Automation

Cron-to-Event-Driven Migration Prompt

Plan the migration of brittle polling cron jobs to event-driven triggers — identifying which jobs to convert, choosing the event source, and handling ordering, idempotency, and missed-event recovery.

Target user: Platform engineers modernizing scheduled-job estates
Difficulty: Intermediate
Tools: Claude, ChatGPT

The prompt

You are an automation architect who has unwound sprawling cron estates into event-driven systems, and who knows that not every cron job should become an event.

I will provide:
- An inventory of cron jobs (schedule, what they do, what they poll, runtime, downstream effects)
- The pain points (polling lag, thundering herd at :00, overlapping runs, silent failures)
- Available event infrastructure (message queue, cloud events, webhooks, CDC, none yet)
- Constraints on ordering, exactly-once processing, and acceptable latency

Your job:

1. **Triage** — classify each job: (a) keep as cron (genuinely time-based, e.g., daily report), (b) convert to event-driven (reacting to a state change it currently polls for), (c) retire (dead or redundant). Justify each.

2. **Event source selection** — for conversion candidates, pick the trigger: queue message, CDC/database event, object-storage notification, or webhook. Explain the trade-off vs the current poll.

3. **Idempotency and dedup** — event delivery is usually at-least-once. Define the idempotency key and dedup strategy so a redelivered event doesn't double-process.

4. **Ordering** — call out where order matters and how to preserve it (partition keys, sequencing) or how to make handlers order-independent.

5. **Missed-event recovery** — events can be lost. Keep a low-frequency reconciliation sweep (a "safety-net cron") that catches anything the event path missed. Never go fully event-only for critical work without a reconciler.

6. **Backpressure** — define what happens during an event surge: rate limits, queueing, and dead-letter queue (DLQ) handling.

7. **Migration strategy** — run event-driven and cron in parallel (shadow mode), compare outputs, then cut over per job. After each cutover, keep the cron disabled but present until the event path is proven.

8. **Observability** — per-event tracing and lag metrics to replace the cron "did it run?" check.

Output as: (a) the triage table, (b) per-job target design (source, idempotency key, ordering, reconciler), (c) the parallel-run/shadow cutover plan, (d) the safety-net reconciliation design, (e) rollback steps (re-enable cron) if the event path misbehaves.

Bias toward keeping a reconciliation safety net and a reversible, per-job cutover.
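
When reviewing the design the model returns, it can help to have a concrete picture of what steps 3 and 5 ask for. The sketch below is a minimal illustration and not part of the prompt: an idempotent handler that skips redelivered events, plus a safety-net sweep that picks up work whose event never arrived. The `Store` class, the `order_id` idempotency key, and both function names are hypothetical, and the in-memory sets stand in for what would need to be a durable store with an atomic "record key and apply effect" step in a real system.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """In-memory stand-in for a database (hypothetical; use a durable store in practice)."""
    pending: set[str] = field(default_factory=set)    # order IDs still awaiting processing
    processed: set[str] = field(default_factory=set)  # idempotency keys already handled


def handle_event(store: Store, order_id: str) -> bool:
    """Process one event idempotently.

    Returns True if work was done, False if the event was a duplicate.
    """
    if order_id in store.processed:
        # At-least-once delivery: this key was already handled, so skip it.
        return False
    # In a real system, recording the key and applying the side effect
    # should happen atomically (for example, in one transaction).
    store.processed.add(order_id)
    store.pending.discard(order_id)
    return True


def reconcile(store: Store) -> list[str]:
    """Safety-net sweep: process anything still pending that the event path missed."""
    recovered = []
    # sorted() returns a new list, so the set can be mutated inside the loop.
    for order_id in sorted(store.pending):
        if handle_event(store, order_id):
            recovered.append(order_id)
    return recovered


store = Store(pending={"o-1", "o-2", "o-3"})
first = handle_event(store, "o-1")      # normal delivery
duplicate = handle_event(store, "o-1")  # redelivery of the same event
handle_event(store, "o-2")
recovered = reconcile(store)            # the event for o-3 was never delivered
print(first, duplicate, recovered)      # True False ['o-3']
```

Because the reconciler calls the same idempotent handler as the event path, running the sweep alongside live events should not double-process anything, which is what makes it safe to schedule as a low-frequency cron.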
